Use fast.ai for checking sentence similarity

Wombat · July 26, 2019, 12:22am

I have a little project I've always wanted to build: essentially a kind of “Google Assistant” or “Siri”, but tailored more towards me. One of the biggest problems is the NLP side of things, where I need to check the similarity between what the user says or types and the questions/answers in my database, to find the one it is most similar to.

The approach I initially planned was simply to tokenize the sentences and compute the Euclidean distance between them to see which one the input is most similar to.
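A minimal sketch of that plan, assuming "tokenize and take the Euclidean distance" means turning each sentence into a bag-of-words count vector over a shared vocabulary. The helper names (tokenize, bow_vectors, closest_question) and the stored questions are hypothetical, for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase, then keep runs of letters, digits and apostrophes as tokens.
    return re.findall(r"[a-z0-9']+", text.lower())


def bow_vectors(sentences):
    # Shared vocabulary over all sentences, then one word-count vector per sentence.
    vocab = sorted({tok for s in sentences for tok in tokenize(s)})
    vectors = []
    for s in sentences:
        counts = Counter(tokenize(s))
        vectors.append([counts[w] for w in vocab])
    return vocab, vectors


def euclidean(u, v):
    # Straight-line distance between two equal-length vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def closest_question(query, questions):
    # Vectorize the query together with the stored questions so all vectors
    # share one vocabulary, then return the question at the smallest distance.
    _, vectors = bow_vectors(questions + [query])
    query_vec = vectors[-1]
    distances = [euclidean(query_vec, v) for v in vectors[:-1]]
    best = min(range(len(questions)), key=lambda i: distances[i])
    return questions[best], distances[best]


# Hypothetical stored questions, for illustration only.
questions = [
    "what time is it",
    "what is the weather today",
    "play some music",
]

print(closest_question("What is the weather like?", questions))
# ('what is the weather today', 1.4142135623730951)
```

Because raw counts grow with sentence length, a long stored question can end up far from a short query even when they share most of their words; normalising the vectors or using cosine similarity instead is a common alternative.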

After starting this course and falling in love with the fast.ai library, I was wondering whether there's a way to do this with fast.ai.

Thanks!

bennnun · July 26, 2019, 12:41am

@Wombat, I am not sure whether this will be enough, but you could instead use a ready-made similarity tool from difflib, which is part of Python's standard library. That would look like this:

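from difflib import SequenceMatcher

def get_similarity(a, b):
    # ratio() returns 2*M/T, where M is the number of matched characters and
    # T is the total number of characters in both strings (0.0 to 1.0).
    return SequenceMatcher(None, a, b).ratio()

print(get_similarity("my dog likes to play", "my animal likes to play"))
# 0.7906976744186046

# The second sentence is Japanese: "By around the 7th century BC, its development as a city-state had progressed."
print(get_similarity("There is archaeological evidence of human occupation of the Rome area from approximately 14,000 years ago.", "紀元前7世紀頃には都市国家としての整備が進んだ。"))
# 0.0

print(get_similarity("a b c d e f g", "e f g h i j k"))
# 0.38461538461538464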
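For the assistant itself, one option is to score the user's input against every stored question, take the best match, and reject it if the score falls below a cutoff. The sketch below assumes the database is a list of (question, answer) pairs; word_similarity, best_answer, the 0.5 threshold and the sample data are all hypothetical. Since SequenceMatcher accepts any sequences of hashable items, passing lists of words instead of raw strings gives a word-level score, so shared letters and spaces no longer add to it.

```python
from difflib import SequenceMatcher


def get_similarity(a, b):
    # Character-level ratio, same as in the post above.
    return SequenceMatcher(None, a, b).ratio()


def word_similarity(a, b):
    # Hypothetical helper: compare lists of lowercased words instead of
    # characters, giving a word-level ratio.
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def best_answer(query, qa_pairs, similarity=word_similarity, threshold=0.5):
    # Hypothetical helper: score the query against every stored question and
    # return (answer, score) for the best match, or (None, score) if the best
    # score falls below the threshold.
    scored = [(similarity(query, q), a) for q, a in qa_pairs]
    score, answer = max(scored, key=lambda pair: pair[0])
    if score < threshold:
        return None, score
    return answer, score


# Hypothetical (question, answer) database, for illustration only.
qa_pairs = [
    ("what time is it", "<tell the time>"),
    ("what is the weather today", "<fetch the forecast>"),
    ("play some music", "<start the music player>"),
]

print(best_answer("what is the weather like today", qa_pairs))
print(best_answer("play me a song", qa_pairs))
print(best_answer("play me a song", qa_pairs, similarity=get_similarity))
```

Both variants only measure surface overlap: "dog" and "animal" are simply different tokens, so a paraphrase that shares no words with the stored question will still score low.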