# How to calculate cosine similarity of words in embedding matrix?

Here’s my ultimate goal:

Create a search feature for a web app that shows the information most relevant to what the user types in the search bar. In other words, I have a bunch of tables, and if the user types something like “plants”, I want all the tables that have words similar to “plant” to show up. This might be used as a way to reduce the need for labelling each table (or at least reduce the workload).

Now, I’m going to do transfer learning on my dataset. From what I understand, I can then use the new embedding matrix to find out which words are the most similar. How would I do this with the fastai library?

Thanks!

Check out the collab chapter of fastbook.
Specifically, the `Embedding Distance` section shows what you are looking for.


Thanks for the quick reply! Let me see if I understand:

Create a list of word vectors (?):

```
movie_factors = learn.model.i_weight.weight
```

Find index of movie x (in this case, SotL):

```
idx = dls.classes['title'].o2i['Silence of the Lambs, The (1991)']
```

Calculate cosine similarity between the movie at idx and every other movie in dataset:

```
distances = nn.CosineSimilarity(dim=1)(movie_factors, movie_factors[idx][None])
```

Sort by highest similarity and then print the most similar movie:

```
idx = distances.argsort(descending=True)[1]  # index 0 is the movie itself
dls.classes['title'][idx]
```
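The four steps above can be sketched end-to-end in plain Python on a toy vocabulary (the titles and factor values here are made up for illustration):

```python
import math

# Toy stand-ins for dls.classes['title'] and the embedding matrix
# (made-up names and values, just to show the pipeline).
titles = ["Movie A", "Movie B", "Movie C"]
factors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]

def cosine(u, v):
    """Cosine similarity of two vectors: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Step 2: find the index of the query title.
idx = titles.index("Movie A")

# Step 3: similarity of the query against every row of the matrix.
sims = [cosine(factors[idx], f) for f in factors]

# Step 4: sort indices by similarity, highest first; position 0 is
# the query itself (similarity 1.0), so the nearest neighbour is [1].
order = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)
print(titles[order[1]])  # -> "Movie B"
```

The same logic applies whether the rows are movie factors or word vectors; only the source of the matrix changes.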

This seems like exactly what I want to do with text. However, I’m still confused as to what the equivalent is in fastai.text. In other words, there is no `i_weight.weight` in a fastai.text learner, so how would I go about getting the word vectors?

You have to look inside `learner.model`. Print it to see the module structure; for an AWD-LSTM language model the token embedding layer typically sits at `learn.model[0].encoder`, so the word vectors are `learn.model[0].encoder.weight`.
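A minimal sketch of that lookup, using a stand-in module (the `encoder` attribute and `model[0]` layout mirror fastai's AWD-LSTM, but verify with `print(learn.model)` on your own learner):

```python
import torch.nn as nn

# Stand-in for learn.model: fastai's language model is a sequential
# container whose first module keeps the token embedding in an
# `encoder` attribute. (Assumed layout -- confirm on a real learner.)
class FakeAWDLSTM(nn.Module):
    def __init__(self, vocab_sz, emb_sz):
        super().__init__()
        self.encoder = nn.Embedding(vocab_sz, emb_sz)

model = nn.Sequential(FakeAWDLSTM(1000, 64))

# Same attribute path you would use on a real fastai text learner:
word_vectors = model[0].encoder.weight        # shape (vocab_sz, emb_sz)

# Cosine similarity of every word against the word at index idx,
# exactly as in the collab example above:
idx = 42
distances = nn.CosineSimilarity(dim=1)(word_vectors, word_vectors[idx][None])
nearest = distances.argsort(descending=True)[1]  # index 0 is the word itself
```

From there, `dls.vocab` (rather than `dls.classes['title']`) maps indices back to tokens.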


Fantastic! Thank you. I will try this once I get to this point in the project.