Understanding the embeddings created by the dot product model (collaborative filtering)

I am looking at the dot-product-based collaborative filtering model, without any bias or additional layers.
I get a model which has user embeddings and movie embeddings.

How can I see the embedding of a particular user or a particular movie?

Say I want to see the latent vector associated with userId=15 (the original, non-converted id) and, say, movieId=20?

Try to do something like this:

embedding = m.u(userId)

This should return a single embedding vector. For more insight into nn.Embedding, you can check this embeddings tutorial.
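
For illustration, here is a minimal pure-PyTorch sketch of how an nn.Embedding lookup works (the sizes below are made up); m.u in the model above is such an embedding layer, and the index you pass in has to be a long tensor:

import torch
import torch.nn as nn

# Hypothetical sizes, for illustration only.
n_users, n_factors = 100, 50
user_emb = nn.Embedding(n_users, n_factors)   # plays the role of m.u

# Look up the latent vector for one user; indices must be a LongTensor.
idx = torch.LongTensor([15])
vec = user_emb(idx)                      # shape: (1, n_factors)

# The same row can also be read straight from the weight matrix.
vec_direct = user_emb.weight.data[15]    # shape: (n_factors,)

# Note: if the raw userId was re-mapped to a contiguous index when the
# model was trained, look up the mapped index here, not the raw id.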

I found a way to get the embeddings.
For the dot product with bias model, to get the rating given by user 1 to movie 32, I do the following:

(sigmoid(np.dot(to_np(m.u(V(1))), to_np(m.m(V(32))).T) + to_np(m.ub(V(1))) + to_np(m.mb(V(32))))) * (max_rating-min_rating) + min_rating

However, I am not able to use an apply on this so that I can predict the ratings of all users for all movies.
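
One way around the row-by-row apply (just a sketch, assuming the same m, V, to_np, min_rating and max_rating as above, and that the userId/movieId columns already hold the converted contiguous indices) is to build long index tensors for the whole columns and do the dot product in a single batched call:

import torch
import torch.nn.functional as F

def predict_all(df):
    # Embedding layers need long (int64) indices, so build LongTensors
    # covering the whole userId / movieId columns at once.
    users  = V(torch.LongTensor(df['userId'].values))
    movies = V(torch.LongTensor(df['movieId'].values))

    um  = (m.u(users) * m.m(movies)).sum(1)                    # batched dot products, shape (n,)
    res = um + m.ub(users).squeeze() + m.mb(movies).squeeze()  # add the bias terms
    res = F.sigmoid(res) * (max_rating - min_rating) + min_rating
    return to_np(res)

# ratings2['pred'] = predict_all(ratings2)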

import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def predict_values(row):
    users, movies = row['userId'], row['movieId']
    # Dot product of the user and movie latent vectors.
    um = (m.u(V(users)) * m.m(V(movies))).sum(1)
    # Add the user and movie bias terms.
    res = um + m.ub(V(users)).squeeze() + m.mb(V(movies)).squeeze()
    # Squash to the (min_rating, max_rating) range.
    res = F.sigmoid(res) * (max_rating - min_rating) + min_rating
    return res.view(res.size()[0], 1)

ratings2['pred'] = ratings2.apply(predict_values, axis=1)

This gives me an error:
RuntimeError: ("Expected tensor for argument #1 'indices' to have scalar type Long; but got CUDAFloatTensor instead (while checking arguments for embedding)", 'occurred at index 0')

It says that you have the wrong tensor type: you are passing a torch.FloatTensor (which was moved to the GPU with a .cuda() call), but the embedding expects indices of long type. I guess you need to check the types of your arguments and make sure that you're not passing a GPU tensor where a CPU one is expected, and vice versa.

Any idea how to solve this?

I can see that the error occurs because of a type mismatch.
For instance,

m.u(V(ratings2.loc[0]['userId']))
throws the same error:

RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got CUDAFloatTensor instead (while checking arguments for embedding)

It seems like ratings2.loc[0]['userId'] doesn't have the correct datatype?

Yes, the value you're passing comes through as a plain scalar with the wrong dtype (it ends up as a float), but the embedding expects long indices. I guess you should pass something like torch.tensor(ratings2['userId'].values, dtype=torch.long).cuda() instead.
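
For example, a minimal sketch of the corrected single lookup (using the same m, V and ratings2 as above), where the only change is casting the pandas value to a long index before it reaches the embedding:

import torch

# ratings2.loc[0]['userId'] comes back as a float, so cast it to an int
# and wrap it in a LongTensor before handing it to the embedding layer.
uid = int(ratings2.loc[0]['userId'])
user_vec = m.u(V(torch.LongTensor([uid])))   # shape: (1, n_factors)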

Thanks. I was able to get this to work individually, but not inside the apply loop. Anyway, I should be able to figure out the rest.


@ArchieIndian, did you end up solving this? Can you link to the notebook? I've tried to re-create your approach here and am getting different predictions than learn.predict(row from dataframe) for some reason.
Would really appreciate it!