After training the model in Google Colaboratory (following chapter 7 in the fastbook), how do you predict ratings for movies that a user has not seen? For example, user-id 242 has watched/rated 20 movies, so how do I predict user 242's ratings for the other movies?
In other words, I want to see the ratings for all 1,682 movies for user 242. cc: @jeremy @sgugger
Does anyone have a solution or even a suggestion on how to do the above?
The forward pass of the model (the dot product between the user and movie latent factors, plus the biases) gives the predicted rating. I quickly put together a get_ratings function that you can use as is, or treat as a starting point, for inference.
You can pass in any number of movies for a single user, as shown below, or any number of users for a single movie.
It also works when the lists of movies and users have equal lengths, in which case they are paired element by element.
# Assumes the usual notebook setup (from fastai.collab import *), which provides torch and tensor.
def get_ratings(learn, users=tensor([]), items=tensor([])):
    # Inference only, so skip gradient tracking.
    with torch.no_grad():
        if not isinstance(items, torch.Tensor): items = tensor(items).view(-1)
        if not isinstance(users, torch.Tensor): users = tensor(users).view(-1)
        # An empty argument means "all items" or "all users".
        if len(items) == 0:
            items = torch.arange(learn.model.i_weight.num_embeddings)
        if len(users) == 0:
            users = torch.arange(learn.model.u_weight.num_embeddings)
        try:
            # Same computation as the model's forward pass.
            dot = learn.model.u_weight(users) * learn.model.i_weight(items)
            res = dot.sum(1) + learn.model.u_bias(users).squeeze() + learn.model.i_bias(items).squeeze()
            return torch.sigmoid(res) * (learn.model.y_range[1]-learn.model.y_range[0]) + learn.model.y_range[0]
        except IndexError:
            print('The user/item index may be invalid')

get_ratings(learn, users=242, items=[10, 20, 3])
I may be missing a broadcasting trick here: this does not work for multiple users and multiple items when the two lists have different lengths. For example:
get_ratings(learn, users=[242, 10], items=[10, 20, 3])  # this would not work
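One possible way around the different-lengths case, as a rough sketch rather than a tested fastai recipe: replace the element-wise product with a matrix product, so every user is scored against every item. The names get_rating_matrix and TinyDotBias are made up for illustration. TinyDotBias is only a stand-in with the same attribute names (u_weight, i_weight, u_bias, i_bias) that get_ratings relies on, and I am assuming learn.model exposes those layers in the same way.

```python
import torch
from torch import nn

class TinyDotBias(nn.Module):
    """Hypothetical stand-in for the trained model (same attribute names)."""
    def __init__(self, n_users, n_items, n_factors):
        super().__init__()
        self.u_weight = nn.Embedding(n_users, n_factors)
        self.i_weight = nn.Embedding(n_items, n_factors)
        self.u_bias = nn.Embedding(n_users, 1)
        self.i_bias = nn.Embedding(n_items, 1)

def get_rating_matrix(model, users, items, y_range=None):
    """Return a (len(users), len(items)) tensor of predicted ratings."""
    # Put the indices on the same device as the embedding weights.
    device = model.u_weight.weight.device
    users = torch.as_tensor(users, dtype=torch.long, device=device).view(-1)
    items = torch.as_tensor(items, dtype=torch.long, device=device).view(-1)
    with torch.no_grad():
        # (n_users, n_factors) @ (n_factors, n_items) -> (n_users, n_items)
        res = model.u_weight(users) @ model.i_weight(items).T
        # u_bias is (n_users, 1) and i_bias.T is (1, n_items); both broadcast.
        res = res + model.u_bias(users) + model.i_bias(items).T
        if y_range is None:
            return res
        lo, hi = y_range
        return torch.sigmoid(res) * (hi - lo) + lo

# Demo on random (untrained) weights, just to show the shapes.
demo = get_rating_matrix(TinyDotBias(300, 50, 5), [242, 10], [10, 20, 3], y_range=(0, 5.5))
print(demo.shape)  # torch.Size([2, 3])
```

With a trained learner, the call would presumably look like get_rating_matrix(learn.model, [242, 10], [10, 20, 3], y_range=learn.model.y_range). Row 0 of the result then holds the ratings of the first user in the list for each item.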
Thanks for the insight. I think I follow it, but I am not quite sure about the parameters. In users=[242, 10], 242 is the user ID, but what does the 10 represent? Is 10 another user ID?
What do the three values in items=[10, 20, 3] represent? Is items a list of movie IDs?
I also ran into a device mismatch error between the CPU and CUDA/GPU (on Google Colab), so I added these lines:
items = items.to(device='cuda')
users = users.to(device='cuda')
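A caveat on the fix above: hard-coding 'cuda' would fail again on a CPU-only runtime. A small sketch of one alternative is to read the device off the model's own parameters. The nn.Embedding here is only a stand-in for the trained model (learn.model in this thread).

```python
import torch
from torch import nn

# Stand-in for the trained model; in the thread this would be learn.model.
model = nn.Embedding(300, 5)

# Use whatever device the model's parameters already live on.
device = next(model.parameters()).device

users = torch.tensor([242, 10]).to(device)
factors = model(users)  # indices and weights are now on the same device
```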
What I mean by the output being indexed by users is that output[0] holds the ratings that the first user in the list would give to each of the given items (movies).
For example, in our case, output[0] would be user 242's ratings for the 3 items (movies) [10, 20, 3].
Why do we say items instead of movies?
In general, recommendations can be for anything: movies, songs, games, and so on. Hence we use items as a more general term.