Collaborative filtering - get predictions for the new user

Hello there!

Could anyone help me with the following scenario:

  1. I’ve pre-trained and exported a model using collab_learner (for example, users and their ratings on movies).
  2. Using load_learner, I import the model and would like to get predictions for a completely new user.
  3. To address the cold start problem, I ask the new user to rate some of the most-rated movies (let’s say 5-10 movies).
  4. Based on these ratings, I’d like to give some new recommendations (using get_preds).

Since my exported model doesn’t “know” anything about this new user (it’s not in the training set), can I somehow “add” this information to the existing learner or should I re-train the model?

Using the following code, what should I pass as a test_dl? Unseen movies without ratings?

test_dl = learn.dls.test_dl(df_test, with_labels=True)

preds = learn.get_preds(dl=test_dl, with_decoded=True)

Thanks for any help!

Once you have more data about new users’ ratings, you can do either of the following:

  1. fine-tune the existing model on the new users’ ratings. This is preferable when your new users behave quite differently from your old users. This is similar to transfer learning.
  2. retrain the model using a combination of old and new data.
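Option 1 can be sketched in plain PyTorch. Everything below is a stand-in, not the actual exported learner: a minimal dot-product model plays the role of the pretrained collab model, and we simply continue optimising on the new ratings only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Minimal dot-product model standing in for the exported collab_learner.
class DotModel(nn.Module):
    def __init__(self, n_users, n_items, n_factors=40):
        super().__init__()
        self.u = nn.Embedding(n_users, n_factors)
        self.i = nn.Embedding(n_items, n_factors)

    def forward(self, users, items):
        return (self.u(users) * self.i(items)).sum(dim=1)

model = DotModel(n_users=100, n_items=50)  # pretrained weights would be loaded here

# New ratings only: (user index, item index, rating).
# All users/items must already exist in the model's vocab.
users = torch.tensor([3, 7, 42])
items = torch.tensor([5, 1, 9])
ratings = torch.tensor([4.0, 2.0, 5.0])

opt = torch.optim.Adam(model.parameters(), lr=1e-2)
losses = []
for _ in range(50):  # a few fine-tuning steps; old data is untouched
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(users, items), ratings)
    loss.backward()
    opt.step()
    losses.append(loss.item())
```

In a real fastai workflow you would instead build new DataLoaders from the new ratings and call `learn.fit_one_cycle` on the loaded learner, but the principle is the same: keep the trained weights and take a few gradient steps on the new data.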

I had the same problem and was trying to understand what to do exactly: the new user did not exist in the pretrained model I exported, so how can I take that model and train it with new data (containing the new user) when the embedding for that user is not present? Let’s say, for example, that the pretrained model had 100 users, so the user embedding is a 100x40 matrix (40 = latent factors). I export the model, and now I want to train with new dataloaders that only contain, e.g., 10 new users.
How can I tell fastai “hey, these are 10 new users, so I need 10 more vectors in my embedding matrix for them”? And when I create the new dataloader with new examples, I also need a way to pass the mapping from the “old” 100 users to embedding indexes, otherwise the 10 new users will overlap the indexes used by the 100 old users.
Or maybe I didn’t understand it correctly?
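As far as I know fastai doesn’t expose an “add users” operation, but you can grow the embedding matrix by hand. A minimal PyTorch sketch, using the 100-user / 10-new-user / 40-factor numbers above (in fastai’s EmbeddingDotBias the user embedding is, I believe, the `u_weight` attribute; here a plain `nn.Embedding` stands in for it):

```python
import torch
import torch.nn as nn

n_old, n_new, n_factors = 100, 10, 40

# Stands in for the trained user embedding of the exported model.
old_emb = nn.Embedding(n_old, n_factors)

# Build a bigger embedding and copy the trained rows into their old positions,
# so the old user indexes keep pointing at the same vectors.
new_emb = nn.Embedding(n_old + n_new, n_factors)
with torch.no_grad():
    new_emb.weight[:n_old] = old_emb.weight
# Rows n_old .. n_old + n_new - 1 keep their random init: those are the 10 new users.
```

For the index mapping, the key point is to append the new user ids at the end of the existing vocabulary (in fastai, the classes stored on the DataLoaders) rather than rebuilding it from the new data alone; that way old ids keep their old indexes and nothing overlaps.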


I found some inspiration for solving the cold start problem here: The Cold Start Problem - #2 by cedric

In my case, it might be worth providing predictions with another model (a tabular model based on item similarity instead of users).

Other than that, it seems like it’s only possible to re-train the entire model…

I’d like to know the answer to that as well…

In fact, these are two different problems.
Problem 1, the “cold start”:
As you said, an easy solution could be to ask the new user to rate some popular movies.
Then you have some new data about that user.
If you have metadata about users (age, location, interests…), you could also train a second model. Because you have already trained a collaborative filtering model, you have a lot of users encoded as embeddings: let the x be the metadata and the target be the embedding from your collaborative filter. You can then predict the initial embedding of a new user from their metadata, and with this embedding you can make some recommendations.
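That metadata-to-embedding idea can be sketched as a small regression model. Everything here is hypothetical: `meta_dim`, the layer sizes, and the training pairs are assumptions for illustration, not a fastai API.

```python
import torch
import torch.nn as nn

n_factors = 40   # latent factors of the trained collab model
meta_dim = 8     # hypothetical number of user-metadata features

# Hypothetical second model: predict a user's embedding from their metadata.
meta_to_emb = nn.Sequential(
    nn.Linear(meta_dim, 64),
    nn.ReLU(),
    nn.Linear(64, n_factors),
)

# Training pairs would be: x = metadata of known users,
# y = their trained embedding rows from the collab model, e.g.
#     loss = nn.functional.mse_loss(meta_to_emb(x), trained_embeddings)

# At inference time, a brand-new user's metadata yields an initial embedding
# that can be dotted with the item embeddings to produce recommendations.
new_user_meta = torch.randn(1, meta_dim)  # stand-in for a real feature vector
cold_start_emb = meta_to_emb(new_user_meta)
```

The design choice here is that the second model is trained only on users who already have good embeddings, so it learns the metadata-to-taste mapping from them and transfers it to newcomers.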

Problem 2:
When you have new data, is there a way to keep the old model and just update it, or do you have to train it from scratch with all the data (old data + new data)?
If I already have a lot of near-perfect embeddings for some users, I don’t want to retrain them from scratch when the new data doesn’t concern them.
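One way to update the model without disturbing those near-perfect embeddings is to freeze the old rows with a gradient hook, so that a fine-tuning pass on new data can only move the new users’ vectors. A PyTorch sketch, reusing the 100-old / 10-new / 40-factor numbers from earlier in the thread (this is one possible workaround, not a built-in fastai feature):

```python
import torch
import torch.nn as nn

n_old, n_new, n_factors = 100, 10, 40
emb = nn.Embedding(n_old + n_new, n_factors)  # old rows + freshly added new rows

# Zero out the gradient of the first n_old rows: only new users' vectors can move.
def freeze_old_rows(grad):
    grad = grad.clone()
    grad[:n_old] = 0
    return grad

emb.weight.register_hook(freeze_old_rows)

before = emb.weight.clone()
opt = torch.optim.SGD(emb.parameters(), lr=0.1)

idx = torch.tensor([n_old + 1, 5])  # a batch mixing one new user and one old user
emb(idx).sum().backward()
opt.step()
# The first n_old rows are unchanged; only row n_old + 1 has moved.
```

Note that this only works cleanly with optimizers that don’t touch parameters with zero gradient (plain SGD without weight decay); with weight decay or some adaptive optimizers the frozen rows could still drift.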


I’m also interested in knowing how this can be solved: first, how to add and reindex new users and items, and also how to delete existing ones. This would be useful to

  • Learn predictions for new users that start interacting with your system.
  • Learn predictions for new items that you add to your system.
  • Delete old users that delete their accounts, to be GDPR compliant and to avoid keeping unnecessary embeddings that will never get updated (reducing the model size to save storage).
  • Delete old items that you probably don’t want to present to the users (discontinued, very old, or unpopular items), again avoiding unnecessary embeddings that will never get updated.