Get predictions for test data via collab

Top 10 similar movies to a given movieId

Actually, I don't know if there is a straightforward solution in fastai. But you can always compute the Euclidean distance between movie embeddings: the smaller the distance, the more similar the two movies are.
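A minimal sketch of that idea, using made-up 3-dimensional vectors instead of real model output. The function name `most_similar` and the `toy_embeddings` dictionary are hypothetical, introduced only for illustration. With a trained fastai model, the vectors would instead come from the item embedding layer (the fastbook chapter 8 notebook reads them from `learn.model.i_weight.weight`, assuming the default dot-product model).

```python
import heapq
import math

def most_similar(embeddings, query_id, k=10):
    """Return the k ids whose vectors are closest (Euclidean) to query_id's vector."""
    query_vec = embeddings[query_id]
    # Distance from the query to every other item; the query itself is skipped.
    distances = (
        (math.dist(query_vec, vec), item_id)
        for item_id, vec in embeddings.items()
        if item_id != query_id
    )
    # nsmallest keeps only the k closest items, nearest first.
    return [item_id for _, item_id in heapq.nsmallest(k, distances)]

# Toy, made-up embeddings keyed by movieId (an assumption, not real model output).
toy_embeddings = {
    1: [0.9, 0.1, 0.0],
    2: [0.8, 0.2, 0.1],
    3: [0.0, 1.0, 0.9],
    4: [0.1, 0.9, 1.0],
    5: [0.85, 0.15, 0.05],
}

similar_to_1 = most_similar(toy_embeddings, query_id=1, k=2)
print(similar_to_1)
```

For the top 10 similar movies, the same call with `k=10` on the full embedding table should do. Cosine similarity is a common alternative to Euclidean distance here.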

Sure, will check it out and keep you posted if I find anything. Thanks, I really appreciate your responses @IRailean.

Hi there, following up on this.

Are there any updates on how to predict on a test set using fastai version 2 with collab_learner?

Thanks in advance!

Hi @dinodelao,

From my POV, something like this should work:

import pandas as pd

# enrich the data set with our own preferences; picked the movies listed by movies.head() for convenience
# no idea about those movies :slight_smile:

my_prefs = {'user': [9999, 9999, 9999, 9999, 9999],
            'movie': [1, 2, 3, 4, 5],
            'rating': [1, 1, 2, 2, 2],
            'timestamp': [881250949, 881251049, 881251100, 881252049, 881253049],
            'title': ['Toy Story (1995)', 'GoldenEye (1995)', 'Four Rooms (1995)', 'Get Shorty (1995)', 'Copycat (1995)']}

my_df = pd.DataFrame(data=my_prefs)

ratings = pd.concat([ratings,my_df],axis=0)

After the input data set is updated, train the model as described in https://colab.research.google.com/github/fastai/fastbook/blob/master/08_collab.ipynb

Now let's answer the query: what rating would I give a specific movie?

query = {'user': [9999, 9999],
         'movie': [50, 258],
         'title': ['Star Wars (1977)', 'Contact (1997)']}

query_df = pd.DataFrame(data=query)

query_dl = dls.test_dl(query_df)

#get predictions for the test dl
learn.get_preds(dl=query_dl)

Hope it helps!

Hello IRailean,

Thanks a lot for your samples. I am a beginner trying to build a recommendation engine, following your samples, for a use case that involves another kind of item instead of movies. I am trying to get the top 10 recommendations for an already existing user. My use case does not introduce a cold start (except on rare occasions), and my dataset has at least one rating for every user.

I tried using the learn.predict function, passing the given user ID as test data, like below:

Code snippet…

learn.save('recommend-dot-1')

# build one test row per plan for the user we want recommendations for
rows = []
for i in range(len(plans)):
    rows.append(dict({'MEDICAL_PLAN': plans[i], 'USERID': '252749'}))
test_data = pd.DataFrame(rows)

data_collab = CollabDataLoaders.from_df(ratings, test=test_data, seed=42, valid_pct=0.1, user_name='USERID', item_name='PlanCode', rating_name='RATING', bs=64)
test_learn = collab_learner(data_collab, n_factors=50, y_range=(0, 10), wd=1e-1)
test_learn_loaded = test_learn.load('recommend-dot-1')

preds, y = test_learn_loaded.predict(DatasetType.Test)<----------

I am stuck at the line marked above: I am not able to find out what the parameter DatasetType.Test is. I searched all over and could not figure it out. Can you help, please?

Thanks a bunch

Hi, I am facing a problem predicting 4 new movies for each user. My dataset has 4,200 movie IDs and 42,000 users, so for each of the 42,000 users I have to get the top 4 movie recommendations. But model.predict takes a DataFrame, so that would be 42,000 × 4,200 rows. Is there any other way to get predictions in this situation?
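One possible way around the huge DataFrame, sketched under assumptions: if the model is the dot-product kind, a predicted score is roughly the dot product of a user vector and an item vector plus bias terms, so items can be ranked straight from the embeddings. The function `top_n_for_all_users` and all the toy data below are hypothetical and not taken from any model in this thread. The user bias and the `y_range` sigmoid are left out because neither changes the ordering of items for a given user.

```python
import heapq

def top_n_for_all_users(user_vecs, item_vecs, item_bias, seen, n=4):
    """Score every (user, item) pair from the embeddings; keep the n best unseen items per user."""
    recommendations = {}
    for user_id, u in user_vecs.items():
        already_rated = seen.get(user_id, set())
        # Dot product plus item bias, skipping items the user has already rated.
        scores = (
            (sum(a * b for a, b in zip(u, v)) + item_bias[item_id], item_id)
            for item_id, v in item_vecs.items()
            if item_id not in already_rated
        )
        # nlargest returns the n highest-scoring items, best first.
        recommendations[user_id] = [item_id for _, item_id in heapq.nlargest(n, scores)]
    return recommendations

# Toy, made-up embeddings and biases (assumptions for illustration only).
user_vecs = {"u1": [1.0, 0.0], "u2": [0.0, 1.0]}
item_vecs = {10: [0.9, 0.1], 20: [0.1, 0.9], 30: [0.5, 0.5], 40: [0.8, 0.0]}
item_bias = {10: 0.0, 20: 0.0, 30: 0.1, 40: 0.0}
seen = {"u1": {10}, "u2": set()}

recs = top_n_for_all_users(user_vecs, item_vecs, item_bias, seen, n=2)
print(recs)
```

With 42,000 users and 4,200 items, the same computation would more likely be done as a batched matrix multiplication on the embedding tensors rather than in pure Python loops.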

Hey everyone; I took some of the learnings in this post (as well as from other sources) and made a notebook for reference (mostly for myself, really :grimacing:). Mainly, I wrote the prediction helper functions as an add-on to what is being discussed here. I hope it helps someone!