Get predictions for test data via collab

jaganlal · January 28, 2020, 3:26pm

Top 10 similar movies to a given movieId

IRailean · January 28, 2020, 4:57pm

Actually, I don't know if there is a straightforward solution in fastai. But you can always compute the Euclidean distance between movie embeddings: the smaller the distance, the more similar the two movies are.
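A minimal sketch of that idea, assuming the movie embeddings are already available as a 2D NumPy array with one row per movie. The array below is random stand-in data and `top_k_similar` is a hypothetical helper name, not part of fastai. In the fastbook chapter 8 notebook the trained item embeddings are read from `learn.model.i_weight.weight`, which could be converted to NumPy and passed in the same way.

```python
import numpy as np

def top_k_similar(embeddings, item_idx, k=10):
    """Return (indices, distances) of the k rows closest to row item_idx.

    Closeness is plain Euclidean distance between embedding vectors.
    """
    diffs = embeddings - embeddings[item_idx]
    dists = np.linalg.norm(diffs, axis=1)
    dists[item_idx] = np.inf          # never return the query movie itself
    order = np.argsort(dists)[:k]     # smallest distance first
    return order, dists[order]

# Stand-in data: 100 movies with 50 latent factors each (an assumption).
rng = np.random.default_rng(0)
movie_embeddings = rng.normal(size=(100, 50))

neighbors, distances = top_k_similar(movie_embeddings, item_idx=3, k=10)
```

`neighbors` holds row indices into the embedding matrix, so they still need to be mapped back to movie IDs or titles with whatever vocabulary the data loaders use.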

jaganlal · January 28, 2020, 6:27pm

Sure, I will check it out and keep you posted if I find anything. Thanks, I really appreciate your responses @IRailean.

dinodelao · October 13, 2020, 3:32pm

Hi there, following up on this.

Are there any updates on how to predict on a test set using fastai version 2 with collab_learners?

Thanks in advance!

obelix · January 24, 2021, 10:06am

Hi @dinodelao,

from my pov, something like this should work:

# Enrich the data set with our own preferences; picked the movies listed by movies.head() for convenience
# (no idea about those movies)

my_prefs = {'user': [9999, 9999, 9999, 9999, 9999],
            'movie': [1, 2, 3, 4, 5],
            'rating': [1, 1, 2, 2, 2],
            'timestamp': [881250949, 881251049, 881251100, 881252049, 881253049],
            'title': ['Toy Story (1995)', 'GoldenEye (1995)', 'Four Rooms (1995)', 'Get Shorty (1995)', 'Copycat (1995)']}

my_df = pd.DataFrame(data=my_prefs)

ratings = pd.concat([ratings,my_df],axis=0)

After the input data set is updated, train the model as described in https://colab.research.google.com/github/fastai/fastbook/blob/master/08_collab.ipynb

Now let's answer the query: how would I rate a specific movie?

query = {'user': [9999, 9999],
         'movie': [50, 258],
         'title': ['Star Wars (1977)', 'Contact (1997)']}

query_df = pd.DataFrame(data=query)

query_dl = dls.test_dl(query_df)

# Get predictions for the test dl
learn.get_preds(dl=query_dl)

Hope it helps!

rradhakr · April 14, 2021, 2:55am

Hello IRailean,

Thanks a lot for your samples. I am a beginner trying to build a recommendation engine by following them, for a use case that involves another kind of item instead of movies. I am trying to get the top 10 recommendations for an already existing user. My use case does not involve cold start (except on rare occasions), and my dataset has at least one rating for every user.

I tried using the learn.predict function, passing the given user ID as test data, as shown below:

Code snippet…

learn.save('recommend-dot-1')
for i in range(len(plans)):
    rows.append(dict({'MEDICAL_PLAN': plans[i], 'USERID': '252749'}))
test_data = pd.DataFrame(rows)

data_collab = CollabDataLoaders.from_df(ratings, test=test_data, seed=42, valid_pct=0.1, user_name='USERID', item_name='PlanCode', rating_name='RATING', bs=64)
test_learn = collab_learner(data_collab, n_factors=50, y_range=(0, 10), wd=1e-1)
test_learn_loaded = test_learn.load('recommend-dot-1')

preds, y = test_learn_loaded.predict(DatasetType.Test)  <----------

I am stuck at the line marked above, because I cannot figure out what the parameter DatasetType.Test is. I searched all over and could not find it. Can you help, please?

Thanks a bunch

joju · September 29, 2022, 6:25am

Hi, I am facing a problem predicting 4 new movies for each user. My dataset has 4,200 movie IDs and 42,000 users, so I need the top 4 movie recommendations for each of the 42,000 users. But model.predict takes a dataframe, which would need 42,000 × 4,200 rows. Is there any other way to get predictions in this situation?
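One possible way around the 42,000 × 4,200-row dataframe, sketched under assumptions rather than taken from the thread: for a dot-product collaborative filtering model, the score of every user-movie pair is a matrix product of the user and movie embedding matrices, so all pairs can be scored in chunks and only the top 4 kept per user. The arrays below are random stand-ins and `top_k_unseen` is a hypothetical helper; in fastai's dot-product model the corresponding weights live in `learn.model.u_weight`, `learn.model.i_weight` and `learn.model.i_bias`. The user bias and the final sigmoid scaling are left out because neither changes the ranking of movies within a single user.

```python
import numpy as np

def top_k_unseen(user_factors, item_factors, item_bias, seen, k=4, chunk=1024):
    """For each user, return indices of the k highest-scoring unseen items.

    seen is a boolean array of shape (n_users, n_items); True marks items
    the user has already rated, which are excluded from the result.
    """
    n_users = user_factors.shape[0]
    top = np.empty((n_users, k), dtype=np.int64)
    for start in range(0, n_users, chunk):
        stop = min(start + chunk, n_users)
        # Scores for one chunk of users against every item at once.
        scores = user_factors[start:stop] @ item_factors.T + item_bias
        scores[seen[start:stop]] = -np.inf
        # Select the k best per row without a full sort, then order those k.
        part = np.argpartition(-scores, k - 1, axis=1)[:, :k]
        part_scores = np.take_along_axis(scores, part, axis=1)
        order = np.argsort(-part_scores, axis=1)
        top[start:stop] = np.take_along_axis(part, order, axis=1)
    return top

# Small stand-in problem: 6 users, 20 items, 5 latent factors (assumptions).
rng = np.random.default_rng(0)
user_factors = rng.normal(size=(6, 5))
item_factors = rng.normal(size=(20, 5))
item_bias = rng.normal(size=20)
seen = np.zeros((6, 20), dtype=bool)
seen[:, :3] = True   # pretend every user has already rated the first 3 items

top = top_k_unseen(user_factors, item_factors, item_bias, seen, k=4, chunk=4)
```

With 42,000 users and 4,200 movies, each chunk of 1,024 users produces a score matrix of about 4.3 million entries, which fits comfortably in memory. The returned indices are embedding rows, so they still have to be mapped back to movie IDs.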

dokkosean · January 12, 2023, 10:55pm

Hey everyone, I took some of the learnings in this post (as well as from other sources) and made a notebook for reference (mostly for myself, really). Mainly, I wrote the prediction helper functions as an add-on to what is being discussed here. I hope it helps someone!