Collab predictions seem to be broken

joedockrill · July 30, 2020, 8:22am

I want to make sure I’m not doing something wrong here.

I noticed some very consistent-looking predictions coming out of my model for different users, so I started digging:

- get_preds() is always returning different results (I can run it for the same users/items 50 times in a row and get 50 different results).
- I found something about NLP models needing ordered=True for get_preds(), but when I tried that it rejected the param, so I assume it thinks the preds are already being returned in order.
- predict() was returning the same results for every user. I can run it for every item in the model for the first 1000 users and get back the same predictions for all of them. Occasionally I get one different set of preds in a “batch” (still calling predict() per item), but I figure that’s a rounding error somewhere further down.
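For anyone wanting to reproduce the first and third observations, here is a minimal sketch of a repeatability check in plain Python. `is_repeatable` and `predict_fn` are hypothetical names, not part of fastai; `predict_fn` stands in for whatever call produces a flat list of predicted ratings. The tolerance is there to tell genuine differences apart from the kind of rounding noise mentioned above.

```python
import random
from typing import Callable, Sequence


def is_repeatable(predict_fn: Callable[[], Sequence[float]],
                  runs: int = 5, tol: float = 1e-6) -> bool:
    """Call predict_fn `runs` times and report whether every run
    matches the first one element-by-element within `tol`."""
    baseline = list(predict_fn())
    for _ in range(runs - 1):
        current = list(predict_fn())
        if len(current) != len(baseline):
            return False
        if any(abs(a - b) > tol for a, b in zip(baseline, current)):
            return False
    return True


# Stand-ins for a model (hypothetical, for illustration only):
stable = lambda: [3.2, 4.1, 2.7]                       # same preds every call
rng = random.Random(0)
noisy = lambda: [rng.uniform(1, 5) for _ in range(3)]  # new preds every call

print(is_repeatable(stable))
print(is_repeatable(noisy))
```

With a real learner, `predict_fn` could presumably be something like `lambda: learn.get_preds(DatasetType.Test)[0].flatten().tolist()`, but that wrapper is an untested assumption on my part.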

For get_preds(), up to this point I’ve been creating my test df and doing:

CollabDataBunch.from_df(df, test=df) 
learn.get_preds(DatasetType.Test)

Then I tried creating my databunch with

CollabDataBunch.from_df(train_df, test=df)

but that made get_preds() blow up with a KeyError while trying to look up user weights for users who were only in the ~~train~~ val set, so I retrained it with valid_pct=0.
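The key error is consistent with the model only having weights for users it saw in the training split. As a rough sketch of how to check that before calling get_preds(), the plain-Python helper below separates test rows whose user appeared in training from those whose user did not. The helper name `split_by_known_users` and the `user` key are hypothetical, and rows are shown as dicts rather than a DataFrame to keep the example self-contained.

```python
from typing import Dict, Hashable, List, Tuple

Row = Dict[str, Hashable]


def split_by_known_users(train_rows: List[Row], test_rows: List[Row],
                         user_key: str = "user") -> Tuple[List[Row], List[Row]]:
    """Return (seen, unseen): test rows whose user occurs in train_rows,
    and test rows whose user does not."""
    known = {row[user_key] for row in train_rows}
    seen = [row for row in test_rows if row[user_key] in known]
    unseen = [row for row in test_rows if row[user_key] not in known]
    return seen, unseen


# Toy data (made up for illustration):
train = [{"user": 1, "item": 10}, {"user": 2, "item": 11}]
test = [{"user": 1, "item": 12}, {"user": 3, "item": 10}]

seen, unseen = split_by_known_users(train, test)
print(len(seen), len(unseen))
```

If `unseen` is non-empty, those are the rows I would expect to trigger the lookup failure, assuming missing user weights really are the cause.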

get_preds() is still behaving the same way, but predict() now behaves: it returns the same results for the same user consistently and different preds for each user, and the preds look pretty reasonable.

This was happening on my own model using Jester as a dataset, but I can reproduce the same behaviour on lesson4 with ml-100k.