Lesson 4: Neural network in collaborative filtering gives same recommendations for everyone

I’m working on the collaborative filtering notebook from Lesson 4.

The neural network model gives the same predictions for everybody

I collected my own and my wife’s Netflix ratings and tested them on the last neural net (nn) model. Our tastes are very different: I’m into gritty thrillers, ‘foreign’ drama and artsy-fartsy movies; she’s more into fantasy, 80s action and pop culture movies.

I first extended the model’s input to handle the new user ids, then trained it from scratch on the training data as in the notebook, and finally trained it on my ratings as one batch and then on my wife’s ratings as a second batch.

However, when I predicted our ratings for all movies in the training set and sorted them, I got exactly the same list for both of us; only the predicted rating values differed. I then went back to the original notebook, and it seems the neural net there also predicts a very similar top 200 list for every training set user. The lists differ by just a few items.
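For anyone who wants to reproduce the comparison, here is a minimal sketch of how the top-N overlap between two users could be measured. It assumes you already have one array of predicted ratings per user, indexed by movie; the function name `top_n_overlap` and the synthetic data are made up for illustration and are not from the notebook.

```python
import numpy as np

def top_n_overlap(preds_a, preds_b, n=200):
    """Fraction of movies shared by the two top-n lists (1.0 = same set)."""
    top_a = set(np.argsort(-preds_a)[:n].tolist())
    top_b = set(np.argsort(-preds_b)[:n].tolist())
    return len(top_a & top_b) / n

# Synthetic stand-in for model output: a shared per-movie signal plus a
# small personal component for each user (assumed values, not real data).
rng = np.random.default_rng(0)
n_movies = 1000
movie_quality = rng.normal(3.5, 0.5, n_movies)
user_a = movie_quality + rng.normal(0, 0.05, n_movies)
user_b = movie_quality + rng.normal(0, 0.05, n_movies)

print("overlap of top 200:", top_n_overlap(user_a, user_b))
```

An overlap close to 1.0 means the two users get almost the same top list, which is what I see with the neural net.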

The manual bias model works

Then I compared this with the predictions from the manually constructed bias model (model). It gives varying results for training set users: usually 40-50% of the top 200 results differ between two users. When I trained it on my and my wife’s ratings, it still gave similar results for both of us, so maybe there is a bug in my retraining code, but the difference between the two models for users in the training data is still pretty stark.

What is going on? Is the NN somehow just optimizing a bias for each movie and not really personalizing the recommendations? But then how does it get such a good validation loss?
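To make the hypothesis concrete, here is a toy sketch (hand-picked numbers, not the notebook's model) of what a model that has only learned biases would produce. If the prediction is just a user bias plus a movie bias, every user gets the same sorted list and only the rating values shift, which matches what I see. Adding a user-movie interaction term is what lets the rankings differ.

```python
import numpy as np

# Hypothetical learned parameters for 3 users and 5 movies.
user_bias = np.array([-0.5, 0.0, 0.5])
movie_bias = np.array([4.5, 3.0, 4.0, 2.5, 3.5])

# Bias-only prediction: rating[u, m] = user_bias[u] + movie_bias[m]
bias_only = user_bias[:, None] + movie_bias[None, :]
rank_bias = np.argsort(-bias_only, axis=1, kind="stable")

# Same biases plus a dot-product interaction between latent factors.
user_factors = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]])
movie_factors = np.array([[-1.0, 1.0], [1.0, -1.0], [0.25, 0.0],
                          [1.25, 0.0], [-0.25, 0.75]])
with_factors = bias_only + user_factors @ movie_factors.T
rank_full = np.argsort(-with_factors, axis=1, kind="stable")

print("bias-only rankings identical:", bool((rank_bias == rank_bias[0]).all()))
print("with-factors rankings identical:", bool((rank_full == rank_full[0]).all()))
```

A bias-only model can still reach a decent loss, because a large part of the variance in ratings is explained by how good a movie is and how generous a user is.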

Could it simply be that the top 200 are the good (read “popular”) movies that everyone likes? You know, I’m pretty sure we both like Terminator 2, Forrest Gump, Inception, etc. So these movies get rated close to five stars by everyone and thus are among the top recommendations for everyone.

An interesting question is whether there are 20, 200, or 2000 such super movies that get recommended to everyone.

It seems plausible for the neural net to predict that both you and your wife would like The Matrix (or some other of those super hit movies), regardless of your different ratings.

IIRC this is a known problem in recommendation systems. How do you get past the obvious highly rated items, to surface some intriguing long tail finds?
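One simple thing to try, sketched below under my own assumptions, is to rank each user's movies by how far the predicted rating sits above the average prediction for that movie across all users, instead of by the raw prediction. The function name `personalized_top_n` and the tiny prediction matrix are hypothetical; this is just one heuristic, not an established fix.

```python
import numpy as np

def personalized_top_n(preds, user, n=3):
    """Rank movies by the user's prediction minus the all-user mean for that movie."""
    lift = preds[user] - preds.mean(axis=0)
    return np.argsort(-lift, kind="stable")[:n]

# Rows are users, columns are movies (made-up predictions).
# Movies 0 and 1 are hits everyone likes; movies 2 and 3 split the users.
preds = np.array([
    [4.75, 4.5, 4.0, 2.0],
    [4.75, 4.5, 2.0, 4.0],
])

for user in range(preds.shape[0]):
    raw_top = np.argsort(-preds[user], kind="stable")[:2]
    print("user", user, "raw top:", raw_top.tolist(),
          "personalized top:", personalized_top_n(preds, user, n=1).tolist())
```

With raw predictions both users get the same hits at the top; after subtracting the per-movie average, the movie each user likes more than the crowd does comes first.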