For collaborative filtering, the fastai neural net library will be better than the fastai collaborative filtering library

Video 5 of DL1 at 10m25s is where Jeremy begins to create a recommender based on just 3 columns: userid, movieid, and rating. He’s going to ignore the date/time column.

This is unfortunate because, as Jeremy himself argued very successfully in the lectures on the Kaggle Rossmann predictions, there is a lot of predictive information in the date/time column and in embeddings of the categorical columns, which we can use to our great benefit.

In modern machine learning, the more predictive variables the better, as Jeremy well argued.

I say we should use much the same approach in this movie recommender as we did in Rossmann. Using just 3 columns (userid, movieid, rating) is not good enough any more in this day and age. It seems therefore that the fastai collab filtering library is wrong-headed and not really worth your time, because it does not have the flexibility needed to add an arbitrary number of additional embeddings or timestamp-based feature generation.

Specifically, in the movie recommender model being developed, we immediately notice many of the same kinds of variables that were present in Rossmann:

  • Neural network embeddings can profitably be created for the categorical columns userid and movieid. We saw that embeddings were very helpful in Rossmann.

  • The timestamp column actually contains date/time data, so the year, month number, day of week, hour of day, etc. can profitably be derived and used as features. The fastai collab filtering lib provides no way to incorporate this set of independent variables, unfortunately.

  • Additional neural network embeddings can profitably be created for movie director, producer, actors, and many other categoricals. Kaggle currently hosts a free dataset that links each movie to its crew, including the director. The fastai collab filtering lib provides no way to incorporate these independent variables, unfortunately.
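
To make the timestamp bullet concrete, here is a minimal sketch of pulling date parts out of a Unix timestamp with only the Python standard library. The helper name date_parts, the choice of UTC, and the two sample rows are my own assumptions for illustration; this is not how fastai does it internally.

```python
from datetime import datetime, timezone


def date_parts(unix_ts):
    """Split a Unix timestamp (seconds) into simple calendar features, in UTC."""
    dt = datetime.fromtimestamp(unix_ts, tz=timezone.utc)
    return {
        "year": dt.year,
        "month": dt.month,
        "day_of_week": dt.weekday(),  # Monday = 0, Sunday = 6
        "hour": dt.hour,
    }


# Illustrative rating rows (assumed layout): (userid, movieid, rating, timestamp)
ratings = [
    (1, 31, 2.5, 1260759144),
    (1, 1029, 3.0, 1260759179),
]

# One dict of date features per rating row, ready to be treated as categoricals
features = [date_parts(ts) for _, _, _, ts in ratings]
```

Each of these parts could then get its own embedding, in the same spirit as the date features in Rossmann.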

The single big problem with immediately using the Rossmann-style NN design for movie recommenders seems to be the sparse format of the input data. We’d like a way to handle sparse data files as simply as Jeremy’s CollabFiltering library does. Then we’re home free to use a great NN model and add lots of predictive variables to our movie recommender dataset, including time series and many categorical variables, as well as ratings.
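
One possible way around the sparse-format worry, sketched here under my own assumptions: the ratings file has one row per (userid, movieid, rating), so each row can be treated as one tabular training example once the ids are mapped to contiguous integer codes for embedding lookup. The helper build_codes, the director_of lookup, and the sample rows are all hypothetical.

```python
def build_codes(values):
    """Map each distinct value to a contiguous integer code, in first-seen order."""
    codes = {}
    for v in values:
        codes.setdefault(v, len(codes))
    return codes


# Illustrative long-format ratings (assumed layout): (userid, movieid, rating)
ratings = [(10, 500, 4.0), (10, 731, 3.5), (42, 500, 5.0)]

# Hypothetical side table joined in from a crew dataset: movieid -> director
director_of = {500: "Director A", 731: "Director B"}

user_codes = build_codes(u for u, _, _ in ratings)
movie_codes = build_codes(m for _, m, _ in ratings)
director_codes = build_codes(director_of[m] for _, m, _ in ratings)

# One tabular row per rating: categorical codes as inputs, rating as the target
rows = [
    {
        "user": user_codes[u],
        "movie": movie_codes[m],
        "director": director_codes[director_of[m]],
        "rating": r,
    }
    for u, m, r in ratings
]
```

Rows in this shape could be fed to a Rossmann-style model with one embedding per categorical column. Whether that actually beats plain collab filtering is something to test, not something this sketch shows.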



It’s surprisingly hard to beat the collab-filtering result; I still can’t understand why. You should try to “Rossmann” it and see if you can best it.
