@Andreas_Daiminger @seb0 @jeremy
There are 3 main types of recommendation systems:
1. Content-based filtering
This is where you would employ search techniques and unsupervised learning to come up with a representation of each item you want to recommend.
For example, if you have the title, description, and genres for each movie in MovieLens, you could use TF-IDF (or the sum of word embeddings) on the title and description, and something like a simple set intersection on the genres, to discover that Batman Begins is close to The Dark Knight.
This allows you to say “You watched your first movie, here are some similar ones.”
This is the simplest approach.
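To make that concrete, here is a minimal sketch in plain Python. The three-movie catalogue, the descriptions, the hand-rolled smoothed TF-IDF, and the 50/50 `text_weight` are all my own assumptions for illustration; in a real project you would more likely reach for a library vectorizer and the actual MovieLens metadata.

```python
import math
from collections import Counter

# Toy catalogue: descriptions and genres are made up for illustration.
movies = {
    "Batman Begins": {
        "text": "bruce wayne becomes batman to fight crime in gotham city",
        "genres": {"action", "crime", "drama"},
    },
    "The Dark Knight": {
        "text": "batman faces the joker who spreads chaos in gotham city",
        "genres": {"action", "crime", "drama", "thriller"},
    },
    "Toy Story": {
        "text": "a cowboy doll feels threatened by a new spaceman toy",
        "genres": {"animation", "comedy", "family"},
    },
}

def tfidf_vectors(docs):
    """Return {name: {term: weight}} using term frequency times a smoothed idf."""
    tokenized = {name: text.split() for name, text in docs.items()}
    n_docs = len(tokenized)
    doc_freq = Counter(t for tokens in tokenized.values() for t in set(tokens))
    vectors = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        vectors[name] = {
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def jaccard(a, b):
    """Set intersection over union, used here for the genre sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

vectors = tfidf_vectors({name: m["text"] for name, m in movies.items()})

def similarity(a, b, text_weight=0.5):
    """Blend text similarity and genre overlap (the weight is arbitrary)."""
    text_sim = cosine(vectors[a], vectors[b])
    genre_sim = jaccard(movies[a]["genres"], movies[b]["genres"])
    return text_weight * text_sim + (1 - text_weight) * genre_sim

def most_similar(name):
    """Return the other movie with the highest blended similarity."""
    others = [m for m in movies if m != name]
    return max(others, key=lambda other: similarity(name, other))

print(most_similar("Batman Begins"))
```

Note that no user data is involved at all: the similarity depends only on item metadata, which is why this works for a user who has watched a single movie.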
2. Collaborative Filtering
This is everything @jeremy teaches so I won’t go over it really.
However, it should be noted that explicit collaborative filtering has generally fallen out of favor, and more recent recommendation systems rely on implicit data.
- Explicit means a rating: the user actually decided to give the movie a star rating, or clicked the like button on a YouTube video, for instance.
- Implicit represents the user’s inherent behavior. For example, the user watched the movie but did not rate it.
You generally have significantly more implicit data, and it turns out that people are actually quite bad at rating things, as “5 stars” means something different to every person.
In practice, that means you’d have a User x Movie matrix where the values would be 1 if the user has watched the movie and 0 otherwise. A 0 does not mean the user disliked the movie; it just means they haven’t interacted with it. You need a different loss function to handle this kind of data, so we generally use something like Bayesian Personalized Ranking (or WARP): https://medium.com/@andresespinosapc/learning-to-rank-bpr-5fe5561d48e0
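Here is a rough sketch of what BPR looks like on implicit data, in plain Python so it runs anywhere. The idea is that for a user, an item they interacted with should score higher than one they did not, so each step takes a (user, seen item, unseen item) triple and does SGD on -log(sigmoid(score_seen - score_unseen)). The toy `interactions`, the uniform negative sampling, and the hyperparameters (`lr`, `reg`, `n_factors`, number of epochs) are all assumptions I picked for illustration.

```python
import math
import random

random.seed(0)

# Implicit feedback: for each user, the set of items they interacted with.
# Anything missing from the set is "unknown", not "disliked".
interactions = {
    0: {0, 1},
    1: {0, 1, 2},
    2: {3, 4},
    3: {3, 4, 5},
}
n_users, n_items, n_factors = 4, 6, 8

def init(n_rows):
    """Small random embeddings, one row per user or item."""
    return [[random.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_rows)]

user_emb, item_emb = init(n_users), init(n_items)

def score(u, i):
    """Dot product of a user embedding and an item embedding."""
    return sum(a * b for a, b in zip(user_emb[u], item_emb[i]))

def bpr_step(u, pos, neg, lr=0.05, reg=0.01):
    """One SGD step on -log(sigmoid(score(u, pos) - score(u, neg)))."""
    x = score(u, pos) - score(u, neg)
    g = 1.0 / (1.0 + math.exp(x))  # sigmoid(-x), the gradient of log(sigmoid(x))
    for f in range(n_factors):
        pu, qi, qj = user_emb[u][f], item_emb[pos][f], item_emb[neg][f]
        user_emb[u][f] += lr * (g * (qi - qj) - reg * pu)
        item_emb[pos][f] += lr * (g * pu - reg * qi)
        item_emb[neg][f] += lr * (-g * pu - reg * qj)
    return math.log1p(math.exp(-x))  # loss before the update

for epoch in range(1000):
    for u, seen in interactions.items():
        pos = random.choice(sorted(seen))
        # Uniformly sample an item the user has NOT interacted with.
        neg = random.choice([i for i in range(n_items) if i not in seen])
        bpr_step(u, pos, neg)

# Rank all items for user 0, highest score first.
print(sorted(range(n_items), key=lambda i: score(0, i), reverse=True))
```

The loss never looks at the 0 entries as targets; they only show up as sampled "probably less preferred" items, which is what makes it a better fit for implicit data than regressing on the 0/1 matrix directly.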
3. Hybrid Recommendation Engines
This is NOT the combination of two separate models (one for content-based filtering and one for collaborative filtering); that would be an ensemble.
Hybrid recommendation engines run the traditional collaborative filtering process, trying to learn User and Item embeddings. However, they incorporate user-specific and item-specific metadata during the training process.
The user embeddings and item embeddings are represented as a sum of the embeddings of all their features. So if the genres action and thriller describe the movie John Wick, then the item embedding for John Wick would be the sum of the embeddings for action, thriller, and the movie id, instead of just the movie id as in traditional collaborative filtering.
In practice, this is similar to how the TabularLearner works. It learns an embedding for each of the categorical features. Each row in the table would be a different movie and each column would be a genre. The genre embeddings are learned through SGD, but instead of predicting a classification value, we sum the feature embeddings for each row. For John Wick: E(movie) = E(action) + E(thriller) + E(movie_id).
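A tiny sketch of just the "sum of feature embeddings" part, again in plain Python. The feature names, the `item_features` mapping, and the random initial values are hypothetical; in a real model these embeddings would be trained with SGD against a ranking loss rather than left at their random initialization.

```python
import random

random.seed(0)
n_factors = 4

# One embedding per feature; the movie id is treated as just another feature.
features = ["genre:action", "genre:thriller", "genre:comedy", "movie_id:john_wick"]
feature_emb = {
    f: [random.gauss(0, 0.1) for _ in range(n_factors)] for f in features
}

# Which features describe each item (hypothetical metadata).
item_features = {
    "John Wick": ["genre:action", "genre:thriller", "movie_id:john_wick"],
}

def item_embedding(item):
    """E(movie) = sum of the embeddings of all features describing the movie."""
    vecs = [feature_emb[f] for f in item_features[item]]
    return [sum(column) for column in zip(*vecs)]

def predict(user_vec, item):
    """Score an item for a user as the dot product with the summed embedding."""
    return sum(u * v for u, v in zip(user_vec, item_embedding(item)))

user_vec = [random.gauss(0, 0.1) for _ in range(n_factors)]
print(item_embedding("John Wick"))
print(predict(user_vec, "John Wick"))
```

Because the gradient of the score flows into every feature embedding in the sum, the genre embeddings are shared across all movies with that genre while the movie id embedding captures what is specific to that one movie.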
Hope this all makes sense. Feel free to ask questions.
@jeremy are you interested in adding any of this into the fastai library?
I’ve really only scratched the surface here. There is a whole class of recommendation systems that treats user interactions as a sequence and uses RNNs to predict the next user action, and it is showing a lot of promise right now.