Here are the questions:
- What problem does collaborative filtering solve?
It solves the problem of predicting the interests of users based on the interests of other users and recommending items based on these interests.
- How does it solve it?
The key idea of collaborative filtering is
latent factors
. The idea is that the model can tell what kind of items you may like (ex: you like sci-fi movies/books) and these kinds of factors are learned (via basic gradient descent) based on what items other users like.
- Why might a collaborative filtering predictive model fail to be a very useful recommendation system?
If there are not many recommendations to learn from, or enough data about the user to provide useful recommendations, then such collaborative filtering systems may not be useful.
- What does a crosstab representation of collaborative filtering data look like?
In the crosstab representation, the users and items are the rows and columns (or vice versa) of a large matrix with the values filled out based on the user’s rating of the item.
- Write the code to create a crosstab representation of the MovieLens data (you might need to do some web searching!)
To do by the reader
- What is a latent factor? Why is it “latent”?
As described above, a latent factor are factors that are important for the prediction of the recommendations, but are not explicitly given to the model and instead learned (hence “latent”).
- What is a dot product? Calculate a dot product manually using pure python with lists.
A dot product is when you multiply the corresponding elements of two vectors and add them up. If we represent the vectors as lists of the same size, here is how we can perform a dot product:
a = [1, 2, 3, 4] b = [5, 6, 7, 8] dot_product = sum(i[0]*i[1] for i in zip(a,b))
- What does
pandas.DataFrame.merge
do?
It allows you to merge DataFrames into one DataFrame.
- What is an embedding matrix?
It is what you multiply an embedding with, and in the case of this collaborative filtering problem, is learned through training.
- What is the relationship between an embedding and a matrix of one-hot encoded vectors?
An embedding is a matrix of one-hot encoded vectors that is computationally more efficient.
- Why do we need
Embedding
if we could use one-hot encoded vectors for the same thing?
Embedding
is computationally more efficient. The multiplication with one-hot encoded vectors is equivalent to indexing into the embedding matrix, and theEmbedding
layer does this. However, the gradient is calculated such that it is equivalent to the multiplication with the one-hot encoded vectors.
- What does an embedding contain before we start training (assuming we’re not using a prertained model)?
The embedding is randomly initialized.
- Create a class (without peeking, if possible!) and use it.
To do by the reader. Example in the chapter:
class Example: def __init__(self, a): self.a = a def say(self,x): return f'Hello {self.a}, {x}.'
- What does
x[:,0]
return?
The user ids
- Rewrite the
DotProduct
class (without peeking, if possible!) and train a model with it
Code provided in chapter:
class DotProduct(Module): def __init__(self, n_users, n_movies, n_factors, y_range=(0,5.5)): self.user_factors = Embedding(n_users, n_factors) self.movie_factors = Embedding(n_movies, n_factors) self.y_range = y_range def forward(self, x): users = self.user_factors(x[:,0]) movies = self.movie_factors(x[:,1]) return sigmoid_range((users * movies).sum(dim=1), *self.y_range)
- What is a good loss function to use for MovieLens? Why?
We can use Mean Squared Error (MSE), which is a perfectly reasonable loss as we have numerical targets for the ratings and it is one possible way of representing the accuracy of the model.
- What would happen if we used
CrossEntropy
loss with MovieLens? How would we need to change the model?
We would need to ensure the model outputs 5 predictions. For example, with a neural network model, we need to change the last linear layer to output 5, not 1, predictions. Then this is passed into the Cross Entropy loss.
- What is the use of bias in a dot product model?
A bias will compensate for the fact that some movies are just amazing or pretty bad. It will also compensate for users who often have more positive or negative recommendations in general.
- What is another name for weight decay?
L2 regularization
- Write the equation for weight decay (without peeking!)
loss_with_wd = loss + wd * (parameters**2).sum()
- Write the equation for the gradient of weight decay. Why does it help reduce weights?
We add to the gradients
2*wd*parameters
. This helps create more shallow, less bumpy/sharp surfaces that generalize better and prevents overfitting.
- Why does reducing weights lead to better generalization?
This will result is more shallow, less sharp surfaces. If sharp surfaces are allowed, it can very easly overfit, and now this is prevented.
- What does
argsort
do in PyTorch?
This just gets the indices in the order that the original PyTorch Tensor is sorted.
- Does sorting the movie biases give the same result as averaging overall movie ratings by movie? Why / why not?
No it means much more than that. It takes into account the genres or actors or other factors. For example, movies with low bias means even if you like these types of movies you may not like this movie (and vice versa for movies with high bias).
- How do you print the names and details of the layers in a model?
Just by typing
learn.model
- What is the “bootstrapping problem” in collaborative filtering?
That the model / system cannot make any recommendations or draw any inferences for users or items about which it has not yet gathered sufficient information. It’s also called the cold start problem.
- How could you deal with the bootstrapping problem for new users? For new movies?
You could solve this by coming up with an average embedding for a user or movie. Or select a particular user/movie to represent the average user/movie. Additionally, you could come up with some questions that could help initialize the embedding vectors for new users and movies.
- How can feedback loops impact collaborative filtering systems?
The recommendations may suffer from representation bias where a small number of people influence the system heavily. E.g.: Highly enthusiastic anime fans who rate movies much more frequently than others may cause the system to recommend anime more often than expected (incl. to non-anime fans).
- When using a neural network in collaborative filtering, why can we have different number of factors for movie and user?
In this case, we are not taking the dot product but instead concatenating the embedding matrices, so the number of factors can be different.
- Why is there a
nn.Sequential
in theCollabNN
model?
This allows us to couple multiple nn.Module layers together to be used. In this case, the two linear layers are coupled together and the embeddings can be directly passed into the linear layers.
- What kind of model should be use if we want to add metadata about users and items, or information such as date and time, to a collaborative filter model?
We should use a tabular model, which is discussed in the next chapter!