MovieLens with CrossEntropyLoss

In the lesson 8 (collaborative filtering) "further research" section we were asked this question:

Create a model for MovieLens which works with CrossEntropy loss, and compare it to the model in this chapter.

As expected, this does not work just by changing the loss function. Is the idea here to predict an integer between 0 and 5? What about the .5 ratings? Does this make any sense? Can I get any tips on how to do that? I imagine I have to change the DataLoaders so that the y parameter is a tensor of 5 probabilities, and also change the forward function inside the DotProduct module.

Hey,

I believe the high-level idea here is to treat this problem as a classification problem, so ignoring treating the rating as non-ordinal and non-continuous. Then this problem becomes multi-class classification, and the results of your model would go into a softmax layer with X outputs where X is the number of classes. You can then calculate accuracy and see how well the model performs in this classification problem.

You should try to implement this yourself, as I believe it's a nice opportunity to fiddle with things manually and see what happens. Let me know if you need any help :slight_smile:

Hi orendar, thanks for your response!
So what classes would you use here? I didn't quite understand what you meant here:

Also, the only model creation I've seen are the MNIST one (lesson 4) and the one in lesson 8, so I have some doubts… To change the layer output I have to edit the forward return value, right? So in this case it should be a tensor of size [#classes, 1]? Another question I have is whether I need to apply the softmax myself or not, since the CELoss already applies it.

Am I right?

Hey,

Sorry, I meant "ignoring the order and treating the rating as non-ordinal and non-continuous." So instead of predicting a continuous number between 1 and 5, you are predicting a class (for example here we could have 9 classes: 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5). You would need to treat the ratings as class labels, and therefore the task would now be multiclass classification.
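For the label side, a minimal sketch could look like this (assuming `ratings` is the DataFrame from the chapter with a `rating` column):

```python
# Map each possible rating (1.0, 1.5, ..., 5.0) to a class index 0..8.
classes = sorted(ratings['rating'].unique())
rating2idx = {r: i for i, r in enumerate(classes)}
ratings['rating_idx'] = ratings['rating'].map(rating2idx)
# CrossEntropyLoss expects these integer indices as targets, not raw ratings.
```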

Regarding the loss - it depends on the loss function you use, as Jeremy explains in the lesson. If you use a loss function which already applies the softmax, then you just need 9 outputs from a fully-connected layer; otherwise you also need to add a softmax layer yourself.
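In PyTorch terms the two options look roughly like this (just a sketch; `nn.CrossEntropyLoss` combines a log-softmax and `nn.NLLLoss` internally):

```python
import torch.nn as nn

# Option 1: the loss applies (log-)softmax itself -> the model outputs raw logits
loss_func = nn.CrossEntropyLoss()

# Option 2: the model ends in a log-softmax layer -> pair it with NLLLoss
head = nn.Sequential(nn.Linear(100, 9), nn.LogSoftmax(dim=1))
loss_func = nn.NLLLoss()
```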

Hello again,
so I've been working on it and I think I've got something, but I can't find a way to make my dataloaders' y.shape equal to [64, 9]. The shapes from my dls are:

```python
dls = CollabDataLoaders.from_df(ratings, item_name='title', bs=64)
x, y = dls.one_batch()
x.shape, y.shape
# (torch.Size([64, 2]), torch.Size([64, 1]))
```

What I did manage to do was to make my forward function return a tensor of size [64, 9], meaning batch size and number of classes.
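As it turns out, cross-entropy in PyTorch doesn't actually need y one-hot encoded as [64, 9]: it expects integer class indices of shape [64], and fastai's CrossEntropyLossFlat also accepts a [64, 1] target and flattens it. A quick sanity check:

```python
import torch
import torch.nn.functional as F

logits  = torch.randn(64, 9)             # model output: one logit per class
targets = torch.randint(0, 9, (64,))     # integer class indices, not one-hot
loss = F.cross_entropy(logits, targets)  # works without any [64, 9] target
```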

When I run the training I get the following error:

```python
model = DotProductBias(n_users, n_movies, 50)
learn = Learner(dls, model, loss_func=CrossEntropyLossFlat)
learn.fit_one_cycle(5, 5e-3)
```

```
RuntimeError: Boolean value of Tensor with more than one value is ambiguous
```

I will share my notebook in case it helps:

PS: I don't really know if what I have done is right or useful, but it should work, I think.

@veci did you try using neural nets instead of the DotProductBias? I tried cross-entropy loss with NNs, but the results weren't good: only 40-50% accuracy.

I treated it as a classification problem with 5 classes and applied softmax so as to get the one class with the highest probability.

This is the architecture I used (it's quite simple):

```python
self.layers = nn.Sequential(
    nn.Linear(user_sz[1] + item_sz[1], n_act),
    nn.ReLU(),
    nn.Linear(n_act, 5),
    nn.Softmax(dim=1))  # dim=1: softmax over the 5 class outputs
```

The accuracy I am getting is around 43-45%; the LR used is 5e-3.
I chose only 5 classes and didn't include decimals like 1.5 or 2.5, because there were only integer values in the training set.
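One thing worth double-checking: nn.CrossEntropyLoss already applies a log-softmax internally, so with an nn.Softmax layer at the end the probabilities get squashed twice, which can hurt training. A sketch of a head that returns raw logits instead (make_head is a made-up helper, same sizes as above):

```python
import torch.nn as nn

def make_head(user_emb_dim, item_emb_dim, n_act, n_classes=5):
    # Return raw logits and let CrossEntropyLoss apply the log-softmax itself.
    return nn.Sequential(
        nn.Linear(user_emb_dim + item_emb_dim, n_act),
        nn.ReLU(),
        nn.Linear(n_act, n_classes))
```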


Hi, how did you do the comparison part, i.e. where we compare the CELoss model with the MSELoss one? One is a regression model and the other a classification model, so I'm wondering how to compare the two. :thinking:

That is a good question! It's up to you to think up a creative answer - for example, treating the classifier predictions as continuous and calculating regression metrics over them, or alternatively binning the regressor predictions and calculating classification metrics over them.
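A rough sketch of both directions (assuming 9 half-star classes and true ratings given as floats; the function names are made up for illustration):

```python
import torch
import torch.nn.functional as F

rating_values = torch.tensor([1., 1.5, 2., 2.5, 3., 3.5, 4., 4.5, 5.])

def classifier_rmse(logits, true_ratings):
    # Treat the classifier as a regressor: probability-weighted expected rating.
    probs = F.softmax(logits, dim=1)
    preds = probs @ rating_values
    return torch.sqrt(F.mse_loss(preds, true_ratings))

def regressor_accuracy(preds, true_ratings):
    # Treat the regressor as a classifier: snap each prediction to the
    # nearest valid rating and count exact matches.
    binned = rating_values[(preds[:, None] - rating_values).abs().argmin(dim=1)]
    return (binned == true_ratings).float().mean()
```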

I had that error many times before. I believe you just have to change CrossEntropyLossFlat to CrossEntropyLossFlat() when declaring the loss function in the Learner. Hopefully this helps.
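In other words, the Learner wants a loss instance rather than the class itself (same code as above, with the parentheses added):

```python
learn = Learner(dls, model, loss_func=CrossEntropyLossFlat())  # note the ()
learn.fit_one_cycle(5, 5e-3)
```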

Hi everyone. I'm trying to create a model for MovieLens that works with cross-entropy loss, but I'm getting a "grad can be implicitly created only for scalar outputs" error.
Could you help - what am I doing wrong?

I have 5 categories (one for each rating): 1, 2, 3, 4, 5.
I'm using nn.CrossEntropyLoss(reduction='none') with this model:

```python
class CollabClassification(Module):
    def __init__(self, users_sz, movies_sz, n_factors=100):
        self.user_factors = Embedding(*users_sz)
        self.movie_factors = Embedding(*movies_sz)
        self.layers = nn.Sequential(
            nn.Linear(users_sz[1] + movies_sz[1], n_factors),
            nn.ReLU(),
            nn.Linear(n_factors, 5))  # one logit per rating class

    def forward(self, x):
        users = self.user_factors(x[:, 0])
        movies = self.movie_factors(x[:, 1])
        return self.layers(torch.cat((users, movies), dim=1))
```

Here is a link to the full .ipynb file on Google Colab:

Hey, I think you have to remove reduction='none' on the CrossEntropy loss.
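The reason is that reduction='none' returns one loss value per sample, while backward() needs a single scalar; the default reduction averages over the batch:

```python
import torch.nn as nn

loss_func = nn.CrossEntropyLoss()  # default reduction='mean' -> scalar loss
```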

Also, it is worth noting that setting the embedding size to len(ratings.user.unique()) works only if you don't have "holes" in the user ids (same for movies). In this case it works, but it is much safer to use TabularCollab to get the dataloaders.
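For sizing the embeddings, the chapter's approach of reading the vocab sizes off the DataLoaders avoids the problem entirely:

```python
n_users  = len(dls.classes['user'])
n_movies = len(dls.classes['title'])
```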

Thanks, removing reduction='none' helped. But the valid_loss turned out to be much higher than the DotProductBias model from the chapter: 1.238229. I hope I can improve it somehow.

And thanks for the TabularCollab advice, I'll try to use it :+1:

Yes, and it's even better to use CrossEntropyLossFlat, which works whether the shape of y is [64] or [64, 1], so you don't need to squeeze the latter.
Results are different because the model in the chapter is a regression and now you have a classification. Maybe it would be better if you had an "ordered" (ordinal) classification, but I'm not sure how exactly to do that with a neural network. I mean that with a plain cross-entropy, if the actual class is '5', having 0.8 confidence that it is a '1' gives the same loss as having 0.8 confidence that it is a '4', even though a '4' prediction would be much better.
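One rough way to make the loss order-aware (just a sketch of the idea, not a standard recipe): keep the cross-entropy term but add a penalty on the distance between the probability-weighted expected rating and the true rating.

```python
import torch
import torch.nn.functional as F

RATING_VALUES = torch.tensor([1., 2., 3., 4., 5.])  # assuming 5 integer classes

def ordinal_aware_loss(logits, targets):
    # Standard cross-entropy, plus an MSE term that punishes being far
    # from the true rating more than being one class off.
    ce = F.cross_entropy(logits, targets)
    probs = F.softmax(logits, dim=1)
    expected = probs @ RATING_VALUES.to(logits.device)   # expected rating per sample
    true_vals = RATING_VALUES.to(logits.device)[targets] # numeric value of true class
    return ce + F.mse_loss(expected, true_vals)
```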
Results are different because the model of chapter is a regression, now you have a classification. Maybe it would be better if you had an ā€œorderedā€ classification but not sure how to exactly do that with a neural network. I mean that with a simple cross entropy if actual class is ā€˜5ā€™ , having 0.8 confidence that is a ā€˜1ā€™ has the same loss having 0.8 confidence that is a ā€˜4ā€™, even if a ā€˜4ā€™ prediction would be much better.