In lesson 8 (collaborative filtering) further research we were asked this question:
Create a model for MovieLens which works with CrossEntropy loss, and compare it to the model in this chapter.
As expected, this does not work just by changing the loss function. Is the idea here to predict an integer between 0 and 5? What about the .5 ratings? Does this make any sense? Can I get any tips on how to do that? I imagine I have to change the DataLoaders so that the y parameter is a tensor of 5 probabilities, and also change the forward function inside the DotProduct module.
I believe the high-level idea here is to treat this problem as a classification problem, so ignoring treating the rating as non-ordinal and non-continuous. Then this problem becomes multi-class classification, and the results of your model would go into a softmax layer with X outputs where X is the number of classes. You can then calculate accuracy and see how well the model performs in this classification problem.
You should try to implement this yourself, as I believe it's a nice opportunity to fiddle with things manually and see what happens. Let me know if you need any help.
Hi orendar, thanks for your response!
So what classes would you use here? I didn't quite understand what you meant here:
Also, the only model creation I've seen is the MNIST one (Lesson 4) and the one in lesson 8, so I have some doubts… To change the layer output I have to edit the forward return value, right? So in this case it should be a tensor of size [#classes, 1]? Another question I have is whether I need to apply the softmax myself or not, since the CELoss already applies it.
Sorry, I meant "ignoring the order and treating the rating as non-ordinal and non-continuous." So instead of predicting a continuous number between 1 and 5, you are predicting a class (for example here we could have 9 classes: 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5). You would need to treat the ratings as class labels, and therefore the task would now be multiclass classification.
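A minimal sketch of the label mapping described above, assuming ratings run from 1 to 5 in half-star steps. The names `RATINGS`, `rating_to_class` and `class_to_rating` are hypothetical helpers, not something from the lesson:

```python
# Hypothetical helpers: treat each distinct rating as an unordered class label.
RATINGS = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]  # 9 classes

def rating_to_class(rating):
    """Return the 0-based class index of a rating listed in RATINGS."""
    return RATINGS.index(float(rating))

def class_to_rating(idx):
    """Map a predicted class index back to the rating it stands for."""
    return RATINGS[idx]

# Integer class indices like these are what a cross-entropy loss expects as targets.
labels = [rating_to_class(r) for r in [1, 3.5, 5]]
```

If the training set only contains whole-number ratings, the same idea works with a 5-element list.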
Regarding the loss - it depends on the loss function you use, as Jeremy explains in the lesson. If you use a loss function which already applies the softmax, then you just need one output per class from a fully-connected layer (9 for the classes listed above), otherwise you also need to add a softmax layer yourself.
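As a rough illustration of that point, PyTorch's `nn.CrossEntropyLoss` applies log-softmax internally, so the model can return raw scores of shape [batch, n_classes], while the targets are plain integer class indices of shape [batch] rather than a [batch, n_classes] tensor. The sketch below uses random tensors in place of a real model and assumes 9 classes:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
n_classes = 9                                   # assumption: 9 half-star classes
logits = torch.randn(64, n_classes)             # stand-in for the model output (no softmax applied)
targets = torch.randint(0, n_classes, (64,))    # integer class indices, shape [64]

# CrossEntropyLoss = log_softmax followed by negative log-likelihood.
loss = nn.CrossEntropyLoss()(logits, targets)
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(loss.shape, torch.allclose(loss, manual))
```

Applying a softmax in `forward` and then passing the result to `nn.CrossEntropyLoss` would apply it twice, so it is usually left out of the model in this setup.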
Hello again,
so I've been working on it and I think I've got something, but I can't find a way to make my dataloaders' y.shape be [64, 9]. The shape of my dls is:
@veci did you try using neural nets instead of the DotProductBias? I tried CrossEntropyLoss with NNs, however the results weren't good: only 40-50% accuracy.
The accuracy I am getting is around 43-45%; the LR I used is 5e-3.
I chose only 5 classes and didn't include decimals like 1.5, 2.5, because there were only integer values in the training set.
Hi, how did you do the comparison part, i.e. where we compare the CELoss model with the MSELoss one? One is a regression model and the other is a classification model, so I'm wondering how to compare the two.
That is a good question! It's up to you to think up a creative answer - for example, treating the classifier predictions as continuous and calculating regression metrics over them, or alternatively binning the regressor predictions and calculating classification metrics over them.
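Both directions could be sketched roughly as follows, assuming 5 integer rating classes. The numbers are made up for illustration, and using the probability-weighted average as the classifier's "continuous" prediction is just one possible choice:

```python
import math

RATINGS = [1, 2, 3, 4, 5]  # assumption: whole-number ratings only

def expected_rating(probs):
    """Turn class probabilities into a continuous prediction (probability-weighted mean)."""
    return sum(p * r for p, r in zip(probs, RATINGS))

def to_class(pred):
    """Bin a regression output: round half up, then clamp into the 1..5 range."""
    nearest = math.floor(pred + 0.5)  # round() would round halves to the nearest even number
    return int(min(max(nearest, RATINGS[0]), RATINGS[-1]))

def rmse(preds, targets):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets))

def accuracy(preds, targets):
    return sum(p == t for p, t in zip(preds, targets)) / len(targets)

# Made-up example data, not real model outputs.
targets = [5, 3, 1]
clf_probs = [[0, 0, 0, 0.5, 0.5], [0, 0.25, 0.5, 0.25, 0], [1, 0, 0, 0, 0]]
reg_preds = [4.6, 2.4, 0.3]

clf_as_regression = [expected_rating(p) for p in clf_probs]  # classifier -> continuous
reg_as_classes = [to_class(p) for p in reg_preds]            # regressor -> classes

clf_rmse = rmse(clf_as_regression, targets)      # regression metric for the classifier
reg_accuracy = accuracy(reg_as_classes, targets) # classification metric for the regressor
```

Either way, both models end up scored with the same metric on the same validation set, which is what makes the comparison meaningful.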
I had that error many times before. I believe you just have to change CrossEntropyLossFlat to CrossEntropyLossFlat() when declaring the loss function in the learner. Hopefully this helps.
Hi everyone. I'm trying to create a model for MovieLens that works with Cross-Entropy loss, but I'm getting a "grad can be implicitly created only for scalar outputs" error.
Could you help? What am I doing wrong?
I have 5 categories (for each rating): 1,2,3,4,5.
I'm using nn.CrossEntropyLoss(reduction='none') with this model:
Also, it is worth noting that setting the embedding size to len(ratings.user.unique()) works only if you don't have "holes" in the user ids (same for movies). It does work in this case, but it is much safer to use TabularCollab to get the dataloaders.
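To see why holes in the ids matter, here is a small sketch with made-up user ids. The explicit `id_to_index` dictionary is a hypothetical manual workaround shown only for illustration; it is not a description of how the fastai helpers handle this internally:

```python
# Made-up raw user ids with gaps: 3, 4, 6, 7 and 8 never occur.
user_ids = [1, 2, 5, 9, 5, 1]

unique_ids = sorted(set(user_ids))
n_users = len(unique_ids)  # 4 rows if the embedding is sized by the unique count

# Using a raw id as a row index fails as soon as an id is >= n_users.
out_of_range = [u for u in unique_ids if u >= n_users]

# Remapping raw ids to contiguous indices 0..n_users-1 makes the lookup safe.
id_to_index = {uid: i for i, uid in enumerate(unique_ids)}
indices = [id_to_index[u] for u in user_ids]

print(n_users, out_of_range, indices)
```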
Thanks, removing reduction='none' helped. But valid_loss turned out to be too high compared to the DotProductBias model from the chapter: 1.238229. I hope I can improve it somehow.
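For anyone hitting the same message: with reduction='none' the loss comes back as one value per item instead of a single number, and calling backward() on a non-scalar tensor is what raises this error. A small sketch with random tensors standing in for the model output, assuming 5 classes:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(64, 5, requires_grad=True)  # stand-in for model output
targets = torch.randint(0, 5, (64,))             # integer class indices

# reduction='none' keeps one loss value per item: shape [64], not a scalar.
per_item = nn.CrossEntropyLoss(reduction='none')(logits, targets)
try:
    per_item.backward()
    error = None
except RuntimeError as e:
    error = str(e)  # "grad can be implicitly created only for scalar outputs"

# The default reduction ('mean') returns a scalar, so backward() works.
scalar = nn.CrossEntropyLoss()(logits, targets)
scalar.backward()

print(per_item.shape, scalar.shape, error)
```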
And thanks for the TabularCollab advice, I'll try to use it.
Yes, it is even better to use CrossEntropyLossFlat, which works whether the shape of y is [64] or [64,1], so you don't need to squeeze the latter.
Results are different because the model in the chapter is a regression, whereas now you have a classification. Maybe it would be better if you had an "ordered" classification, but I'm not sure exactly how to do that with a neural network. I mean that with a simple cross entropy, if the actual class is "5", having 0.8 confidence that it is a "1" gives the same loss as having 0.8 confidence that it is a "4", even though a "4" prediction would be much better.
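A tiny numeric check of that last point, using two made-up probability vectors over the classes 1 to 5 with the true rating being 5. The expected absolute distance computed at the end is only an illustrative order-aware measure, not a loss proposed in this thread:

```python
import math

RATINGS = [1, 2, 3, 4, 5]
true_rating = 5

# Both predictions give the true class "5" a probability of 0.05.
mostly_one = [0.80, 0.05, 0.05, 0.05, 0.05]   # 0.8 confidence in "1"
mostly_four = [0.05, 0.05, 0.05, 0.80, 0.05]  # 0.8 confidence in "4"

def cross_entropy(probs, rating):
    """Cross entropy only looks at the probability given to the true class."""
    return -math.log(probs[RATINGS.index(rating)])

def expected_abs_distance(probs, rating):
    """Illustrative order-aware measure: average distance from the true rating."""
    return sum(p * abs(r - rating) for p, r in zip(probs, RATINGS))

ce_one = cross_entropy(mostly_one, true_rating)
ce_four = cross_entropy(mostly_four, true_rating)
dist_one = expected_abs_distance(mostly_one, true_rating)
dist_four = expected_abs_distance(mostly_four, true_rating)

print(ce_one, ce_four, dist_one, dist_four)
```

The two cross-entropy values come out identical, while the distance-based measure penalises the "1" prediction far more than the "4" prediction.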