Lesson 6 - Official topic

To put it in simpler terms: in a tabular dataset, you have Xs and ys. In collaborative filtering, you have something like a table for each candidate, with holes in the table to be filled in. It doesn’t necessarily need to have categorical variables.

1 Like

Never found a proper way. If I can’t restart from scratch and just use more epochs in fit_one_cycle, I usually keep going with fit and a small learning rate, instead of fit_one_cycle. But I’d love more guidance here.

2 Likes

Yes, I also tend to adopt a similar approach, except that I also reduce pct_start to, say, 0.1 so as not to overshoot the previously learned weights again.
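To see why a smaller pct_start helps when resuming, here is an illustrative sketch of a one-cycle-style schedule (simplified cosine warmup and anneal — not fastai’s exact implementation; the function name and defaults here are made up for illustration). With pct_start=0.1 the learning rate peaks after only 10% of the steps, so most of the run is spent annealing at lower rates rather than re-disturbing what was already learned.

```python
import math

def one_cycle_lr(step, total_steps, lr_max, pct_start=0.25, div=25.0):
    """Simplified one-cycle schedule: cosine warmup from lr_max/div to
    lr_max over the first pct_start of training, then cosine anneal to 0.
    (Illustrative only -- not fastai's actual implementation.)"""
    warm = pct_start * total_steps
    if step < warm:
        frac = step / warm                       # 0 -> 1 during warmup
        return lr_max / div + (lr_max - lr_max / div) * (1 - math.cos(math.pi * frac)) / 2
    frac = (step - warm) / (total_steps - warm)  # 0 -> 1 during anneal
    return lr_max * (1 + math.cos(math.pi * frac)) / 2

# With pct_start=0.1 the peak arrives after only 10% of the steps.
sched = [one_cycle_lr(s, 100, 1e-3, pct_start=0.1) for s in range(101)]
peak = max(range(101), key=lambda s: sched[s])
```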

Question: what if there are some “super popular” movies that everyone watches? Does that affect how we train a collaborative filtering model?

1 Like

Jeremy will answer that question a bit later :slight_smile:

1 Like

Have you looked at Resume training with fit_one_cycle? It’s for fastai v1, but I guess the API is the same.

5 Likes

Just to answer my own question and for reference for others who might not have gotten to it by reviewing today’s course yet: Jeremy today said that he will touch on NLP in one of the following lessons. Yay! :slight_smile:

5 Likes

By adding a channel to an image, I am referring to something like what Jeremy mentioned as encoding time in the image frame. Basically, handling cases in which you have an image and some other vector of numbers that you want to feed into the model.

1 Like

Thanks for the pointer, but it’s not the same problem. Here we are talking about having successfully completed an entire fit_one_cycle run, but realizing that it was not long enough (say, train loss and validation loss were still decreasing and accuracy was still going up). So now what do we do? Of course we can restart from scratch with a larger number of epochs, but for big models this is very time consuming. So, is there a way to “keep going” without having to restart?

What I’m doing is to “keep going” with fit and a small lr, instead of fit_one_cycle. Another possibility would be to run another fit_one_cycle with a very low lr.

3 Likes

In the original dataset, there are a number of blank ratings. Does the loss function assume these are ‘zero’? If so, why does that not matter?

2 Likes

No, those blank ratings are the things we have to predict.

1 Like

Did he go mute?
(Only briefly, after discussing the number of factors.)
Very little content was actually muted.

How different in practice is collaborative filtering with sparse data, compared to (comparatively) dense data?

3 Likes

@sgugger I will rewrite my question because I felt it was stupid the way I asked it, but it’s not relevant to the class anymore; I might open a topic later (after trying it on the bear classifier). Feel free to add something about it if you want. I am interested in a more general matter, not just the bear classifier. My main question is: by using multi-label classification, can we finally have an AI that can acknowledge when it does not know something? This is very relevant… We are used to having a limited set of classes, but when the input does not match any of those classes, can the AI really acknowledge that? What is best practice: to use multi-label classification, or to add a “neutral” class that gets predicted when the input does not match any of the other classes?

Sounds fine to me

In practice, do we tune the number of latent factors?

2 Likes

In that case I usually do a normal fit with a smaller lr for some epochs, with SaveModelCallback to save the best one.

Those blank ratings are predicted.

The product of the two embedding matrices (i.e. the user factors to the left of the crosstab matrix and the movie factors above it – Jeremy showed these in orange and light blue in the Excel spreadsheet) has the same dimension as the crosstab matrix, and all the blank positions are filled in!

So we take this matrix product as an estimate of the crosstab matrix, then improve this estimate by iteratively adjusting the entries in the embedding matrices so as to minimize the error with the crosstab matrix.

The blank entries in the product matrix are estimated but they are not counted when we compute the error.

So at the end of the iteration process, voilà! We’ve estimated the entries in the blank positions of the crosstab matrix!
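The process described above can be sketched in a few lines of NumPy: factor a toy crosstab into user and movie embedding matrices by gradient descent, masking the blank cells out of the loss so only observed ratings drive the updates. (The toy ratings, sizes, learning rate, and iteration count here are all made up for illustration; fastai’s actual collab model also adds biases and a sigmoid range.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy crosstab: 4 users x 5 movies, 0 marks a blank (unrated) cell.
ratings = np.array([[5, 3, 0, 1, 0],
                    [4, 0, 0, 1, 1],
                    [0, 1, 5, 4, 0],
                    [1, 0, 4, 0, 5]], dtype=float)
mask = ratings > 0                      # only rated cells count in the loss

n_factors = 3
U = rng.normal(scale=0.1, size=(4, n_factors))   # user factors
M = rng.normal(scale=0.1, size=(5, n_factors))   # movie factors

lr = 0.01
for _ in range(10_000):
    pred = U @ M.T                      # estimate of the full crosstab
    err = np.where(mask, pred - ratings, 0.0)    # blanks excluded from error
    # Gradients of 0.5 * sum(err**2) w.r.t. U and M
    U, M = U - lr * err @ M, M - lr * err.T @ U

# Rated cells are now reconstructed closely; blank cells hold predictions.
pred = U @ M.T
```

The key line is the `np.where(mask, ...)`: the blanks are filled in by the matrix product, but they contribute nothing to the error being minimized, exactly as described above.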

1 Like

This is possible and covered in this class as well. Try this: train a bear classifier with BCEWithLogitsLoss and pass a non-bear image to the model to predict. See what happens.
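The intuition, sketched in NumPy with made-up logits (not a real trained model): with BCEWithLogitsLoss each class gets an independent sigmoid probability, so you threshold each class separately instead of always picking the argmax. A row where nothing clears the threshold is the model effectively saying “none of these classes.”

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Hypothetical logits from a 3-class bear model (grizzly, black, teddy)
# for two inputs: a clear grizzly photo and a non-bear photo.
logits = np.array([[ 4.0, -3.0, -2.0],    # grizzly: one confident class
                   [-2.5, -1.8, -3.1]])   # non-bear: nothing confident
probs = sigmoid(logits)                   # independent per-class probabilities

threshold = 0.5
preds = probs > threshold                 # multi-label: any number of classes

# A row with no class above threshold is the model saying "none of these".
for row in preds:
    labels = np.flatnonzero(row)
    print(labels if labels.size else "no known class")
```

A plain softmax classifier can’t do this, because its probabilities are forced to sum to 1 across the known classes.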

Have researchers ever tried to match latent factors obtained from collaborative filtering with values obtained with content-based learning (i.e. for films from the video data)?