Lesson 4 In-Class Discussion ✅

vernboy · November 14, 2018, 4:00am

If we’re going to go back through Tabular a bit, I’d be curious to know whether some of the traditional gotchas in machine learning, like skewed data distributions or highly correlated fields (covariance issues), should be thought through before using the neural net. Or is a neural net more robust against such data issues?

Shubhajit · November 14, 2018, 4:01am

Happy Birthday Jeremy

PierreO · November 14, 2018, 4:03am

Right. @rohitr seemed to suggest that collab filtering was a tabular special case for sparse data, I guess we’ll know for sure when Jeremy answers after the break.

PegasusWithoutWinds · November 14, 2018, 4:06am

Well, I suppose collaborative filtering is suitable for any sparse dataset, not just sales. Sales is a common example because it is one of the most common use cases. But the more general metaphor here is probably recommendation.

Let me think of another example.

How about recommending workout moves based on the moves you have tried and rated?

There might be other properties of the datasets special enough to warrant a different kind of learner or learning method. Let me know if you find any.

rachel · November 14, 2018, 4:07am

Jeremy will answer this now.

deepanshu2017 · November 14, 2018, 4:08am

@rachel This question has around 17 likes

Jess · November 14, 2018, 4:08am

Agree re: Google Sheets!

niranjanmudhiraj · November 14, 2018, 4:09am

Which similarity measure works better with a collaborative filtering algorithm?

paul · November 14, 2018, 4:09am

In brief: one big conceptual difference is that in collab filtering you only have two types of entities (say users and movies) and predict a third (say rating), whereas in tabular data more generally you could have an arbitrary number of different entities (often tens, hundreds, even thousands) and you predict something else. So collab filtering data is a sparse matrix in which each value has a similar meaning but for a different pair of entities (user/movie); you can use the partial information to turn the sparse matrix into a dense one. Tabular data is often dense already, and the entries in each column have different types of meaning.
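To make the "sparse matrix becomes a dense matrix" idea concrete, here is a minimal sketch in plain Python. It is not the lesson's code: the ratings, the function names (`dot`, `mse`, `factorize`), and the hyperparameters are all made up for illustration. It fits a small factor vector per user and per movie using only the observed ratings, then multiplies the factors to get a prediction for every cell, including the blank ones.

```python
import random

def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def mse(ratings, users, items):
    """Mean squared error over the observed ratings only; blank cells are skipped."""
    total = sum((dot(users[u], items[i]) - r) ** 2 for (u, i), r in ratings.items())
    return total / len(ratings)

def factorize(ratings, n_users, n_items, n_factors=2, lr=0.02, epochs=500, seed=0):
    """Fit user and item factor vectors by SGD on the observed ratings."""
    rng = random.Random(seed)
    users = [[rng.random() for _ in range(n_factors)] for _ in range(n_users)]
    items = [[rng.random() for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(epochs):
        for (u, i), r in ratings.items():
            err = dot(users[u], items[i]) - r  # gradient of 0.5 * err**2 w.r.t. the prediction
            for k in range(n_factors):
                u_k, i_k = users[u][k], items[i][k]
                users[u][k] -= lr * err * i_k
                items[i][k] -= lr * err * u_k
    return users, items

# Hypothetical sparse ratings: (user index, movie index) -> rating.
# Keys that are missing are the blank cells of the matrix.
ratings = {(0, 0): 5, (0, 1): 3, (1, 0): 4, (1, 2): 1, (2, 1): 2, (2, 2): 5}

users, items = factorize(ratings, n_users=3, n_items=3)

# Multiplying the factors gives a prediction for every user/movie pair.
dense = [[dot(u, m) for m in items] for u in users]
```

The size of the factor vectors (`n_factors`) is a free choice here, unrelated to the range of the ratings.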

dotkay · November 14, 2018, 4:09am

My brother, who works in finance, says MS Excel is the most intuitive programming framework.

itsmuriuki · November 14, 2018, 4:09am

Both kinds of data are tabular if we define tabular as structured data. For collaborative filtering, our goal will be to find some user-movie combination.

nbharatula · November 14, 2018, 4:09am

So it seems like a model for tabular data, and perhaps even collab filtering, can learn more than you intend it to. For example, it might conclude there is a correlation between “relationship” status and age, which may lead to bias. (I can think of more damaging examples of this from the tabular modeling data Jeremy used.)

Am I correct in this understanding? If yes, is there a way to determine everything a model is learning on the side?

sgugger · November 14, 2018, 4:13am

Audio is fine here.

avatar · November 14, 2018, 4:13am

Is the size of the random matrix 5 because the movie ratings are between 1 and 5, or is it just a hyperparameter?

sgugger · November 14, 2018, 4:13am

Just a hyper-parameter. This is the 50 in our previous model.

PegasusWithoutWinds · November 14, 2018, 4:13am

Looking into the black box of DNNs, or, in short, interpretation, is an active area of research. However, I don’t know much about it. Models are becoming more interpretable for sure. Please look into it and possibly write about it. I’d love to read it.

Edward · November 14, 2018, 4:15am

This is just a great visualization (the excel spreadsheet)!

ArchieIndian · November 14, 2018, 4:15am

How is the loss calculated for blank cells in CF?

PegasusWithoutWinds · November 14, 2018, 4:15am

If collaborative filtering is simply a dot product or matrix multiplication, then it assumes a linear relationship. Is it possible that adding some non-linearity to it would improve the model?
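As a rough illustration of the two options in this question, the sketch below scores a user/movie pair once with a plain dot product and once with a tiny one-hidden-layer network over the concatenated vectors. The function names and all weights are hypothetical and hand-picked; nothing here is trained, and it is not taken from the lesson or from any library. Whether the non-linear version actually predicts better is left open.

```python
def dot_score(user_vec, item_vec):
    """Plain dot-product score: linear in each vector when the other is held fixed."""
    return sum(u * m for u, m in zip(user_vec, item_vec))

def relu(x):
    """Rectified linear unit, the non-linearity used in this sketch."""
    return max(0.0, x)

def mlp_score(user_vec, item_vec, w_hidden, b_hidden, w_out, b_out):
    """Score from a one-hidden-layer network over the concatenated vectors."""
    x = user_vec + item_vec  # list concatenation, not element-wise addition
    hidden = [
        relu(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

user = [1.0, 2.0]
movie = [3.0, 4.0]

# Hand-picked weights, for illustration only.
w_hidden = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, -1.0]]
b_hidden = [0.0, 0.0]
w_out = [1.0, 1.0]
b_out = 0.5

linear = dot_score(user, movie)                                     # 1*3 + 2*4 = 11.0
nonlinear = mlp_score(user, movie, w_hidden, b_hidden, w_out, b_out)  # relu(1) + relu(-4) + 0.5 = 1.5
```

Doubling the user vector doubles `dot_score`, while the ReLU in `mlp_score` breaks that proportionality for some inputs.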

rohitr · November 14, 2018, 4:16am

In tabular problems, the outcome is a predicted value. You look at a row and predict a value for that row. Like predict a recommended movie for a user.
In collaborative filtering, there is a relationship going both ways: you can recommend a movie for a user, and you can also predict which users will like a movie. You reduce your movie space to a smaller dimension of movie characteristics (implicitly generated), and at the same time the user space is reduced to a smaller dimension of user characteristics (again implicitly generated). It effectively uses this compression along both dimensions to find a relationship.
Looking at movie predictions as a tabular problem, sparsity is definitely a major issue. Very few users watch enough movies in common to build a model from them, and that model will not be applicable for recommending movies to a user who has watched a different set of movies. You could compress the movie columns, but then you have not used compression along the user dimension.
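The "relationship going both ways" can be sketched with a toy example. The factor vectors below are invented by hand (in practice they would be learned), and the names `user_factors`, `movie_factors`, `top_movies_for`, and `top_users_for` are hypothetical. The point is only that the same two small sets of vectors answer both questions: which movie suits a user, and which users suit a movie.

```python
# Invented two-dimensional factors; in a real model these would be learned.
user_factors = {"ann": [0.9, 0.1], "bob": [0.2, 0.8], "cat": [0.6, 0.5]}
movie_factors = {"Alien": [1.0, 0.0], "Amelie": [0.0, 1.0], "Up": [0.5, 0.5]}

def score(user, movie):
    """Dot product of the user's and the movie's factor vectors."""
    return sum(u * m for u, m in zip(user_factors[user], movie_factors[movie]))

def top_movies_for(user, k=1):
    """Movies ranked by predicted score for one user."""
    return sorted(movie_factors, key=lambda m: score(user, m), reverse=True)[:k]

def top_users_for(movie, k=1):
    """Users ranked by predicted score for one movie, using the same factors."""
    return sorted(user_factors, key=lambda u: score(u, movie), reverse=True)[:k]

best_for_ann = top_movies_for("ann")   # ["Alien"]
best_for_bob = top_movies_for("bob")   # ["Amelie"]
fans_of_up = top_users_for("Up")       # ["cat"]
```

A row-per-user tabular model would only give the first kind of lookup directly; here both lookups fall out of the shared low-dimensional factors.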