It's a very interesting topic, yes, and you don't have those kinds of markers, of course. But the way I think about latent factors is that they find (through SGD) some meaningful aspects of the movies. These can sometimes (only sometimes) be mapped to human-readable aspects, like my silly example of a “romantic” factor.

What I meant is that it's easier to think of these factors' meaning that way (it's also the way Andrew Ng approaches the topic in his ML course, saying something like… “if only we could know what these aspects are (action, romance, length of the movie, good music, whatever…), if only we knew… but we don't, so let's make SGD learn them for us”).
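
To make that intuition concrete, here is a toy dot-product prediction with hand-named factors (the names, numbers, and vectors below are my own illustration; a trained model never hands you labels like these):

```python
import numpy as np

# Hand-labelled factors, purely for illustration; a real model learns these
# dimensions itself, and they rarely map this cleanly onto human concepts.
# Dimensions: [romantic, action, recency]
user  = np.array([0.9, 0.1, 0.3])    # someone who loves romantic movies
movie = np.array([0.95, 0.2, 0.1])   # a very romantic, low-action movie

print(user @ movie)  # 0.905 -> high predicted affinity
```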

Let's consider this theoretical example.
You have matrix 1, a 5x5 matrix where 5 users rated 5 movies. You also have matrix 2, where 5 other users rated 5 other movies. There is no intersection between these matrices, but matrix 1 == matrix 2 in terms of ratings, even though they are unrelated.

Now let's join these two matrices into a 10x10 matrix with 50% density (no ratings from users of matrix 1 on movies of matrix 2, and vice versa). Let's initialise exactly the same weights for each movie and for each user, and let's build a model. I bet we get the same latent factors for unrelated users who rated unrelated movies identically. Does this mean the users are similar? No. Does this mean the movies are similar? No. What happened? These latent factors are meaningless for unrelated ratings. The higher the intersection of users/movies/ratings, the better the latent factors you get. Unrelated ratings happen when low-activity users rate unpopular movies.
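
Here is a minimal numpy sketch of that experiment (the sizes match the example, but the learning rate and training loop are my own assumptions; the point is the symmetry of the two halves, not the quality of the fit):

```python
import numpy as np

rng = np.random.default_rng(0)
block = rng.integers(1, 6, size=(5, 5)).astype(float)  # 5 users x 5 movies

# 10x10 joint matrix, 50% dense: identical ratings, fully disjoint groups
R = np.full((10, 10), np.nan)
R[:5, :5] = block
R[5:, 5:] = block

n_factors = 3
U0 = rng.normal(0, 0.1, size=(5, n_factors))
V0 = rng.normal(0, 0.1, size=(5, n_factors))
U = np.vstack([U0, U0])  # identical initial weights for both user groups
V = np.vstack([V0, V0])  # identical initial weights for both movie groups

mask = ~np.isnan(R)
lr = 0.01
for _ in range(1000):  # plain full-batch gradient descent on squared error
    err = np.where(mask, R - U @ V.T, 0.0)
    U, V = U + lr * err @ V, V + lr * err.T @ U

# Disjoint, unrelated groups end up with identical latent factors:
print(np.allclose(U[:5], U[5:]), np.allclose(V[:5], V[5:]))  # True True
```

The two halves receive identical gradient updates at every step, so the “similarity” between the unrelated groups is an artifact of the setup, exactly as described above.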

Your point is very good, and I would say it is also correct. That said, I think it misses something:

I completely agree that the more interconnections, the more meaningful the factors. But let's take your example into a thought experiment (not realistic, but hopefully useful). Imagine we live in a world where one kind of movie, and only that kind, let's say action movies, pays 1,000 dollars to every person who gives it a positive rating.

In that world (let's guess) the highest rated movies will be action movies. In your example, if the “ground truth” of the ratings includes that action movies are the top rated, both unrelated groups of people/movies will learn the “action-ness” of the movie as the main factor. So I think even in that case there would be a shared latent factor.

I think this can also happen to some extent in real examples. Anyway, as far as I know, your criticism of matrix factorization for recommender systems is THE big criticism: such models are limited and hurt by sparsity, less useful for users with few ratings (and not useful at all for users with 0 ratings).

The bias term makes collaborative filtering useful even for users with 0 ratings :slight_smile: If you are to recommend someone a movie, even if you do not know anything about their preferences, you'd better suggest a good one that many other people agree is interesting!
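
A sketch of what a biased prediction looks like (the function and the numbers are illustrative, not fastai's actual API):

```python
import numpy as np

def predict(user_vec, movie_vec, user_bias, movie_bias, global_mean):
    """Standard biased matrix-factorization prediction."""
    return global_mean + user_bias + movie_bias + user_vec @ movie_vec

# Cold-start user: no ratings, so their factors and bias stay at zero...
new_user = np.zeros(3)
movie = np.array([0.2, -0.1, 0.5])

# ...and the prediction collapses to the global mean plus the movie's bias,
# i.e. "recommend the movies that everyone tends to rate highly".
print(predict(new_user, movie, user_bias=0.0, movie_bias=0.9, global_mean=3.5))  # 4.4
```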

@radek yeah, this is definitely your kind of mental playground! :wink:

Thanks for the hint, I had completely overlooked that huge point! :sweat_smile:

I don’t think this is 100% accurate, and I think this is what @miguel_perez was also getting at.

In your small example that is likely correct, but I suspect that the more data we have and the bigger our model grows, the less clear-cut the situation becomes. Meaning, even if we have perfectly unrelated movie/user groups, the meaning of the latent factors will still, to some extent, be parametrized by the model.

If we assume that there are some tastes people share globally (some like old/romantic movies, others action movies with special effects) and that those relationships are distinct enough among themselves, then if our model is trained on a large enough dataset, I think it might impose latent factor 1 to mean this and latent factor 2 to mean that across groups.

Say there are 3 latent factors, and one group of people (as they share tastes) combines latent factors A + B to produce a score (one taste), while people with a different taste combine latent factors B * C, and only these two relationships exist. Then I suspect that, with enough data, assuming those tastes are universal across groups, and with a model of just the right capacity, our model will learn to assign the A, B, and C features to the same latent factors, hence they will share meaning across disjoint groups.

Of course, I am not sure if this is the case, though it would be fun to try this :slight_smile:
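
For anyone who wants to try it, here is one way the experiment could be set up (entirely my own sketch: two disjoint groups whose ratings are generated from the same underlying “tastes”, then a check of whether the independently learned factors pick up those tastes in the same order):

```python
import numpy as np

rng = np.random.default_rng(1)
k, n = 3, 50  # 3 shared underlying "tastes", 50 users/movies per group

def make_group():
    # both disjoint groups draw their tastes from the same distribution
    prefs, feats = rng.normal(size=(n, k)), rng.normal(size=(n, k))
    return prefs @ feats.T, feats

def fit(ratings, lr=1e-3, steps=5000):
    # the same plain gradient-descent factorization as before, random init
    U, V = rng.normal(0, 0.1, (n, k)), rng.normal(0, 0.1, (n, k))
    for _ in range(steps):
        err = ratings - U @ V.T
        U, V = U + lr * err @ V, V + lr * err.T @ U
    return V

def taste_order(V, feats):
    # which true taste does each learned movie factor track most strongly?
    C = np.corrcoef(V.T, feats.T)[:k, k:]
    return np.abs(C).argmax(axis=1)

r1, feats1 = make_group()
r2, feats2 = make_group()
print(taste_order(fit(r1), feats1))
print(taste_order(fit(r2), feats2))
# if the two orders matched reliably, the disjoint groups would indeed share
# factor meaning; since the blocks are trained independently here, any
# agreement has to come from the data, not from the model coupling them
```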

I think we are on the same page. Probably I had too high expectations of latent factors. They work, and work well, but they need some minimal density level (an art). After I increased density from 0.3% to 2.5% by filtering out users/movies with a low number of ratings (in my case, sites/ad campaigns; the filtering step is sketched below), I saw the MATRIX:

left side - very good sites for every campaign
right side - very bad sites for every campaign
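
The filtering itself is only a few lines of pandas (the file name, column names, and threshold here are hypothetical and data-dependent):

```python
import pandas as pd

ratings = pd.read_csv('ratings.csv')  # hypothetical columns: user, item, rating

min_count = 20  # tune until the density of the filtered matrix looks healthy
users = ratings['user'].value_counts().loc[lambda s: s >= min_count].index
items = ratings['item'].value_counts().loc[lambda s: s >= min_count].index
dense = ratings[ratings['user'].isin(users) & ratings['item'].isin(items)]
# (one pass may re-expose sparse rows/columns; repeat until it stabilises)

density = len(dense) / (dense['user'].nunique() * dense['item'].nunique())
print(f'{density:.2%}')
```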

It’s not a conceptual comparison, it’s a literal identity. They are actually the exact same thing: OHE * weight matrix == embedding. Try a small example in a spreadsheet or on paper and I think it’ll be clear. If so - tell us what you learn! If not - please ask some more! :slight_smile:
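
Here is the spreadsheet-sized version of that identity in numpy:

```python
import numpy as np

n_users, n_factors = 4, 3
W = np.arange(n_users * n_factors).reshape(n_users, n_factors).astype(float)

user_idx = 2
one_hot = np.eye(n_users)[user_idx]  # [0., 0., 1., 0.]

# multiplying by a one-hot vector just selects one row of the weight matrix,
# which is exactly what an embedding lookup does
print(one_hot @ W)   # [6. 7. 8.]
print(W[user_idx])   # [6. 7. 8.]
```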
