Is there a theoretical reason for why we should have better chances at correctly imputing missing values for some (user_id, movie_id) combinations with matrix factorization and similar techniques vs. just building a regression model with ratings as target and (user_id, movie_id, other_features) as right hand side variables?

The regression-based approach would unlock the opportunity of using both explicit regressors (user's age, income, etc.; movie genre, budget, etc.) and implicit ones (user and movie embeddings). I guess that would be awkward to do in the context of matrix factorization.

@gballardin We cannot use user_id, movie_id directly as inputs to a regression model, as a higher ID does not correspond to *more* user-ness or anything like that. Linear regression assumes a continuous relationship between the predictors and the target, so that increasing the value of a predictor has a predictable impact on the target. This breaks down when our predictor variables are IDs with no ordinality. Using an embedding layer instead allows us to represent each user as a vector of latent factors which capture their viewing tendencies.
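To make the ID-vs-embedding point concrete, here is a minimal numpy sketch (sizes and values are made up, and the tables are random rather than learned): each ID is just a row index into a latent-factor table, and the classic matrix-factorization score is the dot product of the two latent vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_movies, k = 100, 50, 8  # hypothetical sizes, k latent factors

# An embedding table: each ID indexes a row of learned latent factors.
# The numeric value of the ID itself carries no order or meaning.
user_emb = rng.normal(size=(n_users, k))
movie_emb = rng.normal(size=(n_movies, k))

def predict_rating(user_id, movie_id):
    # Matrix-factorization score: dot product of the two latent vectors.
    return user_emb[user_id] @ movie_emb[movie_id]

print(predict_rating(3, 7))
```

In a real model the two tables would be fitted to observed ratings; random tables are used here only to show the lookup-then-dot-product mechanics.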

Correct, I was suggesting to use user_id and movie_id *embeddings* as regressors and not the ids themselves. We’re on the same page on that.

I hinted at using *regression*, not specifically *linear* regression; a more flexible regressor can still model non-linearities and discontinuities. Think of a tree-based regression, for example.

Ok cool, I’m with you. But how would we learn those embeddings if not using a NN?

You would in fact use a NN to calculate the embeddings in the first step. As a second step you'd use your ML algo of choice to predict ratings, with the embeddings from the first step as features. That is the approach I am suggesting as an alternative to matrix factorization.
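A minimal numpy sketch of that two-step pipeline, under made-up sizes and synthetic ratings (step 1 learns embeddings by plain SGD on a matrix-factorization objective, standing in for the NN; step 2 uses a simple least-squares fit standing in for the "ML algo of choice"):

```python
import numpy as np

rng = np.random.default_rng(1)
n_users, n_movies, k = 20, 15, 4

# --- Step 1: learn user/movie embeddings by SGD on observed ratings ---
U = rng.normal(scale=0.1, size=(n_users, k))
V = rng.normal(scale=0.1, size=(n_movies, k))
obs = [(int(rng.integers(n_users)), int(rng.integers(n_movies)),
        float(rng.uniform(1, 5))) for _ in range(300)]

lr = 0.05
for _ in range(50):
    for u, m, r in obs:
        err = U[u] @ V[m] - r
        U[u], V[m] = U[u] - lr * err * V[m], V[m] - lr * err * U[u]

# --- Step 2: concatenated embeddings (plus any side features) as inputs
# --- to a downstream regressor; here, ordinary least squares.
X = np.array([np.concatenate([U[u], V[m]]) for u, m, _ in obs])
y = np.array([r for _, _, r in obs])
Xb = np.c_[X, np.ones(len(X))]          # add an intercept column
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
pred = Xb @ w
print("train RMSE:", np.sqrt(np.mean((pred - y) ** 2)))
```

In practice step 2 would be a tree-based or other non-linear regressor, and X would also include the explicit features (age, genre, budget) mentioned earlier; the least-squares fit is only there to keep the sketch dependency-free.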


Hey Giorgio @gballardin, I think this is a valid approach. Afaik user and item embeddings are in fact often used to predict a user/item match. I have seen it in a classification case (predict a probability directly or scale up to the ratings space afterwards) but I see no reason why we shouldn’t be able to use the same model for a regression (with a different output activation function).

See for example this slide (source).

They do exactly what you suggest:

- use deep learning to learn embeddings
- concatenate the relevant embeddings and use as input for a ranking model

Have you continued working on this and had any interesting results?


I have not pursued this approach further. I am trying to learn too many things at once, or so it seems. However, this approach keeps coming back to me, so as soon as I have a good use case for it, I'll give it another crack.

Funny that the link you posted is about eBay Tech. I used to work at eBay.
