Lesson 7 - Official topic

My understanding is that it could. If you liked a type of movie, say romance, it could look at other users who liked that category of movies and infer that they also disliked horror. Thus, you might also dislike horror.
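
To make that concrete, here is a minimal sketch (the latent factors and their meanings are invented purely for illustration) of how a dot product between learned user and movie factors encodes that kind of inference:

```python
import torch

# Hypothetical 3-dim latent factors: [romance, action, horror]
user          = torch.tensor([ 0.9,  0.1, -0.7])   # likes romance, dislikes horror
romance_movie = torch.tensor([ 0.95, 0.0, -0.1])
horror_movie  = torch.tensor([-0.2,  0.1,  0.9])

# Higher dot product -> higher predicted affinity
print((user * romance_movie).sum())   # clearly positive: likely to enjoy it
print((user * horror_movie).sum())    # negative: likely to dislike it
```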

So a high bias indicates the possibility of a user liking a movie despite it being different from his general preferences?

2 Likes

We can surely look at the past history of sales, but since there is a time aspect to the sales, can we build such a user-item matrix based on average past sales?
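
As a rough sketch of one way to collapse the time aspect (the column names and data below are made up), you could aggregate each user-item pair over a time window, e.g. average the sales, and then pivot into a matrix:

```python
import pandas as pd

# Hypothetical sales log with a time dimension
sales = pd.DataFrame({
    "customer": ["a", "a", "b", "b", "b"],
    "item":     ["x", "x", "x", "y", "y"],
    "month":    ["2020-01", "2020-02", "2020-01", "2020-01", "2020-02"],
    "amount":   [10.0, 14.0, 3.0, 7.0, 9.0],
})

# Collapse the time axis by averaging, then pivot into a user-item matrix
user_item = sales.pivot_table(index="customer", columns="item",
                              values="amount", aggfunc="mean")
print(user_item)
```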

Exactly (or conversely a low bias means someone will probably dislike it, despite their preferences).

4 Likes
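
As a toy illustration of that bias intuition (all numbers invented), the prediction adds a per-user and per-movie bias on top of the factor dot product, so a movie with a high bias can still score well for a user whose taste factors don't match it:

```python
import torch

user  = torch.tensor([0.9, 0.1, -0.7])   # taste factors
movie = torch.tensor([-0.1, 0.2, 0.3])   # doesn't match this user's taste
user_bias, movie_bias = 0.1, 1.5         # movie is broadly loved

raw = (user * movie).sum() + user_bias + movie_bias
print(raw)  # still clearly positive thanks to the large movie bias
```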

You could try Bayesian optimization to estimate a weight decay. Run the model once to get a prior if the experiment has never been done (see here, by @muellerzr).
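
A rough sketch of what that could look like using Optuna as the optimizer (Optuna is my substitution here, and the `ratings` DataFrame, trial budget, and epoch count are placeholders):

```python
import optuna
from fastai.collab import CollabDataLoaders, collab_learner

# Assumes `ratings` is a DataFrame with user, title and rating columns
dls = CollabDataLoaders.from_df(ratings, item_name="title", bs=64)

def objective(trial):
    wd = trial.suggest_float("wd", 1e-4, 1.0, log=True)   # weight decay to tune
    learn = collab_learner(dls, n_factors=50, y_range=(0, 5.5))
    learn.fit_one_cycle(3, 5e-3, wd=wd)
    return learn.validate()[0]                            # validation loss

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```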

If you want to check your answers or are struggling with questions in the chapter 8 questionnaire, check out this wiki! Feel free to contribute as well!

1 Like

Why does the range go from 0 to 5.5 if the maximum rating is 5?

1 Like

They found this is empirically better.

So last week I asked whether NN models for collaborative filtering work better than matrix decomposition (SVD, etc.) based ones for real-world recommendation systems/applications. Any pointers on that?

If we want to consider "side information" in a recommendation engine, can we add that context as an additional embedding layer in the setup shown, or how should we approach that?
Long gone are the days when implicit or explicit feedback alone is enough to make worthwhile recommendations.

3 Likes
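
One possible approach, sketched below in plain PyTorch (the side-information field, its cardinality, and all sizes are made up), is to give the side information its own embedding and concatenate it with the user and item embeddings before the final layers:

```python
import torch
import torch.nn as nn

class CollabWithSideInfo(nn.Module):
    """Neural collab model that also embeds side information (e.g. genre)."""
    def __init__(self, n_users, n_items, n_genres, emb_dim=50):
        super().__init__()
        self.u_emb = nn.Embedding(n_users, emb_dim)
        self.i_emb = nn.Embedding(n_items, emb_dim)
        self.g_emb = nn.Embedding(n_genres, emb_dim // 2)   # extra context embedding
        self.head = nn.Sequential(
            nn.Linear(emb_dim * 2 + emb_dim // 2, 100),
            nn.ReLU(),
            nn.Linear(100, 1),
        )

    def forward(self, user, item, genre):
        x = torch.cat([self.u_emb(user), self.i_emb(item), self.g_emb(genre)], dim=1)
        return self.head(x)

# Example forward pass with a batch of 2 (all ids hypothetical)
model = CollabWithSideInfo(n_users=100, n_items=200, n_genres=20)
preds = model(torch.tensor([0, 1]), torch.tensor([5, 7]), torch.tensor([2, 2]))
print(preds.shape)   # torch.Size([2, 1])
```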

Thanks. This was what I was trying to reference with "second-order associated preferences." I guess I'm wondering whether the fact that a movie was rated or not could itself be an input into the model.

2 Likes

Since you're applying a sigmoid function, the output only asymptotically approaches the ends of the prediction range. Thus 5.0 and 0.5 (the highest and lowest ratings) would never be predicted, since they become exponentially difficult to reach. Setting the range to 0 to 5.5 empirically works better, although I do wonder if you could do the same with, for example, 0.3 and 5.2.

4 Likes
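
To see that concretely, here is a small sketch of the scaled sigmoid fastai applies (written out by hand rather than imported), showing how the top rating of 5.0 only becomes reachable once the range is widened to 5.5:

```python
import torch

def sigmoid_range(x, lo, hi):
    # Squash raw activations into (lo, hi); the endpoints are only approached asymptotically
    return torch.sigmoid(x) * (hi - lo) + lo

acts = torch.tensor([-10.0, 0.0, 10.0])
print(sigmoid_range(acts, 0, 5.0))   # 5.0 itself is effectively unreachable
print(sigmoid_range(acts, 0, 5.5))   # 5.0 now lies comfortably inside the range
```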

When running the first cell from chapter 9, I got the following error: `No module named 'kaggle'`.
I am guessing that one needs to do `pip install kaggle` or use conda?
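
For reference, installing it from inside a notebook cell would look like this (or use `conda install kaggle` from a terminal if you manage packages with conda):

```python
# Install the Kaggle API client from within the notebook
!pip install kaggle
```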

You will need a tabular model, which is what the lesson is about now :slight_smile:

2 Likes

Can the outcome variable be something like whether an event will happen or not (categorical, yes/no, or maybe the probability of the event happening, like 80% likely), instead of sales, which is a number, i.e. sales in USD?
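
For illustration, a categorical outcome with fastai's tabular API might look roughly like this (the DataFrame `df`, its column names, and the procs are placeholders):

```python
from fastai.tabular.all import *

# `df` is assumed to have categorical/continuous features and a yes/no target column
dls = TabularDataLoaders.from_df(
    df, y_names="event_happened",          # categorical outcome instead of a dollar amount
    cat_names=["store", "promo"], cont_names=["temperature"],
    procs=[Categorify, FillMissing, Normalize],
    y_block=CategoryBlock,                 # treat the target as a class, not a number
)
learn = tabular_learner(dls, metrics=accuracy)
# learn.fit_one_cycle(3)
# learn.get_preds() then returns per-class probabilities (e.g. 0.8 = 80% likely)
```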

We are considering the L2 distance between data points to interpret the embeddings. After we reduce the number of dimensions with PCA, will the embeddings differ from when we use all the dimensions, and would using all of them give a better picture of the data?

1 Like
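
To illustrate that comparison, here is a sketch (assuming `learn` is a trained fastai collab model with the standard dot-product architecture; the movie index and other names are invented) that finds the nearest movies to one title in the full embedding space and in a PCA-reduced space:

```python
import torch
from sklearn.decomposition import PCA

# Item embedding matrix from a trained fastai collab_learner
movie_w = learn.model.i_weight.weight.detach().cpu()

def nearest(weights, idx, k=5):
    # L2 distance from movie `idx` to every other movie
    d = torch.norm(weights - weights[idx], dim=1)
    return d.argsort()[1:k + 1]            # skip the movie itself

idx = 0                                    # index of some movie of interest (placeholder)
full_space = nearest(movie_w, idx)

movie_pca = torch.from_numpy(PCA(n_components=3).fit_transform(movie_w.numpy()))
reduced_space = nearest(movie_pca, idx)

print(full_space, reduced_space)           # compare neighbourhoods in the two spaces
```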

This is explained later in the notebook, but yes.

1 Like

Yes, it certainly can.

1 Like

Are embeddings used only for high-cardinality categorical variables, or is the approach used in general? For low cardinality, can one use a simple one-hot encoding approach?

6 Likes
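
For reference, fastai picks embedding sizes from cardinality with a heuristic along these lines (reimplemented here as a sketch; the exact rule may differ between versions), and for low-cardinality variables the resulting embedding ends up barely bigger than a one-hot vector anyway:

```python
def emb_sz_rule(n_cat):
    # fastai-style heuristic: embedding width grows slowly with cardinality, capped at 600
    return min(600, round(1.6 * n_cat ** 0.56))

for n in (2, 7, 50, 1000, 100_000):
    print(n, "categories ->", emb_sz_rule(n), "embedding dims")
```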

I don’t know if one is better than the other. You could try an experiment to compare. I find non-negative matrix factorization a good entry point and quite explainable.

1 Like
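
For anyone wanting to try that entry point, a minimal non-negative matrix factorization sketch with scikit-learn (the ratings matrix below is a tiny made-up example, with zeros standing in for missing ratings) looks like this:

```python
import numpy as np
from sklearn.decomposition import NMF

# Tiny user x item ratings matrix; 0 means "not rated" in this toy setup
R = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 5.0],
    [0.0, 1.0, 5.0, 4.0],
])

nmf = NMF(n_components=2, init="nndsvda", max_iter=500, random_state=0)
U = nmf.fit_transform(R)     # user factors (non-negative)
V = nmf.components_          # item factors (non-negative)

# Reconstructed matrix: entries give explainable predicted affinities
print(np.round(U @ V, 2))
```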