Lesson 7 - Official topic

You could try Bayesian optimization to estimate a good weight decay. Run the model once to get a prior if the experiment has never been done before. — @muellerzr
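A minimal sketch of what that could look like, using Optuna's TPE sampler as a stand-in for Bayesian optimization. It assumes you already have a `dls` (e.g. from `CollabDataLoaders`); the search range, epochs, and trial count are just placeholders:

```python
import optuna
from fastai.collab import collab_learner

def objective(trial):
    # candidate weight decay sampled on a log scale (the range is an assumption)
    wd = trial.suggest_float('wd', 1e-6, 1e-1, log=True)
    learn = collab_learner(dls, n_factors=50, y_range=(0, 5.5))  # dls: your CollabDataLoaders
    learn.fit_one_cycle(3, 5e-3, wd=wd)
    return learn.validate()[0]  # validation loss for this weight decay

study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=20)
print(study.best_params)
```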

If you want to check your answers or are struggling with questions in the chapter 8 questionnaire, check out this wiki! Feel free to contribute as well!

1 Like

Why does the range go from 0 to 5.5 if the maximum rating is 5?

1 Like

They found this is empirically better.

Last week I asked whether NN models for collaborative filtering work better than matrix-decomposition-based ones (SVD, etc.) for real-world recommendation systems and applications. Any pointers on that?

If we want to consider "side information" in a recommendation engine, can we add that context as an additional embedding layer in the setup shown, or how should we approach that?
Long gone are the days when just implicit or explicit information is enough to make worthwhile recommendations.
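One way to approach it (a rough sketch, not from the lesson): give the side information its own embedding and fold it into the dot-product model. Here a hypothetical genre index is carried as a third column of the input batch:

```python
import torch
from torch import nn

class CollabWithSide(nn.Module):
    "Dot-product collab model with one extra 'side information' embedding (hypothetical: genre)."
    def __init__(self, n_users, n_movies, n_genres, n_factors=50, y_range=(0, 5.5)):
        super().__init__()
        self.user_factors  = nn.Embedding(n_users,  n_factors)
        self.movie_factors = nn.Embedding(n_movies, n_factors)
        self.genre_factors = nn.Embedding(n_genres, n_factors)  # the side information
        self.y_range = y_range

    def forward(self, x):
        # x carries a third column with the genre index for each rating
        users, movies, genres = x[:, 0], x[:, 1], x[:, 2]
        movie_repr = self.movie_factors(movies) + self.genre_factors(genres)
        dot = (self.user_factors(users) * movie_repr).sum(dim=1)
        lo, hi = self.y_range
        return torch.sigmoid(dot) * (hi - lo) + lo
```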

3 Likes

Thanks. This was what I was trying to reference with "second-order associated preferences." I guess I'm wondering whether the fact that a movie was rated or not could be an input into the model.

2 Likes

Since you're applying a sigmoid function, the output only asymptotes toward the ends of the prediction range. Thus the lowest and highest actual scores, 0.5 and 5.0, would never be predicted if they sat at the endpoints, since they become exponentially difficult to reach. Widening the range to 0 to 5.5 empirically works better, although I do wonder whether you could do the same with, say, 0.3 and 5.2.
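To see this in numbers, here is a small standalone version of the `y_range` squashing (the same idea as fastai's `sigmoid_range`), showing why 5.0 is effectively unreachable when it sits exactly at the top of the range:

```python
import torch

def sigmoid_range(x, lo, hi):
    # squash raw activations into (lo, hi); the endpoints are only reached asymptotically
    return torch.sigmoid(x) * (hi - lo) + lo

acts = torch.tensor([-4., 0., 4., 10.])
print(sigmoid_range(acts, 0, 5.0))  # a true 5.0 rating needs an infinitely large activation
print(sigmoid_range(acts, 0, 5.5))  # with the padded range, 5.0 is comfortably reachable
```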

4 Likes

When running the first cell from chapter 9, I got the following error: `No module named 'kaggle'`
I am guessing that one needs to do `pip install kaggle` or use conda?

You will need a tabular model, which is what the lesson is about now :slight_smile:

2 Likes

Can the outcome variable be something like whether an event will happen or not (i.e. categorical, yes/no, or maybe the probability of the event happening, like 80% likely), instead of sales, which is a number, e.g. sales in USD?

We are considering the L2 distance between the data points to interpret the embeddings after we reduce the number of dimensions with PCA. Will the embeddings differ when we use all the dimensions, and would that give us a better picture of the data?

1 Like

This is explained later in the notebook, but yes.
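For reference, a sketch of that comparison, assuming a trained `collab_learner` called `learn` (in fastai's dot-product model the movie factors live in `learn.model.i_weight`):

```python
import numpy as np
from sklearn.decomposition import PCA

# assumes a trained collab_learner called `learn`; in the dot-product model the
# movie factors are stored in learn.model.i_weight
movie_factors = learn.model.i_weight.weight.detach().cpu().numpy()  # (n_movies, n_factors)

# compare a distance in the full factor space with one in a 3-d PCA projection
full_dist = np.linalg.norm(movie_factors[0] - movie_factors[1])
pca_factors = PCA(n_components=3).fit_transform(movie_factors)
pca_dist = np.linalg.norm(pca_factors[0] - pca_factors[1])
```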

1 Like

Yes, it certainly can.

1 Like

Are embeddings used only for high-cardinality categorical variables, or is the approach used in general? For low cardinality, can one use a simple one-hot encoding approach?

6 Likes

I don't know if one is better than the other. You could try an experiment to compare. I find non-negative matrix factorization a good entry point and quite explainable.
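For anyone who wants to experiment, scikit-learn's NMF is a quick starting point (a sketch; the random matrix stands in for a user × item ratings matrix with missing entries filled with 0):

```python
import numpy as np
from sklearn.decomposition import NMF

ratings = np.random.rand(100, 40)                       # placeholder user x item matrix
nmf = NMF(n_components=20, init='nndsvd', max_iter=500)
W = nmf.fit_transform(ratings)                          # user factors, all non-negative
H = nmf.components_                                     # item factors, all non-negative
approx = W @ H                                          # reconstructed ratings
```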

1 Like

In your experience, for highly imbalanced data (such as fraud or medical data), what usually works better: Random Forests, XGBoost, or NNs?

7 Likes

Can you factorize a sparse matrix via SGD to come up with embeddings?
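That is essentially what the dot-product model in the lesson does: you only loop over the observed (user, item, rating) triples, so the sparse matrix never has to be densified. A bare-bones PyTorch sketch with made-up sizes:

```python
import torch

n_users, n_items, n_factors = 1000, 500, 20
U = (torch.randn(n_users, n_factors) * 0.1).requires_grad_()
V = (torch.randn(n_items, n_factors) * 0.1).requires_grad_()
opt = torch.optim.SGD([U, V], lr=0.05)

# only the observed entries of the sparse matrix: (user, item, rating) triples
users = torch.randint(0, n_users, (10_000,))
items = torch.randint(0, n_items, (10_000,))
ratings = torch.rand(10_000) * 5

for epoch in range(10):
    preds = (U[users] * V[items]).sum(dim=1)   # dot product per observed pair
    loss = ((preds - ratings) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# the rows of U and V are now the learned user and item embeddings
```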

OK, so for example, I'd like to predict whether a very expensive item's sale will happen or not. I will have a bunch of variables and embeddings for situations when the sale happened and when the customers got spooked, for example. Will I be able to get an outcome like "under these conditions the sale is 85% likely to happen"?
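In fastai's tabular setup that just means making the dependent variable categorical; predictions then come back as class probabilities. A sketch with made-up column names (`df`, `'sale_happened'`, etc. are placeholders for your own data):

```python
from fastai.tabular.all import *

# `df` and the column names below are placeholders for your own data
dls = TabularDataLoaders.from_df(
    df, y_names='sale_happened', y_block=CategoryBlock(),
    cat_names=['customer_segment', 'region'], cont_names=['price', 'discount'],
    procs=[Categorify, FillMissing, Normalize])

learn = tabular_learner(dls, metrics=accuracy)
learn.fit_one_cycle(3)

# per-row prediction: `probs` holds the class probabilities (e.g. P(sale) = 0.85)
row, clas, probs = learn.predict(df.iloc[0])
```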

There are a few approaches to encoding categorical variables; see https://github.com/scikit-learn-contrib/category_encoders. I suppose which approach works best depends on the data you are working with.
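For example, with that library (a sketch; the DataFrame and column names are made up):

```python
import pandas as pd
import category_encoders as ce

df = pd.DataFrame({'color': ['red', 'green', 'blue', 'green'],
                   'sold':  [1, 0, 1, 1]})

# low cardinality: plain one-hot encoding is often fine
onehot = ce.OneHotEncoder(cols=['color']).fit_transform(df[['color']])

# higher cardinality: target encoding replaces each category with a smoothed target mean
target = ce.TargetEncoder(cols=['color']).fit_transform(df[['color']], df['sold'])
```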