Lesson 7 - Official topic

You could try Bayesian optimization to estimate a good weight decay. Run the model once to get a prior if the experiment has never been done before. — @muellerzr
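A minimal sketch of what that could look like, using Optuna's TPE sampler as a stand-in for Bayesian optimization. It assumes you already have a `dls` (e.g. from `CollabDataLoaders`); the search range, epochs, and trial count are just placeholders:

```python
import optuna
from fastai.collab import collab_learner

def objective(trial):
    # candidate weight decay sampled on a log scale (the range is an assumption)
    wd = trial.suggest_float('wd', 1e-6, 1e-1, log=True)
    learn = collab_learner(dls, n_factors=50, y_range=(0, 5.5))  # dls: your CollabDataLoaders
    learn.fit_one_cycle(3, 5e-3, wd=wd)
    return learn.validate()[0]  # validation loss for this weight decay

study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=20)
print(study.best_params)
```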

If you want to check your answers or are struggling with questions in the chapter 8 questionnaire, check out this wiki! Feel free to contribute as well!

1 Like

Why does the range go from 0 to 5.5 if the maximum rating is 5?

1 Like

They found this is empirically better.

Last week I asked whether NN models for collaborative filtering work better than matrix-decomposition-based ones (SVD, etc.) for real-world recommendation systems and applications. Any pointers on that?

If we want to consider "side information" in a recommendation engine, can we add that context as an additional embedding layer in the setup shown, or how should we approach that?
Long gone are the days when just implicit or explicit information is enough to make worthwhile recommendations.
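One way to approach it (a rough sketch, not from the lesson): give the side information its own embedding and fold it into the dot-product model. Here a hypothetical genre index is carried as a third column of the input batch:

```python
import torch
from torch import nn

class CollabWithSide(nn.Module):
    "Dot-product collab model with one extra 'side information' embedding (hypothetical: genre)."
    def __init__(self, n_users, n_movies, n_genres, n_factors=50, y_range=(0, 5.5)):
        super().__init__()
        self.user_factors  = nn.Embedding(n_users,  n_factors)
        self.movie_factors = nn.Embedding(n_movies, n_factors)
        self.genre_factors = nn.Embedding(n_genres, n_factors)  # the side information
        self.y_range = y_range

    def forward(self, x):
        # x carries a third column with the genre index for each rating
        users, movies, genres = x[:, 0], x[:, 1], x[:, 2]
        movie_repr = self.movie_factors(movies) + self.genre_factors(genres)
        dot = (self.user_factors(users) * movie_repr).sum(dim=1)
        lo, hi = self.y_range
        return torch.sigmoid(dot) * (hi - lo) + lo
```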

3 Likes

Thanks. This was what I was trying to reference with "second-order associated preferences." I guess I'm wondering whether the fact that a movie was rated or not could be an input into the model.

2 Likes

Since you're applying a sigmoid function, the output only asymptotes toward the ends of the prediction range. Thus the lowest and highest actual scores, 0.5 and 5.0, would never be predicted if they sat at the endpoints, since they become exponentially difficult to reach. Widening the range to 0 to 5.5 empirically works better, although I do wonder whether you could do the same with, say, 0.3 and 5.2.
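To see this in numbers, here is a small standalone version of the `y_range` squashing (the same idea as fastai's `sigmoid_range`), showing why 5.0 is effectively unreachable when it sits exactly at the top of the range:

```python
import torch

def sigmoid_range(x, lo, hi):
    # squash raw activations into (lo, hi); the endpoints are only reached asymptotically
    return torch.sigmoid(x) * (hi - lo) + lo

acts = torch.tensor([-4., 0., 4., 10.])
print(sigmoid_range(acts, 0, 5.0))  # a true 5.0 rating needs an infinitely large activation
print(sigmoid_range(acts, 0, 5.5))  # with the padded range, 5.0 is comfortably reachable
```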

4 Likes

When running the first cell from chapter 9, I got the following error: `No module named 'kaggle'`
I am guessing that one needs to do `pip install kaggle` or use conda?

You will need a tabular model, which is what the lesson is about now :slight_smile:

2 Likes

Can the outcome variable be something like whether an event will happen or not (i.e. categorical, yes/no, or maybe the probability of the event happening, like 80% likely), instead of sales, which is a number, e.g. sales in USD?

We are considering the L2 distance between the data points to interpret the embeddings after we reduce the number of dimensions with PCA. Will the embeddings differ when we use all the dimensions, and would that give us a better picture of the data?

1 Like

This is explained later in the notebook, but yes.
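For reference, a sketch of that comparison, assuming a trained `collab_learner` called `learn` (in fastai's dot-product model the movie factors live in `learn.model.i_weight`):

```python
import numpy as np
from sklearn.decomposition import PCA

# assumes a trained collab_learner called `learn`; in the dot-product model the
# movie factors are stored in learn.model.i_weight
movie_factors = learn.model.i_weight.weight.detach().cpu().numpy()  # (n_movies, n_factors)

# compare a distance in the full factor space with one in a 3-d PCA projection
full_dist = np.linalg.norm(movie_factors[0] - movie_factors[1])
pca_factors = PCA(n_components=3).fit_transform(movie_factors)
pca_dist = np.linalg.norm(pca_factors[0] - pca_factors[1])
```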

1 Like

Yes, it certainly can.

1 Like

Are embeddings used only for high-cardinality categorical variables, or is the approach used in general? For low cardinality, can one use a simple one-hot encoding approach?

6 Likes

I don't know if one is better than the other. You could try an experiment to compare. I find non-negative matrix factorization a good entry point and quite explainable.
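For anyone who wants to experiment, scikit-learn's NMF is a quick starting point (a sketch; the random matrix stands in for a user × item ratings matrix with missing entries filled with 0):

```python
import numpy as np
from sklearn.decomposition import NMF

ratings = np.random.rand(100, 40)                       # placeholder user x item matrix
nmf = NMF(n_components=20, init='nndsvd', max_iter=500)
W = nmf.fit_transform(ratings)                          # user factors, all non-negative
H = nmf.components_                                     # item factors, all non-negative
approx = W @ H                                          # reconstructed ratings
```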

1 Like

In your experience, for highly imbalanced data (such as fraud or medical data), what usually works better: Random Forests, XGBoost, or NNs?

7 Likes

Can you factorize a sparse matrix via SGD to come up with embeddings?
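That is essentially what the dot-product model in the lesson does: you only loop over the observed (user, item, rating) triples, so the sparse matrix never has to be densified. A bare-bones PyTorch sketch with made-up sizes:

```python
import torch

n_users, n_items, n_factors = 1000, 500, 20
U = (torch.randn(n_users, n_factors) * 0.1).requires_grad_()
V = (torch.randn(n_items, n_factors) * 0.1).requires_grad_()
opt = torch.optim.SGD([U, V], lr=0.05)

# only the observed entries of the sparse matrix: (user, item, rating) triples
users = torch.randint(0, n_users, (10_000,))
items = torch.randint(0, n_items, (10_000,))
ratings = torch.rand(10_000) * 5

for epoch in range(10):
    preds = (U[users] * V[items]).sum(dim=1)   # dot product per observed pair
    loss = ((preds - ratings) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# the rows of U and V are now the learned user and item embeddings
```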

OK, so for example, I'd like to predict whether a very expensive item's sale will happen or not. I will have a bunch of variables and embeddings for situations when the sale happened and when the customers got spooked, for example. Will I be able to get an outcome like "under these conditions the sale is 85% likely to happen"?
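In fastai's tabular setup that just means making the dependent variable categorical; predictions then come back as class probabilities. A sketch with made-up column names (`df`, `'sale_happened'`, etc. are placeholders for your own data):

```python
from fastai.tabular.all import *

# `df` and the column names below are placeholders for your own data
dls = TabularDataLoaders.from_df(
    df, y_names='sale_happened', y_block=CategoryBlock(),
    cat_names=['customer_segment', 'region'], cont_names=['price', 'discount'],
    procs=[Categorify, FillMissing, Normalize])

learn = tabular_learner(dls, metrics=accuracy)
learn.fit_one_cycle(3)

# per-row prediction: `probs` holds the class probabilities (e.g. P(sale) = 0.85)
row, clas, probs = learn.predict(df.iloc[0])
```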

There are a few approaches to encoding categorical variables; see https://github.com/scikit-learn-contrib/category_encoders. I suppose which approach works best depends on the data you are working with.
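For example, with that library (a sketch; the DataFrame and column names are made up):

```python
import pandas as pd
import category_encoders as ce

df = pd.DataFrame({'color': ['red', 'green', 'blue', 'green'],
                   'sold':  [1, 0, 1, 1]})

# low cardinality: plain one-hot encoding is often fine
onehot = ce.OneHotEncoder(cols=['color']).fit_transform(df[['color']])

# higher cardinality: target encoding replaces each category with a smoothed target mean
target = ce.TargetEncoder(cols=['color']).fit_transform(df[['color']], df['sold'])
```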