You could try Bayesian optimization to estimate a weight decay. Run the model once to get a prior if the experiment has never been done. (by @muellerzr)
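A minimal sketch of what that could look like, assuming Optuna (whose default TPE sampler is a sequential model-based, Bayesian-style optimizer). `valid_loss_for` is a hypothetical stand-in for your real training run, and `enqueue_trial` seeds the search with the value from your single prior run:

```python
import optuna

def valid_loss_for(wd):
    # Stand-in for your real training run, e.g. training a fastai learner
    # with this weight decay and returning its validation loss.
    return (wd - 0.01) ** 2   # dummy objective so the sketch runs end to end

def objective(trial):
    wd = trial.suggest_float("wd", 1e-6, 1e-1, log=True)   # search weight decay on a log scale
    return valid_loss_for(wd)

study = optuna.create_study(direction="minimize")   # default TPE sampler
study.enqueue_trial({"wd": 0.1})                     # seed with the value from your prior run
study.optimize(objective, n_trials=20)
print(study.best_params)
```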
If you want to check your answers or are struggling with questions in the chapter 8 questionnaire, check out this wiki! Feel free to contribute as well!
Why does the range go from 0 to 5.5 if the maximum rating is 5?
They found this is empirically better.
So last week I asked whether NN models for collaborative filtering work better than matrix decomposition (SVD, etc.) based ones for real-world recommendation systems/applications. Any pointers on that?
If we want to consider "side information" in a recommendation engine, can we add that context as an additional embedding layer in the shown setup? Or how should we approach that?
Long gone are the days when just implicit or explicit information is enough to make worthwhile recommendations.
Thanks. This was what I was trying to reference with "second-order associated preferences." I guess I'm wondering whether the fact that a movie was rated or not could be an input to the model.
Since you're applying a sigmoid function, predictions asymptote toward the ends of the range. Thus 5.0 and 0.5 (the highest and lowest scores) would never be predicted, since they become exponentially difficult to reach. Changing the range to 0 to 5.5 empirically works better, although I do wonder if you could do the same with, say, 0.3 and 5.2.
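To make the asymptote concrete, here is a small re-implementation along the lines of fastai's `sigmoid_range` (sigmoid scaled to a `(low, high)` interval), with made-up activation values:

```python
import torch

def sigmoid_range(x, low, high):
    # Scale a sigmoid so outputs lie strictly between `low` and `high`.
    return torch.sigmoid(x) * (high - low) + low

acts = torch.tensor([-10., 0., 3., 10.])
print(sigmoid_range(acts, 0, 5.0))   # the top rating 5.0 is only approached asymptotically
print(sigmoid_range(acts, 0, 5.5))   # with 5.5 as the cap, 5.0 is reachable with moderate activations
```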
When running the first cell from chapter 9, I got the following error: `No module named 'kaggle'`.
I am guessing that one needs to do `pip install kaggle` or use conda?
You will need a tabular model, which is what the lesson is about now.
Can the outcome variable be something like whether an event will happen or not (i.e. categorical, yes/no, or maybe the probability of the event happening, like 80% likely), instead of sales, which is a number, i.e. sales in USD?
We are considering the L2 distance between the data points to determine the embeddings. After we reduce the number of dimensions with PCA, will the embeddings differ from when we use all the dimensions? Using all of them would probably give us a better picture of the data.
This is explained later in the notebook, but yes.
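For anyone curious, a minimal sketch of the kind of compression being discussed, with random weights standing in for a learned embedding matrix: PCA is only used to squash the factors down to 2-D for plotting, while distances in the full factor space are what the model actually works with.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical learned item embeddings: 50 items x 5 latent factors.
item_factors = np.random.randn(50, 5)

# Compress to 2 components purely for visualization.
pca = PCA(n_components=2)
coords_2d = pca.fit_transform(item_factors)
print(pca.explained_variance_ratio_)   # how much of the structure the 2-D picture keeps
```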
Yes, it certainly can.
Are embeddings used only for high-cardinality categorical variables, or is the approach used in general? For low cardinality, can one use a simple one-hot encoding approach?
I don't know if one is better than the other. You could try an experiment to compare. I find non-negative matrix factorization a good entry point and quite explainable.
In your experience, for highly imbalanced data (such as fraud or medical data), what usually works better: Random Forests, XGBoost, or NNs?
Can you factorize a sparse matrix via SGD to come up with embeddings?
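For reference, the dot-product collaborative filtering model from the chapter is essentially this kind of SGD factorization: the sparse ratings matrix is treated as (user, item, rating) triples, and two embedding tables are learned by gradient descent. A minimal PyTorch sketch with made-up sizes and random ratings:

```python
import torch
from torch import nn

n_users, n_items, n_factors = 100, 50, 5

# Sparse ratings as (user, item, rating) triples -- random here for illustration.
users = torch.randint(0, n_users, (500,))
items = torch.randint(0, n_items, (500,))
ratings = torch.rand(500) * 4.5 + 0.5           # ratings in [0.5, 5.0]

user_f = nn.Embedding(n_users, n_factors)        # user latent factors
item_f = nn.Embedding(n_items, n_factors)        # item latent factors
opt = torch.optim.SGD(list(user_f.parameters()) + list(item_f.parameters()), lr=0.1)

for epoch in range(50):
    pred = (user_f(users) * item_f(items)).sum(dim=1)   # dot product per (user, item) pair
    loss = nn.functional.mse_loss(pred, ratings)
    opt.zero_grad()
    loss.backward()
    opt.step()

# user_f.weight and item_f.weight now hold the learned embeddings.
```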
OK, so for example, I'd like to predict whether a very expensive item sale will happen or not. I will have a bunch of variables and embeddings for situations when the sale happened and when the customers got spooked, for example. Will I be able to get an outcome like: under these conditions the sale is 85% likely to happen?
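In case it helps, a minimal sketch of framing that as classification with fastai's tabular API, on synthetic data (the column names and the sale rule are made up); `probs` then gives the predicted probability of the sale happening:

```python
import numpy as np
import pandas as pd
from fastai.tabular.all import *

# Synthetic sales data, purely for illustration.
np.random.seed(0)
df = pd.DataFrame({
    "region":   np.random.choice(["north", "south", "west"], 1000),
    "price":    np.random.uniform(1_000, 50_000, 1000),
    "discount": np.random.uniform(0, 0.3, 1000),
})
df["made_sale"] = (df["discount"] * 100_000 > df["price"]).astype(int)

dls = TabularDataLoaders.from_df(
    df, y_names="made_sale", y_block=CategoryBlock,   # categorical target -> classification
    cat_names=["region"], cont_names=["price", "discount"],
    procs=[Categorify, FillMissing, Normalize])

learn = tabular_learner(dls, metrics=accuracy)
learn.fit_one_cycle(2)

row, clas, probs = learn.predict(df.iloc[0])   # probs holds P(no sale), P(sale)
print(probs)
```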
There are a few approaches to encode categorical variables, see: https://github.com/scikit-learn-contrib/category_encoders . I suppose which approach works better depends on the data you are working with.
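A minimal sketch of two encoders from that library, with made-up columns: one-hot for a low-cardinality column, target encoding for a higher-cardinality one.

```python
import pandas as pd
import category_encoders as ce

df = pd.DataFrame({
    "color": ["red", "green", "blue", "green", "red"],       # low cardinality: one-hot is fine
    "zip":   ["90210", "10001", "60601", "90210", "10001"],  # higher cardinality: target encoding or an embedding
    "sold":  [1, 0, 1, 1, 0],
})

onehot = ce.OneHotEncoder(cols=["color"]).fit_transform(df[["color"]])
target = ce.TargetEncoder(cols=["zip"]).fit_transform(df[["zip"]], df["sold"])
print(onehot.head())
print(target.head())
```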