It does not help at all for Random Forest.
Tree Ensembles are scale invariant.
It does help in case you are training a NN, in terms of learning stability.
You are correct. Standardizing the data (i.e. subtracting the mean to center it at zero and rescaling to unit variance) is not necessary for tree-based models.
The reason is that decisions (splits) are based on inequalities, which depend only on the relative ordering of feature values and are therefore agnostic to scale and mean.
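To see this concretely, here's a quick sanity check (my own sketch using scikit-learn, not something from the thread): a decision tree fit on raw features and one fit on standardized features make identical predictions, because standardization is a monotone per-feature transform that preserves the ordering the splits depend on.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Two features with wildly different scales.
X = rng.normal(loc=50, scale=[1, 100], size=(200, 2))
y = (X[:, 0] + X[:, 1] / 100 > 50.5).astype(int)

# Standardize: zero mean, unit variance per feature.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

tree_raw = DecisionTreeClassifier(random_state=0).fit(X, y)
tree_std = DecisionTreeClassifier(random_state=0).fit(X_std, y)

# The predictions are identical: standardization changed the
# threshold values but not which rows fall on which side of them.
assert (tree_raw.predict(X) == tree_std.predict(X_std)).all()
```

The same check would fail for a distance- or gradient-based model (k-NN, a neural net), which is exactly why standardization matters there but not here.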
Found it: *Deep learning for tabular data: an exploratory study* by Jan-Andre Marais, published in 2019. He went into great detail trying out various hyperparameters and techniques with the fastai model.
Fair warning, it's a hefty read (144 pages!). The bits we care about are from pages 62 to 85. Everything from attention to self-normalization to unsupervised pretraining was attempted or discussed.
What are the other fancy methods one can use to ensemble an RF model and an NN tabular model, other than averaging the results?
Can you talk a little bit about the COVID-19 efforts at the end if possible? Would love to know what is the latest.
Here are the questionnaire solutions if you want to check your answers or are struggling (work in progress):
Jeremy’s mic is dropping very infrequently.
Specifically about what to expect next week?
Question: Could we use the boosting technique with a neural net?
Jeremy’s audio is now cutting out … choppy…
Thanks Rachel & Jeremy!
Next week NLP?
Since Entity Embeddings were covered extensively in today’s class, sharing my Medium article with a link to my talk on the same at the NY ML community last year.
You’ve been waiting for 7 whole weeks now!
Jeremy’s final words during today’s course: Next week NLP & Computer Vision
Thanks Jeremy and Rachel
Thanks for another great lecture! Interesting to learn more about some “traditional” ML techniques like random forests!
Hahaha - yes I have
Excited it’s coming next week
Thanks Jeremy, Rachel & Sylvain.
Can someone please point to what entity embeddings mean from the last bit of the lesson? Are they the one-hot encoding that we discussed, or something else?
Thanks, see you all next week!
The enhancements that make Random Forest such a powerful Decision Tree model are:

1. Bootstrap sampling, which is selecting a random subset of the data (i.e. rows in the data table) to construct each decision tree,
2. Random feature subsets, which is considering only a random subset of the features (i.e. columns) when choosing each split, and
3. Ensembling, which is constructing a group of models (in this case, a ‘forest’ of trees) and averaging their votes to make each final classification.

The first two of these enhancements are analogous in their application and their effect to the Dropout technique in Neural Networks.
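The steps above can be sketched as a hand-rolled mini forest (a minimal sketch using scikit-learn's `DecisionTreeClassifier`; the variable names and dataset are my own, not from the post):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
rng = np.random.default_rng(0)

trees = []
for i in range(25):
    # 1. Bootstrap sampling: draw a random subset of rows, with replacement.
    rows = rng.integers(0, len(X), size=len(X))
    # 2. Random feature subsets: max_features="sqrt" makes each split
    #    consider only a random subset of the columns.
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    trees.append(tree.fit(X[rows], y[rows]))

# 3. Ensembling: average the trees' votes for the final classification.
votes = np.mean([t.predict(X) for t in trees], axis=0)
preds = (votes > 0.5).astype(int)
print("training accuracy:", (preds == y).mean())
```

In practice you'd just use `RandomForestClassifier`, which bundles all three tricks; spelling them out like this mainly shows why each tree sees a slightly different view of the data, much like Dropout gives each mini-batch a slightly different subnetwork.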