Why normalize by subtracting the mean and dividing by the standard deviation? What about something like dividing by the maximum value, so that it ranges from 0 to 1?
Is there any downside to "Normalize"-ing continuous variables that have trends, not averages (e.g. saleElapsed)?
Thanks @FraPochetti! That is an excellent explanation!
Models really like inputs that have a mean 0 and a std of 1.
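A quick sketch in plain NumPy (just for illustration) of how the two approaches behave differently, especially when there's an outlier:

```python
import numpy as np

x = np.array([10., 20., 30., 40., 1000.])  # note the one big outlier

standardized = (x - x.mean()) / x.std()       # mean 0, std 1
minmax = (x - x.min()) / (x.max() - x.min())  # squeezed into [0, 1]

print(standardized)  # centred around 0, which plays nicely with gradient descent
print(minmax)        # the outlier pins the max at 1 and squashes the rest near 0
```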
I think there was a short discussion in last year's course. I don't remember off the top of my head what that heuristic was, but maybe check that lesson?
IIRC a gentleman from South Africa (I need to find the paper) explored the fastai tabular model for his thesis. He found that 3 layers were ideal. I don't quite recall the sizes; let me try to find it.
If decision trees and random forests are basically if-else statements at the core, then what's the reason we normalize the data at the start? I don't understand why this would help, as this isn't a distance-based model like linear regression etc.
Yes. It worked for me in the past.
Don't think so. Weight decay is actually better, as you keep the inner complexity of the model (due to all its non-linearities) and let the learning process figure it out for you.
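If you want to try it, weight decay is just a parameter on the learner, something like the sketch below (assuming `dls` is a TabularDataLoaders you've already built; the 0.1 is only a starting value to experiment with, not a recommendation):

```python
from fastai.tabular.all import *

# wd applies an L2-style penalty to the weights during training;
# 0.1 here is just a placeholder value to start experimenting with
learn = tabular_learner(dls, layers=[200, 100], wd=0.1)
learn.fit_one_cycle(5)
```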
It does not help at all for Random Forest.
Tree Ensembles are scale invariant.
It does help if you are training a NN, in terms of learning stability.
You are correct. Standardizing the data (i.e. centering the mean at zero and rescaling to unit variance) is not necessary for tree-based models.
The reason is that decisions (splits) are based on inequalities, which compare relative feature values and are agnostic to scale and mean.
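A minimal sketch with sklearn (made-up data) that shows the scale invariance directly: rescaling a feature moves the split thresholds but leaves the predictions untouched.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X[:, 0] * 2 + rng.normal(size=200)

# same data, but one feature rescaled by a huge constant
X_scaled = X.copy()
X_scaled[:, 0] *= 1e6

rf_raw    = RandomForestRegressor(random_state=0).fit(X, y)
rf_scaled = RandomForestRegressor(random_state=0).fit(X_scaled, y)

# the split thresholds move with the scale, but the partitions -- and
# hence the predictions -- are the same; this should print True
print(np.allclose(rf_raw.predict(X), rf_scaled.predict(X_scaled)))
```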
Found it: "Deep learning for tabular data: an exploratory study" by Jan-Andre Marais, published in 2019. He went into great detail trying out various hyperparameters and techniques with the fastai model.
Fair warning, it's a hefty read (144 pages!). The bits we care about are on pages 62 to 85. Everything from attention to self-normalization to unsupervised pretraining was attempted or discussed.
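If you want to experiment with the depth yourself, fastai exposes it via the `layers` argument. A sketch (assuming `dls` already exists; the sizes below are placeholders, not the values from the thesis):

```python
from fastai.tabular.all import *

# three hidden layers, matching the depth the thesis found worked best;
# these particular sizes are placeholders, not values from the paper
learn = tabular_learner(dls, layers=[400, 200, 100])
```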
What other fancy methods can one use to ensemble an RF model and a NN tabular model, besides averaging the results?
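(For reference, by averaging I mean something like the sketch below, with made-up prediction arrays; the weighted variant is the only step up I know of.)

```python
import numpy as np

# hypothetical validation predictions from the two models
rf_preds = np.array([10.2, 9.8, 11.5])
nn_preds = np.array([10.6, 9.4, 11.9])

avg = (rf_preds + nn_preds) / 2              # the plain average
weighted = 0.3 * rf_preds + 0.7 * nn_preds   # weighted average, weights tuned on a validation set
```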
Can you talk a little bit about the COVID-19 efforts at the end, if possible? Would love to know the latest.
Here are the questionnaire solutions if you want to check your answers or are struggling (work in progress):
Jeremy's mic is dropping very infrequently.
Specifically about what to expect next week?
Question: Could we use the boosting technique with a neural net?
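The basic boosting loop isn't tied to trees, so in principle yes. A sketch of the idea on made-up data, with sklearn's MLPRegressor standing in for a tabular net: each net is fit to the residuals left by the ones before it, and the ensemble prediction is the sum.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.1, size=500)

models, residual = [], y.copy()
for _ in range(3):                      # three boosting rounds
    m = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500).fit(X, residual)
    residual = residual - m.predict(X)  # the next net trains on what's left
    models.append(m)

# the boosted prediction is the sum of every net's output
preds = sum(m.predict(X) for m in models)
```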
Jeremy's audio is now cutting out … choppy…
Thanks Rachel & Jeremy!
Next week NLP?
Since Entity Embeddings were covered extensively in today's class, sharing my Medium article with a link to my talk on the same topic at the NY ML community last year.
You've been waiting for 7 whole weeks now!