Why normalize by subtracting the mean and dividing by the standard deviation? What about something like dividing by the maximum value, so that it ranges from 0 to 1?
Is there any downside to "Normalize"-ing continuous variables that have trends, not averages (e.g. saleElapsed)?
Thanks @FraPochetti! That is an excellent explanation!
Models really like inputs that have a mean 0 and a std of 1.
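A quick sketch in plain NumPy (just for illustration) of how the two approaches behave differently, especially when there's an outlier:

```python
import numpy as np

x = np.array([10., 20., 30., 40., 1000.])  # note the one big outlier

standardized = (x - x.mean()) / x.std()       # mean 0, std 1
minmax = (x - x.min()) / (x.max() - x.min())  # squeezed into [0, 1]

print(standardized)  # centred around 0, which plays nicely with gradient descent
print(minmax)        # the outlier pins the max at 1 and squashes the rest near 0
```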
I think there was a short discussion in last year's course. I don't remember off the top of my head what that heuristic was, but maybe check that lesson?
IIRC a gentleman from South Africa (I need to find the paper) explored the fastai tabular model for his thesis. He found that 3 layers were ideal. I don't quite recall the sizes; let me try to find it.
If decision trees and random forests are basically if-else statements at the core, then what's the reason we normalize the data at the start? I don't understand why this would help, as this isn't a distance-based model like linear regression etc.
Yes. It worked for me in the past.
Don't think so. Weight decay is actually better, as you keep the inner complexity of the model (due to all its non-linearities) and let the learning process figure it out for you.
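If you want to try it, weight decay is just a parameter on the learner, something like the sketch below (assuming `dls` is a TabularDataLoaders you've already built; the 0.1 is only a starting value to experiment with, not a recommendation):

```python
from fastai.tabular.all import *

# wd applies an L2-style penalty to the weights during training;
# 0.1 here is just a placeholder value to start experimenting with
learn = tabular_learner(dls, layers=[200, 100], wd=0.1)
learn.fit_one_cycle(5)
```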
It does not help at all for Random Forest.
Tree Ensembles are scale invariant.
It does help if you are training a NN, in terms of learning stability.
You are correct. Standardizing the data (i.e. centering the mean at zero and rescaling to unit variance) is not necessary for tree-based models.
The reason is that decisions (splits) are based on inequalities, which compare relative feature values and are agnostic to scale and mean.
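A minimal sketch with sklearn (made-up data) that shows the scale invariance directly: rescaling a feature moves the split thresholds but leaves the predictions untouched.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X[:, 0] * 2 + rng.normal(size=200)

# same data, but one feature rescaled by a huge constant
X_scaled = X.copy()
X_scaled[:, 0] *= 1e6

rf_raw    = RandomForestRegressor(random_state=0).fit(X, y)
rf_scaled = RandomForestRegressor(random_state=0).fit(X_scaled, y)

# the split thresholds move with the scale, but the partitions -- and
# hence the predictions -- are the same; this should print True
print(np.allclose(rf_raw.predict(X), rf_scaled.predict(X_scaled)))
```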
Found it: "Deep learning for tabular data: an exploratory study" by Jan-Andre Marais, published in 2019. He went into great detail trying out various hyperparameters and techniques with the fastai model.
Fair warning, it's a hefty read (144 pages!). The bits we care about are on pages 62 to 85. Everything from attention to self-normalization to unsupervised pretraining was attempted or discussed.
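If you want to experiment with the depth yourself, fastai exposes it via the `layers` argument. A sketch (assuming `dls` already exists; the sizes below are placeholders, not the values from the thesis):

```python
from fastai.tabular.all import *

# three hidden layers, matching the depth the thesis found worked best;
# these particular sizes are placeholders, not values from the paper
learn = tabular_learner(dls, layers=[400, 200, 100])
```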
What other fancy methods can one use to ensemble an RF model and a NN tabular model, besides averaging the results?
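(For reference, by averaging I mean something like the sketch below, with made-up prediction arrays; the weighted variant is the only step up I know of.)

```python
import numpy as np

# hypothetical validation predictions from the two models
rf_preds = np.array([10.2, 9.8, 11.5])
nn_preds = np.array([10.6, 9.4, 11.9])

avg = (rf_preds + nn_preds) / 2              # the plain average
weighted = 0.3 * rf_preds + 0.7 * nn_preds   # weighted average, weights tuned on a validation set
```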
Can you talk a little bit about the COVID-19 efforts at the end, if possible? Would love to know the latest.
Here are the questionnaire solutions if you want to check your answers or are struggling (work in progress):
Jeremy's mic is dropping very infrequently.
Specifically about what to expect next week?
Question: Could we use the boosting technique with a neural net?
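The basic boosting loop isn't tied to trees, so in principle yes. A sketch of the idea on made-up data, with sklearn's MLPRegressor standing in for a tabular net: each net is fit to the residuals left by the ones before it, and the ensemble prediction is the sum.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.1, size=500)

models, residual = [], y.copy()
for _ in range(3):                      # three boosting rounds
    m = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500).fit(X, residual)
    residual = residual - m.predict(X)  # the next net trains on what's left
    models.append(m)

# the boosted prediction is the sum of every net's output
preds = sum(m.predict(X) for m in models)
```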
Jeremy's audio is now cutting out … choppy…
Thanks Rachel & Jeremy!
Next week NLP?
Since Entity Embeddings were covered extensively in today's class, sharing my Medium article with a link to my talk on the same topic at the NY ML community last year.
You've been waiting for 7 whole weeks now!