Lesson 7 - Official topic

Why normalize by subtracting the mean and dividing by the standard deviation? What about something like dividing by the maximum value, so that it ranges from 0 to 1?

1 Like

Is there any downside to "Normalize"-ing continuous variables that have trends, not averages (e.g. saleElapsed)?

1 Like

Thanks @FraPochetti! That is an excellent explanation!

1 Like

Models really like inputs that have a mean of 0 and a std of 1.
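
For anyone curious, here’s a minimal numpy sketch of both options (the numbers are made up):

```python
import numpy as np

x = np.array([3.0, 7.0, 11.0, 19.0])  # some continuous column

# Standardization (what fastai's Normalize does): mean 0, std 1
x_std = (x - x.mean()) / x.std()

# Min-max scaling (the "divide by the max" idea above): range [0, 1]
x_minmax = (x - x.min()) / (x.max() - x.min())

print(x_std.mean(), x_std.std())       # ~0.0, ~1.0
print(x_minmax.min(), x_minmax.max())  # 0.0, 1.0
```

Note that min-max scaling is more sensitive to outliers: a single huge value squashes everything else toward 0, while the mean/std version degrades more gracefully.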

1 Like

I think there was a short discussion in last year’s course. I don’t remember off the top of my head what that heuristic was, but maybe check that lesson?

IIRC a gentleman from South Africa explored the fastai tabular model for his thesis (I need to find the paper). He found that 3 layers were ideal. I don’t quite recall the sizes; let me try to find it.

2 Likes

If decision trees and random forests are basically if-else statements at their core, then what’s the reason we normalize the data at the start? I don’t understand why this would help, as this isn’t a distance-based model like linear regression, etc.

Yes. It worked for me in the past.

Don’t think so. Weight decay is actually better: you keep the inner complexity of the model, with all its non-linearities, and let the learning process figure it out for you.
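
For concreteness, here’s a self-contained PyTorch sketch (random data, arbitrary sizes); in fastai you’d get the same effect by passing wd= to the learner or to fit_one_cycle:

```python
import torch
from torch import nn

# Keep the full capacity (and non-linearities) of the model and let
# weight decay regularize it via the optimizer instead.
model = nn.Sequential(nn.Linear(10, 200), nn.ReLU(), nn.Linear(200, 1))
opt = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=0.01)

x, y = torch.randn(64, 10), torch.randn(64, 1)
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()
```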

It does not help at all for random forests.
Tree ensembles are scale-invariant.

It does help if you are training a NN, in terms of learning stability.

2 Likes

You are correct. Standardizing the data (i.e. centering to zero the mean and re-scaling to get unit variance) is not necessary for tree-based models.

The reason is that decisions (splits) are based on inequalities, which compare relative feature values and are agnostic to scale and mean.
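
A quick way to see this with scikit-learn (random data, purely illustrative): fit the same tree on raw and on rescaled features and compare predictions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = rng.normal(size=200)

# Same tree, once on the raw features and once on shifted/rescaled ones.
t1 = DecisionTreeRegressor(random_state=0).fit(X, y)
t2 = DecisionTreeRegressor(random_state=0).fit(X * 100 - 5, y)

# The split thresholds adapt to the new scale, but the partition of the
# data (and hence the predictions) is identical.
assert np.allclose(t1.predict(X), t2.predict(X * 100 - 5))
```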

2 Likes

Found it: "Deep learning for tabular data: an exploratory study" by Jan-Andre Marais, published in 2019. He went into great detail trying out various hyperparameters and techniques with the fastai model.

Fair warning: it’s a hefty read (144 pages!). The bits we care about are on pages 62 to 85. Everything from attention to self-normalization to unsupervised pretraining was attempted or discussed :slight_smile:

13 Likes

What other fancy methods can one use to ensemble an RF model and a NN tabular model, other than averaging the results?
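
Stacking is the usual next step: fit a meta-model on the base models’ predictions instead of averaging them 50/50. A minimal scikit-learn sketch, with an MLP standing in for the tabular NN (a fastai model would need a small wrapper to fit this API):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import RidgeCV
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=500, n_features=10, random_state=0)

# RidgeCV learns how much to trust each base model, using out-of-fold
# predictions generated internally by StackingRegressor.
stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
        ("nn", MLPRegressor(hidden_layer_sizes=(100,), max_iter=1000, random_state=0)),
    ],
    final_estimator=RidgeCV(),
)
stack.fit(X, y)
```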

Can you talk a little bit about the COVID-19 efforts at the end, if possible? Would love to know the latest.

1 Like

Here are the questionnaire solutions if you want to check your answers or are struggling (work in progress :slight_smile: ):

Jeremy’s mic is dropping out every so often.

Specifically, about what to expect next week?

4 Likes

Question: Could we use the boosting technique with a neural net?
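
In principle yes: the boosting recipe (fit a model, subtract its predictions from the targets, fit the next model on the residuals, sum everything at inference) doesn’t care what the base model is. A hedged sketch with scikit-learn’s MLPRegressor standing in for the neural net:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=500, n_features=10, random_state=0)

models, residual = [], y.astype(float)
for i in range(3):
    # Each net is trained on what the previous nets failed to explain.
    m = MLPRegressor(hidden_layer_sizes=(50,), max_iter=1000, random_state=i)
    m.fit(X, residual)
    residual = residual - m.predict(X)
    models.append(m)

# The boosted prediction is the sum of all the nets' predictions.
pred = np.sum([m.predict(X) for m in models], axis=0)
```

In practice NNs are slow to fit, so boosting many of them gets expensive; that’s part of why gradient boosting libraries stick to shallow trees.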

Jeremy’s audio is now cutting out… choppy…

Thanks Rachel & Jeremy!

Next week NLP? :slight_smile:

9 Likes

Since entity embeddings were covered extensively in today’s class, I’m sharing my Medium article, which links to my talk on the same topic at the NY ML community last year.

2 Likes

You’ve been waiting for 7 whole weeks now! :slight_smile:

2 Likes