Lesson 7 - Official topic

It does not help at all for Random Forest.
Tree Ensembles are scale invariant.

It does help in case you are training a NN, in terms of learning stability.

2 Likes

You are correct. Standardizing the data (i.e. centering to zero the mean and re-scaling to get unit variance) is not necessary for tree-based models.

The reason is that decisions (splits) are based on inequalities, which compare relative feature values and are agnostic to scale and mean.
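A quick sketch of this point (synthetic data, scikit-learn assumed): because splits are threshold comparisons, rescaling and shifting a feature leaves a decision tree's predictions unchanged.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Same data, but with one feature drastically rescaled and shifted.
X_scaled = X.copy()
X_scaled[:, 0] = X_scaled[:, 0] * 1000 + 5

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
tree_scaled = DecisionTreeClassifier(random_state=0).fit(X_scaled, y)

# The trees find equivalent splits, so predictions agree on every row.
assert (tree.predict(X) == tree_scaled.predict(X_scaled)).all()
```

A neural net, by contrast, multiplies raw feature values by weights, so wildly different feature scales do affect optimization — hence the standardization advice above.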

2 Likes

Found it: Deep learning for tabular data: an exploratory study by Jan-Andre Marais, published in 2019. He went into great detail trying out various hyperparameters and techniques with the fastai model.

Fair warning: it’s a hefty read (144 pages!). The bits we care about are on pages 62 to 85. Everything from attention to self-normalization to unsupervised pretraining was attempted or discussed :slight_smile:

13 Likes

What are the other fancy methods one can use to ensemble a rf model and a nn tabular model other than averaging the results?

Can you talk a little bit about the COVID-19 efforts at the end if possible? Would love to know what is the latest.

1 Like

Here are the questionnaire solutions if you want to check your answers or are struggling (work in progress :slight_smile: ):

Jeremy’s mic is dropping out intermittently.
Specifically about what to expect next week?

4 Likes

Question: Could we use the boosting technique with a neural net?

Jeremy’s audio is now cutting out … choppy …

Thanks Rachel & Jeremy!

Next week NLP? :slight_smile:

9 Likes

Since entity embeddings were covered extensively in today’s class, I’m sharing my Medium article, with a link to my talk on the same topic at the NY ML community last year.

2 Likes

You’ve been waiting for 7 whole weeks now! :slight_smile:

2 Likes

Jeremy’s final words during today’s course: Next week NLP & Computer Vision

4 Likes

Thanks Jeremy and Rachel

1 Like

Thanks for another great lecture! Interesting to learn more about some “traditional” ML techniques like random forests!

4 Likes

Hahaha - yes I have :slight_smile:

Excited it’s coming next week :slight_smile:

1 Like

Thanks Jeremy, Rachel & Sylvain.

1 Like

Can someone please point to what entity embeddings mean, from the last bit of the lesson? Are they the one-hot encoding that we discussed, or something else?

Thanks, see you all next week!

1 Like

The enhancements that make Random Forest such a powerful Decision Tree model are:

  1. Bootstrap sampling, which is selecting a random subset of the data (i.e. rows in the data table) to construct each decision tree,
  2. Selecting a random subset of the features (i.e. columns in the data table) to make a ‘split’ at each ‘node’ in a decision tree, and
  3. Ensembling, which is constructing a group of models (in this case, a ‘forest’ of trees) and averaging their votes to make each final classification.

The first two of these enhancements are analogous, in their application and their effect, to the Dropout technique in neural networks.
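The three enhancements above map directly onto scikit-learn parameters; a minimal sketch (synthetic dataset assumed):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

rf = RandomForestClassifier(
    n_estimators=100,     # 3. ensembling: a 'forest' of 100 trees whose votes are averaged
    bootstrap=True,       # 1. bootstrap sampling: each tree sees a random sample of rows
    max_features="sqrt",  # 2. a random subset of columns is considered at each split
    random_state=0,
).fit(X, y)

print(rf.score(X, y))  # training-set accuracy
```

Turning `bootstrap` off or setting `max_features=None` removes the corresponding source of tree-to-tree diversity, which is exactly what the ensemble averaging relies on.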

1 Like