Lesson 6 official topic

Is it true that a Random Forest model does not overfit?

2 Likes

Also, dropout in deep learning is very similar to bagging in a random forest.
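The analogy, roughly: bagging averages many trees that each saw a different random sample of the rows, while dropout implicitly averages many “thinned” networks that each saw a different random subset of activations. A minimal sketch just to make the parallel concrete (the layer sizes here are arbitrary):

```python
import torch.nn as nn
from sklearn.ensemble import RandomForestClassifier

# Bagging: each tree is fit on a different bootstrap sample of the rows
# (bootstrap=True is the default), so the ensemble averages many
# slightly-different models.
rf = RandomForestClassifier(n_estimators=100, bootstrap=True)

# Dropout: each forward pass randomly zeroes a different subset of
# activations, so training implicitly averages many "thinned" networks.
net = nn.Sequential(
    nn.Linear(20, 50),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly drop half the activations each batch
    nn.Linear(50, 2),
)
```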

1 Like

Would you ever exclude a tree from the forest if it had a ‘bad’ OOB error?

2 Likes

In terms of ML explainability, the feature importance of an RF model sometimes gives different results than other explainability techniques like the well-known SHAP method or LIME. In this situation, which one would be more accurate/reliable: RF feature importance or the other techniques?
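One practical way to investigate the disagreement is to compute more than one importance measure for the same fitted model and compare them. A minimal sklearn sketch (the SHAP lines are optional and assume the `shap` package is installed):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Impurity-based importance: fast, but computed from the training data and
# biased towards high-cardinality features.
print(rf.feature_importances_)

# Permutation importance on held-out data: slower, but measures the actual
# drop in validation score when a feature is shuffled.
result = permutation_importance(rf, X_val, y_val, n_repeats=10, random_state=0)
print(result.importances_mean)

# SHAP (if installed) gives per-row attributions you can average:
# import shap
# shap_values = shap.TreeExplainer(rf).shap_values(X_val)
```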

1 Like

Just some comments or thoughts (and questions) please:

  • We could go on and create ensembles (and more) of bagged models, and I assume they would result in better-performing models. A question here: when should we stop?

  • Another question regarding ensembles: if we’d like to create ensembles of ensembles, are bagged models combined with bagged models a better approach than, say, combining a bagged model with a different ensemble technique, like stacking? (A sketch of the stacking idea follows below.)
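A minimal sketch of the “ensemble of ensembles” idea using scikit-learn’s StackingClassifier over two bagged base learners. Whether this beats simply averaging the bagged models is an empirical question, so the usual stopping rule is: stop when the validation score no longer improves enough to justify the extra complexity.

```python
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression

# Two bagged ensembles as base learners...
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ("et", ExtraTreesClassifier(n_estimators=200, random_state=0)),
]

# ...combined by a simple meta-model (stacking). Measure it against plain
# averaging on a validation set before keeping the extra layer.
stack = StackingClassifier(estimators=base_learners, final_estimator=LogisticRegression())
# stack.fit(X_train, y_train); stack.score(X_val, y_val)
```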

Yes, later this year. Check out Walkthru 13 and the Discord discussion.

4 Likes

How does a random forest compare to bootstrapping?
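Bootstrapping is the resampling step inside bagging: each tree is fit on a sample of the rows drawn with replacement, and a random forest adds a random subset of features considered at each split. A hand-rolled sketch of that relationship (for binary 0/1 labels):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_trees(X, y, n_trees=100, seed=0):
    """Hand-rolled bagging: each tree is fit on a bootstrap sample (rows
    drawn with replacement). A random forest is this plus a random subset
    of features considered at each split (max_features)."""
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))            # bootstrap sample
        tree = DecisionTreeClassifier(max_features="sqrt")    # feature subsampling
        trees.append(tree.fit(X[idx], y[idx]))
    return trees

def majority_vote(trees, X):
    # Average the per-tree predictions (0/1 labels) and round to get the vote.
    return np.round(np.mean([t.predict(X) for t in trees], axis=0))
```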

“Statistical Modeling: The Two Cultures” by Leo Breiman, as mentioned by Jeremy in the lecture.

6 Likes

Is there any relationship between random forests and having an equivalent number of weights and ReLU activations in a similarly deep neural network? Can random forests just be implemented with a sufficiently large DL network?

1 Like

On the overfitting aspect:

If we use a random forest to compute feature importance and pick the best columns in a dataset, and then also use a random forest to build the model, would that translate to overfitting?
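One way this can go wrong is leakage: if the columns are selected using all of the data and another model is then validated on those same rows, the score looks better than it should. A sketch of keeping the RF-based selection inside a pipeline so it is re-fit on each training fold only (the dataset here is synthetic, purely for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=1000, n_features=40, n_informative=5, random_state=0)

# Both the RF-based column selection and the final RF live inside one
# pipeline, so the selection only ever sees the training fold.
pipe = Pipeline([
    ("select", SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=0))),
    ("model", RandomForestClassifier(n_estimators=200, random_state=0)),
])

scores = cross_val_score(pipe, X, y, cv=5)   # honest estimate of generalisation
print(scores.mean())
```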

2 Likes

When you are working on tabular data, how do you go about trying different models like random forests, gradient boosting, neural networks, etc.? How is that decision made? Are there benchmarks for the tabular world, like the ones showing which image models are best?
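In practice a common approach is to fit a couple of model families with near-default settings and let a cross-validated metric drive the choice, rather than relying on a fixed leaderboard. A small sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, HistGradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=30, random_state=0)

# Quick baselines: try a couple of model families and compare validation scores.
for name, model in [
    ("random forest", RandomForestClassifier(n_estimators=200, random_state=0)),
    ("gradient boosting", HistGradientBoostingClassifier(random_state=0)),
]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f}")
```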

1 Like

Do you use AutoML frameworks to help improve your iterations in a more automated way, and if so, which AutoML frameworks or services do you recommend?

1 Like

If you’re using Linux or WSL, autosklearn is a good library to try out. As the name suggests, it is closely related to, and based on, sklearn, which you probably already have some familiarity with.
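A minimal auto-sklearn sketch, assuming you already have train/validation splits; the parameter names are as I recall them from the auto-sklearn docs, so double-check against the current API:

```python
import autosklearn.classification

# X_train, y_train, X_val, y_val are assumed to be already-split arrays.
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=600,   # total search budget in seconds
    per_run_time_limit=60,         # budget per candidate model
)
automl.fit(X_train, y_train)

print(automl.leaderboard())        # models tried, ranked by validation score
print(automl.score(X_val, y_val))
```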

1 Like

Do you create different Kaggle notebooks for the different models you try? So one Kaggle notebook for the first (base) model, and separate notebooks for subsequent models? Or do you put your subsequent models at the bottom of the same (base model) notebook? Just wondering what your ideal approach is.

3 Likes

I see there’s a dedicated framework for PyTorch: GitHub - automl/Auto-PyTorch: Automatic architecture search and hyperparameter optimization for PyTorch

1 Like

Interesting answer regarding AutoML. I thought those frameworks didn’t necessarily exhaustively search the optimisation space, and that it was possible to do something similar to what Jeremy is saying, i.e. just tweak a few parameters at a time on a few simple models to get results more quickly.

1 Like

When I use TTA during my training process, do I need to do something special during inference? It seems to me this is something you use only during validation, right?

2 Likes

What is the reason behind models taking images in a 224x224 square shape, and also in rectangular shapes?

1 Like

Yes, TTA is to be done only during inference.
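In fastai this means calling `Learner.tta` when you predict, rather than changing anything in the training loop. A minimal sketch, assuming `learn` is an already-trained Learner and `test_items` is your test data:

```python
# TTA averages predictions over several augmented versions of each image.
preds, targs = learn.tta()                 # TTA on the validation set
test_dl = learn.dls.test_dl(test_items)    # build a DataLoader for the test set
test_preds, _ = learn.tta(dl=test_dl)      # TTA predictions for a submission
```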

3 Likes

GPUs need all images to be the same size to run them in parallel; they can be square or rectangular, as long as all of them are the same size.
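A minimal fastai sketch of the same point: the final size just has to be identical for every image in the batch, and it can be rectangular. `path` is assumed to be a folder of labelled images.

```python
from fastai.vision.all import *

# All images in a batch must share one size so the GPU can stack them into
# a single tensor; that size does not have to be square.
dls = ImageDataLoaders.from_folder(
    path,                             # assumed: a folder of labelled images
    item_tfms=Resize((224, 336)),     # rectangular, but identical for every image
    batch_tfms=aug_transforms(),
)
```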

4 Likes