Lesson 6 official topic

Is it true that a Random Forest model does not overfit?

2 Likes

Also, dropout in deep learning is very similar to bagging in a random forest.
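The analogy, roughly: bagging averages many trees that each saw a different random sample of the rows, while dropout implicitly averages many “thinned” networks that each saw a different random subset of activations. A minimal sketch just to make the parallel concrete (the layer sizes here are arbitrary):

```python
import torch.nn as nn
from sklearn.ensemble import RandomForestClassifier

# Bagging: each tree is fit on a different bootstrap sample of the rows
# (bootstrap=True is the default), so the ensemble averages many
# slightly-different models.
rf = RandomForestClassifier(n_estimators=100, bootstrap=True)

# Dropout: each forward pass randomly zeroes a different subset of
# activations, so training implicitly averages many "thinned" networks.
net = nn.Sequential(
    nn.Linear(20, 50),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly drop half the activations each batch
    nn.Linear(50, 2),
)
```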

1 Like

Would you ever exclude a tree from the forest if it had a ‘bad’ OOB error?

2 Likes

In terms of ML explainability, the feature importance of an RF model sometimes gives different results than other explainability techniques like the well-known SHAP method or LIME. In this situation, which one would be more accurate/reliable: RF feature importance or the other techniques?
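One practical way to investigate the disagreement is to compute more than one importance measure for the same fitted model and compare them. A minimal sklearn sketch (the SHAP lines are optional and assume the `shap` package is installed):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Impurity-based importance: fast, but computed from the training data and
# biased towards high-cardinality features.
print(rf.feature_importances_)

# Permutation importance on held-out data: slower, but measures the actual
# drop in validation score when a feature is shuffled.
result = permutation_importance(rf, X_val, y_val, n_repeats=10, random_state=0)
print(result.importances_mean)

# SHAP (if installed) gives per-row attributions you can average:
# import shap
# shap_values = shap.TreeExplainer(rf).shap_values(X_val)
```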

1 Like

Just some comments or thoughts (and questions) please:

  • We could go on and create ensembles (and more) of bagged models, and I assume they would result in better-performing models. A question here: when should we stop?

  • Another question regarding ensembles: if we’d like to create ensembles of ensembles, are bagged models combined with bagged models a better approach than, say, combining a bagged model with a different ensemble technique, like stacking? (A sketch of the stacking idea follows below.)
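A minimal sketch of the “ensemble of ensembles” idea using scikit-learn’s StackingClassifier over two bagged base learners. Whether this beats simply averaging the bagged models is an empirical question, so the usual stopping rule is: stop when the validation score no longer improves enough to justify the extra complexity.

```python
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression

# Two bagged ensembles as base learners...
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ("et", ExtraTreesClassifier(n_estimators=200, random_state=0)),
]

# ...combined by a simple meta-model (stacking). Measure it against plain
# averaging on a validation set before keeping the extra layer.
stack = StackingClassifier(estimators=base_learners, final_estimator=LogisticRegression())
# stack.fit(X_train, y_train); stack.score(X_val, y_val)
```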

Yes, later this year. Check out Walkthru 13 and the Discord discussion.

4 Likes

How does a random forest compare to bootstrapping?
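Bootstrapping is the resampling step inside bagging: each tree is fit on a sample of the rows drawn with replacement, and a random forest adds a random subset of features considered at each split. A hand-rolled sketch of that relationship (for binary 0/1 labels):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_trees(X, y, n_trees=100, seed=0):
    """Hand-rolled bagging: each tree is fit on a bootstrap sample (rows
    drawn with replacement). A random forest is this plus a random subset
    of features considered at each split (max_features)."""
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))            # bootstrap sample
        tree = DecisionTreeClassifier(max_features="sqrt")    # feature subsampling
        trees.append(tree.fit(X[idx], y[idx]))
    return trees

def majority_vote(trees, X):
    # Average the per-tree predictions (0/1 labels) and round to get the vote.
    return np.round(np.mean([t.predict(X) for t in trees], axis=0))
```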

“Statistical Modeling: The Two Cultures” by Leo Breiman, as mentioned by Jeremy in the lecture.

6 Likes

Is there any relationship between random forests and having an equivalent number of weights and ReLU activations in a similarly deep neural network? Can random forests just be implemented with a sufficiently large DL network?

1 Like

On the overfitting aspect:

If we use a random forest to compute feature importance and pick the best columns in a dataset, and then also use a random forest to build the model, would that translate to overfitting?
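One way this can go wrong is leakage: if the columns are selected using all of the data and another model is then validated on those same rows, the score looks better than it should. A sketch of keeping the RF-based selection inside a pipeline so it is re-fit on each training fold only (the dataset here is synthetic, purely for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=1000, n_features=40, n_informative=5, random_state=0)

# Both the RF-based column selection and the final RF live inside one
# pipeline, so the selection only ever sees the training fold.
pipe = Pipeline([
    ("select", SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=0))),
    ("model", RandomForestClassifier(n_estimators=200, random_state=0)),
])

scores = cross_val_score(pipe, X, y, cv=5)   # honest estimate of generalisation
print(scores.mean())
```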

2 Likes

When you are working on tabular data, how do you go about trying different models like random forests, gradient boosting, neural networks, etc.? How is that decision made? Are there benchmarks for the tabular world, like the ones showing which image models are best?
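In practice a common approach is to fit a couple of model families with near-default settings and let a cross-validated metric drive the choice, rather than relying on a fixed leaderboard. A small sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, HistGradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=30, random_state=0)

# Quick baselines: try a couple of model families and compare validation scores.
for name, model in [
    ("random forest", RandomForestClassifier(n_estimators=200, random_state=0)),
    ("gradient boosting", HistGradientBoostingClassifier(random_state=0)),
]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f}")
```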

1 Like

Do you use AutoML frameworks to help improve your iterations in a more automated way, and if so, which AutoML frameworks or services do you recommend?

1 Like

If you’re using Linux or WSL, autosklearn is a good library to try out. As the name suggests, it is closely related to, and based on, sklearn, which you probably already have some familiarity with.
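A minimal auto-sklearn sketch, assuming you already have train/validation splits; the parameter names are as I recall them from the auto-sklearn docs, so double-check against the current API:

```python
import autosklearn.classification

# X_train, y_train, X_val, y_val are assumed to be already-split arrays.
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=600,   # total search budget in seconds
    per_run_time_limit=60,         # budget per candidate model
)
automl.fit(X_train, y_train)

print(automl.leaderboard())        # models tried, ranked by validation score
print(automl.score(X_val, y_val))
```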

1 Like

Do you create different Kaggle notebooks for the different models you try? So one Kaggle notebook for the first (base) model, and separate notebooks for subsequent models? Or do you put your subsequent models at the bottom of the same (base model) notebook? Just wondering what your ideal approach is.

3 Likes

I see there’s a dedicated framework for PyTorch: GitHub - automl/Auto-PyTorch: Automatic architecture search and hyperparameter optimization for PyTorch

1 Like

Interesting answer regarding AutoML. I thought those frameworks didn’t necessarily exhaustively search the optimisation space, and that it was possible to do something similar to what Jeremy is saying, i.e. just tweak a few parameters at a time on a few simple models to get results more quickly.

1 Like

When I use TTA during my training process, do I need to do something special during inference? It seems to me this is something you use only during validation, right?

2 Likes

What is the reason behind models taking images in a 224x224 square shape, and also in rectangular shapes?

1 Like

Yes, TTA is to be done only during inference.
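In fastai this means calling `Learner.tta` when you predict, rather than changing anything in the training loop. A minimal sketch, assuming `learn` is an already-trained Learner and `test_items` is your test data:

```python
# TTA averages predictions over several augmented versions of each image.
preds, targs = learn.tta()                 # TTA on the validation set
test_dl = learn.dls.test_dl(test_items)    # build a DataLoader for the test set
test_preds, _ = learn.tta(dl=test_dl)      # TTA predictions for a submission
```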

3 Likes

GPUs need all images to be the same size to run them in parallel; they can be square or rectangular, as long as all of them are the same size.
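A minimal fastai sketch of the same point: the final size just has to be identical for every image in the batch, and it can be rectangular. `path` is assumed to be a folder of labelled images.

```python
from fastai.vision.all import *

# All images in a batch must share one size so the GPU can stack them into
# a single tensor; that size does not have to be square.
dls = ImageDataLoaders.from_folder(
    path,                             # assumed: a folder of labelled images
    item_tfms=Resize((224, 336)),     # rectangular, but identical for every image
    batch_tfms=aug_transforms(),
)
```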

4 Likes