So we know that bagging is a powerful ensemble approach to machine learning. Would it be advisable to try bagging first when approaching a particular task (say, a tabular task), before reaching for deep learning?
Can we create a bagging model that includes fast.ai deep learning model(s)? I'd guess it would be really powerful?
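Bagging isn't limited to trees: any model can be the base learner if you train each copy on a bootstrap sample of the rows and average the predictions. A minimal sketch, using scikit-learn's `MLPRegressor` as a stand-in for a fast.ai tabular model (the fastai specifics are assumed away here; the resampling-and-averaging pattern is the point):

```python
# Sketch: hand-rolled bagging with neural nets as the base learners.
# MLPRegressor stands in for a deep tabular model (e.g. fastai's tabular_learner).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)
rng = np.random.default_rng(0)

preds = []
for seed in range(5):
    # Bootstrap sample: draw rows with replacement.
    idx = rng.integers(0, len(X), size=len(X))
    model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500, random_state=seed)
    model.fit(X[idx], y[idx])
    preds.append(model.predict(X))

# The bagged prediction is the average over the ensemble members.
bagged_pred = np.mean(preds, axis=0)
```

Averaging uncorrelated errors is where the gain comes from, so the more the bootstrap samples (and random inits) decorrelate the members, the more bagging helps.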
In terms of ML explainability, the feature importance of an RF model sometimes gives different results than other explainability techniques, such as the well-known SHAP method or LIME. In that situation, which would be more accurate/reliable: RF feature importance or the other techniques?
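One concrete way to see such disagreements without bringing in SHAP is to compare the RF's built-in impurity-based importances (computed from the training data) with permutation importance on a held-out set. A sketch with scikit-learn (the dataset is synthetic, for illustration only):

```python
# Sketch: impurity-based vs permutation feature importance for the same RF.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=8, n_informative=3,
                           random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Impurity-based: how much each feature reduced impurity during training.
impurity_imp = rf.feature_importances_

# Permutation-based: how much shuffling each column hurts held-out accuracy.
perm = permutation_importance(rf, X_val, y_val, n_repeats=10, random_state=0)
perm_imp = perm.importances_mean
```

The two rankings often differ because impurity importance can be biased toward high-cardinality features and reflects the training set, while permutation importance measures what the model actually relies on for held-out predictions.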
Just some comments, thoughts, and questions, please:
We could go on and create ensembles (and more) of bagged models, and I assume they would result in better-performing models. So another question here: when should we stop?
Another question regarding ensembles: if we'd like to create ensembles of ensembles, is combining bagged models with other bagged models a better approach than combining a bagged model with a different ensemble technique, such as stacking?
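For reference, stacking differs from averaging bagged models in that a meta-learner is trained on the base models' out-of-fold predictions. A minimal sketch with scikit-learn, stacking two bagged tree ensembles under a logistic-regression meta-learner (synthetic data, illustrative only):

```python
# Sketch: stacking two bagged ensembles instead of simply averaging them.
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("et", ExtraTreesClassifier(n_estimators=50, random_state=0)),
    ],
    # The meta-learner is fit on cross-validated predictions of the base models.
    final_estimator=LogisticRegression(),
    cv=3,
)
stack.fit(X, y)
```

Whether stacking beats plain averaging depends on whether the base models' errors are different enough for the meta-learner to exploit; with very similar base models, the simpler average is often just as good.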
Is there any relationship between random forests and a similarly deep neural network with an equivalent number of weights and ReLU activations? Can a random forest simply be implemented with a sufficiently large DL network?
If we use a random forest to compute feature importance and pick the best columns in a dataset, and then also use a random forest to build the model, would that lead to overfitting?
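The usual safeguard is to do the importance-based selection on the training split only, so the held-out score stays honest even though the same model family is used twice. A sketch of that workflow (synthetic data; the top-5 cutoff is an arbitrary choice for illustration):

```python
# Sketch: RF for feature selection, then RF for the model,
# with selection restricted to the training split to limit leakage.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=20, n_informative=4,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Step 1: rank columns using only the training set.
ranker = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
top = np.argsort(ranker.feature_importances_)[::-1][:5]

# Step 2: refit on the selected columns. The test set was never consulted
# during selection, so its score remains a fair estimate.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_tr[:, top], y_tr)
score = model.score(X_te[:, top], y_te)
```

If instead the importances were computed on the full dataset before splitting, the selection step would leak test information and the validation score would look better than it should.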
When you are working on tabular data, how do you go about trying different models like random forests, gradient boosting, neural networks, etc.? How is that decision made? Are there benchmarks in the tabular world, like the "which image models are best" comparisons?
If you're using Linux or WSL, autosklearn is a good library to try out. As the name suggests, it is closely related to and based on sklearn, which you probably already have some familiarity with.
Do you create different Kaggle notebooks for the different models you try? So one Kaggle notebook for the first (base) model, and separate notebooks for subsequent models? Or do you put your subsequent models at the bottom of the same (base model) notebook? Just wondering what your ideal approach is.