I think the number of leaves is determined by the number of trees and the maximum depth of each tree, both of which are hyperparameters to be tuned. Deciding where to split a node is done by minimizing Gini impurity (or, equivalently, maximizing information gain), and a leaf is itself a node (just the last node on its branch, I think).
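To make that concrete, here's a minimal sketch (plain NumPy, binary labels, toy data of my own) of how Gini impurity scores a candidate split:

```python
import numpy as np

def gini(y):
    """Gini impurity of a label array: 1 - sum of squared class proportions."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_impurity(y_left, y_right):
    """Weighted Gini impurity of a candidate split; the tree keeps the split minimizing this."""
    n = len(y_left) + len(y_right)
    return (len(y_left) / n) * gini(y_left) + (len(y_right) / n) * gini(y_right)

# Toy example: a split that separates the classes well scores lower.
y = np.array([0, 0, 0, 1, 1, 1])
print(split_impurity(y[:3], y[3:]))     # 0.0   (pure children)
print(split_impurity(y[::2], y[1::2]))  # ~0.44 (mixed children)
```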
What is the best way to identify bias in tabular data, and are there any suggested de-biasing techniques?
They do.
XGBoost has this functionality.
CatBoost is my favourite, hands down, though. It offers very advanced tools for exactly the issue you mention.
I did, and it didn't work… I am also looking at the link suggested by @hiromi. Thank you both.
Has anyone applied bagging/boosting to neural nets with good results?
When we do K-fold cross validation and use the K-fold ensemble for prediction, is that a form of bagging?
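For concreteness, here's the kind of thing I mean (a sketch with scikit-learn; the data and base model are just toys):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier  # any base model works
from sklearn.model_selection import KFold

# Toy data: 500 rows for the K-fold models, 50 held out to predict on.
X, y = make_classification(n_samples=550, random_state=0)
X_cv, y_cv, X_test = X[:500], y[:500], X[500:]

models = []
for train_idx, _ in KFold(n_splits=5, shuffle=True, random_state=0).split(X_cv):
    models.append(
        GradientBoostingClassifier(random_state=0).fit(X_cv[train_idx], y_cv[train_idx])
    )

# The "K-fold ensemble": average the K models' predicted probabilities.
avg_proba = np.mean([m.predict_proba(X_test)[:, 1] for m in models], axis=0)
print(avg_proba[:5])
```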
Is the concept of creating minibatches in a DNN and training an epoch on them analogous to building a bagging-based model, where random samples are drawn and trained on?
Wouldn't selecting random data end up selecting all the data for training, and therefore overfit a Random Forest?
Again on CatBoost: this talk from Anna Veronika Dorogush (lead engineer at Yandex) is quite outstanding and introduces all the goodies of the package (avoiding overfitting included).
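For anyone who wants to try it, here's a minimal sketch of CatBoost with its overfitting detector enabled (the data and columns are made up for illustration):

```python
import numpy as np
import pandas as pd
from catboost import CatBoostClassifier

# Toy synthetic data: one numeric and one categorical feature.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "x_num": rng.normal(size=n),
    "color": rng.choice(["red", "green", "blue"], size=n),
})
y = ((df["color"] == "red") ^ (df["x_num"] > 0)).astype(int)

X_train, X_val = df[:800], df[800:]
y_train, y_val = y[:800], y[800:]

model = CatBoostClassifier(
    iterations=1000,
    learning_rate=0.05,
    od_type="Iter",       # overfitting detector: stop when the eval metric
    od_wait=50,           # hasn't improved for 50 consecutive iterations
    use_best_model=True,  # keep the best iteration found on the eval set
    verbose=False,
)
model.fit(X_train, y_train, eval_set=(X_val, y_val), cat_features=["color"])
print(model.get_best_iteration(), model.score(X_val, y_val))
```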
Weird. It did work for me. Keep us posted.
See AutoGluon-Tabular by Amazon: https://arxiv.org/abs/2003.06505
Any similar resources for LightGBM?
Not really: each tree samples with replacement, so no single tree gets all the rows at once.
Also, you generally select random features in addition to random rows.
This keeps the individual trees you are building decorrelated, which is what drives the variance of the averaged ensemble toward zero.
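Here's a quick numerical sanity check of the "not all of it at once" point (plain NumPy, made-up dataset size): sampling n rows with replacement leaves roughly 37% of the rows out of each tree's sample:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000  # pretend dataset size

# One bootstrap sample per tree: n draws WITH replacement.
sample = rng.integers(0, n, size=n)
unique_frac = np.unique(sample).size / n
print(f"rows seen by this tree: {unique_frac:.1%}")      # ~63.2%
print(f"rows left out (OOB):    {1 - unique_frac:.1%}")  # ~36.8%
```

That left-out ~37% is exactly what gets used as each tree's out-of-bag set.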
That makes sense. Clever!
Again I see similarity with K-fold CV. OOB error seems similar to out-of-fold error.
It is indeed!
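You can check the resemblance directly in scikit-learn (toy data again): the OOB score of a random forest usually lands close to a cross-validated score:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, random_state=0)

# oob_score=True makes the forest score each row with only the trees
# whose bootstrap sample left that row out.
rf = RandomForestClassifier(n_estimators=300, oob_score=True, random_state=0).fit(X, y)
print("OOB accuracy:   ", rf.oob_score_)
print("5-fold accuracy:", cross_val_score(rf, X, y, cv=5).mean())
```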
Mario,
Look here:
Pay specific attention to the parts under API credentials. You need to open a terminal session in Jupyter and run those export commands to create the environment variables the Kaggle package requires.
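If you prefer doing it from inside the notebook rather than the terminal, setting the same variables in Python should also work (the values below are placeholders, obviously):

```python
import os

# The Kaggle package reads these environment variables
# if no ~/.kaggle/kaggle.json credentials file is present.
os.environ["KAGGLE_USERNAME"] = "your_kaggle_username"  # placeholder
os.environ["KAGGLE_KEY"] = "your_api_key"               # placeholder
```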
Happy hunting!
What is the best method to understand and measure uncertainty in predictions?
Thank you @JPKab… and indeed, happy hunting!