When calculating feature importances for tree-based ensemble models, should we use the training set or the validation set, and why?
In eli5's docs, they use the test set to calculate permutation feature importances.
You want to test this on unseen data. What is of interest is which feature is important for generalizing to unseen data, not which feature allows you to fit the data you train on.
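To make that concrete, here is a minimal sketch of permutation importance computed on a held-out split. It uses scikit-learn's `permutation_importance` rather than eli5, and the synthetic dataset, split size, and model settings are assumptions picked only for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for this sketch): 5 features, 2 informative.
X, y = make_regression(n_samples=500, n_features=5, n_informative=2,
                       noise=1.0, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Fit on the training split only.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Shuffle each feature in the *validation* split and measure the drop in score.
result = permutation_importance(model, X_valid, y_valid,
                                n_repeats=10, random_state=0)

# Print features from most to least important.
for i in result.importances_mean.argsort()[::-1]:
    print(f"feature {i}: {result.importances_mean[i]:.3f} "
          f"+/- {result.importances_std[i]:.3f}")
```

Passing `X_train, y_train` instead would tell you which features the model leans on to fit the training data, which is the quantity the reply above argues is less interesting.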
Makes sense. Thank you radek.
From the way scikit-learn has implemented Random Forest feature importance, I assume it is computed on the training set. Please correct me if I am wrong.
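As far as I can tell, that is right: the `feature_importances_` attribute is impurity-based and is available as soon as the forest is fitted, without the model ever seeing held-out data. A small sketch, with a synthetic dataset and model settings assumed purely for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic training data (an assumption for this sketch).
X_train, y_train = make_classification(n_samples=300, n_features=6,
                                       n_informative=3, random_state=0)

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# Impurity-based importances: derived from the splits learned on X_train,
# so no validation or test data is involved at any point.
mdi = model.feature_importances_
print(mdi)        # one value per feature
print(mdi.sum())  # normalized, so the values sum to roughly 1.0
```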
It would be worth reading the explained.ai article on this too, as there are some pitfalls with the default feature importance functions in scikit-learn.