Permutation Feature Importances

vineeth_k · June 19, 2018, 2:30pm

When calculating feature importances for tree-based ensemble models, should we use the training set or the validation set, and why?

In eli5's docs, they use the test set to calculate permutation feature importances.

Eli5 Permutation Feature importance docs

radek · June 19, 2018, 3:52pm

You want to test this on unseen data. What is of interest is which feature is important for generalizing to unseen data, not which feature allows you to fit the data you train on.
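
To make the procedure concrete, here is a minimal, library-free sketch of permutation importance on held-out data: shuffle one column at a time and record how much the validation score drops. All names here (`permuted_score_drop`, `accuracy`, `toy_predict`, `X_valid`, `y_valid`) are made up for illustration and are not the eli5 or scikit-learn API. The toy predictor is only a stand-in for a fitted tree ensemble, and the data is synthetic.

```python
import random


def permuted_score_drop(predict, X, y, score, n_repeats=5, seed=0):
    """Mean drop in score when each column of held-out X is shuffled."""
    rng = random.Random(seed)
    # Score of the unmodified held-out data, used as the reference point.
    baseline = score(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only; every other column stays untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = score(y, [predict(row) for row in X_perm])
            drops.append(baseline - permuted)
        importances.append(sum(drops) / n_repeats)
    return importances


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def toy_predict(row):
    # Hypothetical stand-in for a fitted model: it only looks at feature 0.
    return int(row[0] > 0.5)


# Synthetic "validation" data: the label depends on feature 0, feature 1 is noise.
data_rng = random.Random(42)
X_valid = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y_valid = [int(row[0] > 0.5) for row in X_valid]

importances = permuted_score_drop(toy_predict, X_valid, y_valid, accuracy)
print(importances)
```

Shuffling the noise column leaves the predictions unchanged, so its importance is zero, while shuffling feature 0 costs a large share of the accuracy. With a real model you would pass its prediction function and your own validation rows instead of the toy ones.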

vineeth_k · June 19, 2018, 5:55pm

Makes sense. Thank you radek.

anubhavmaity · December 12, 2018, 4:32am

From the way scikit-learn implements Random Forest feature importance, I assume it is computed on the training set. Please correct me if I am wrong.

AnthonyHolmes · January 11, 2019, 10:23am

It would be worth reading the explained.ai article on this too, as there are some pitfalls with the default feature importance functions in scikit-learn.