Permutation Feature Importances

(Vineeth Kanaparthi) #1

While calculating feature importances for tree-based ensemble models, should we use the training set or the validation set, and why?

In eli5's docs, they use the test set to calculate permutation feature importances.

Eli5 Permutation Feature importance docs

(radek) #2

You want to test this on unseen data. What is of interest is which feature is important for generalizing to unseen data, not which feature allows you to fit the data you train on.
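A minimal sketch of this point using scikit-learn's `permutation_importance`. The synthetic data, the model settings, and the helper `perm_importances` are illustrative assumptions, not from the thread. The target depends only on `signal`, while `noise` is irrelevant. Scored on the training rows, `noise` can look useful. Scored on the held-out split, its importance should stay close to zero.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): y depends only on `signal`;
# `noise` is independent of everything else.
rng = np.random.RandomState(0)
n = 2000
signal = rng.normal(size=n)
noise = rng.normal(size=n)
X = np.column_stack([signal, noise])
y = 2.0 * signal + rng.normal(scale=1.0, size=n)

X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# Fully grown trees (the default) partly memorise the training rows, label noise included.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)


def perm_importances(X_eval, y_eval):
    # Hypothetical helper: mean drop in R^2 (the regressor's default score)
    # when each column of X_eval is shuffled, averaged over 10 shuffles.
    result = permutation_importance(
        model, X_eval, y_eval, n_repeats=10, random_state=0
    )
    return result.importances_mean


train_imp = perm_importances(X_train, y_train)
valid_imp = perm_importances(X_valid, y_valid)

for name, tr, va in zip(["signal", "noise"], train_imp, valid_imp):
    print(f"{name:>6}: train {tr:.3f}   validation {va:.3f}")
```

The forest has partly memorised the training labels, so shuffling even an irrelevant column can lower the training score. On unseen rows, the shuffled column is still independent of the target, so the score should not change systematically.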

(Vineeth Kanaparthi) #3

Makes sense. Thank you radek.

(Anubhav Maity) #4

Given the way scikit-learn implements Random Forest feature importance, I assume it is computed on the training set. Please correct me if I am wrong.

(Anthony Holmes) #5

It would be worth reading the article on this too, as there are some pitfalls with the default feature importance functions in scikit-learn.
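This sketch illustrates one pitfall that scikit-learn's own documentation describes. It is not necessarily the one the article covers. A forest's default `feature_importances_` is impurity-based (mean decrease in impurity). It is computed inside `fit()` from training-set statistics, which bears out the assumption in #4. It also tends to inflate features with many distinct values. The synthetic data and column names are assumptions. Both `random_*` columns are pure noise, yet the continuous one should get a larger impurity-based share. Permutation importance on the validation split should rank both near zero.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data (assumed for illustration): only `informative` drives the label;
# the other two columns are noise with very different numbers of distinct values.
rng = np.random.RandomState(42)
n = 2000
informative = rng.normal(size=n)
random_binary = rng.randint(0, 2, size=n)  # 2 distinct values
random_continuous = rng.normal(size=n)     # ~n distinct values
X = np.column_stack([informative, random_binary, random_continuous])
y = (informative + rng.normal(scale=1.0, size=n) > 0).astype(int)
names = ["informative", "random_binary", "random_continuous"]

X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.5, random_state=0
)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Impurity-based (MDI) importances: fixed once fit() returns, derived from the
# training rows only, normalised to sum to 1.
mdi = model.feature_importances_

# Permutation importances scored on the held-out split (mean drop in accuracy).
perm = permutation_importance(
    model, X_valid, y_valid, n_repeats=10, random_state=0
).importances_mean

for name, m, p in zip(names, mdi, perm):
    print(f"{name:>17}: MDI {m:.3f}   permutation (validation) {p:.3f}")
```

The two columns use different units: MDI reports a share of the total impurity decrease, while permutation importance reports a drop in validation accuracy. Compare rankings within each column rather than the raw numbers across columns.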