Computing Feature Importance from Scratch

chrisdong · December 10, 2017, 8:42am

Hi, I am having trouble with normalizing the feature importance.

Here is the difference in RMSE between the “shuffled” column and the “original” column:

After normalizing it from 0 to 1, I get:

where fi (bottom) is the feature importance from sklearn and fi2 (top) is the one I computed myself. They are somewhat similar, but the numbers are a bit off.

Any pointers on what I could be doing wrong? My method of normalizing right now is just dividing by the sum of the feature importances.

Thanks!

parrt · December 10, 2017, 4:59pm

Maybe use R^2 not RMSE?

kcturgutlu · December 11, 2017, 2:44am

It depends on the feature importance implementation in scikit-learn: which metric it uses and how it calculates it. In general, I would focus on my metric of interest (business or Kaggle) and then calculate the validation difference for each shuffled feature. There is no single perfect solution here, but that is also what makes this approach flexible and powerful: it is metric- and model-agnostic.

So, as long as you are confident about what you measure, I wouldn’t worry much about getting it “right”.
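The shuffle-and-score approach described in this thread can be sketched roughly as follows. This is a minimal illustration, not the code from the original post and not scikit-learn's implementation: the function names (`permutation_importance`, `normalize`), the toy data, and the least-squares model are all assumptions made for the example. It takes the metric as an argument, so RMSE and R^2 can both be tried, and it normalizes by dividing by the sum as in the question.

```python
import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

def permutation_importance(predict, X, y, metric, higher_is_better,
                           n_repeats=5, seed=0):
    """Hypothetical helper: mean drop in score when one column is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = metric(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_shuffled = X.copy()
            # Shuffle only column j, leaving every other feature intact.
            X_shuffled[:, j] = rng.permutation(X_shuffled[:, j])
            score = metric(y, predict(X_shuffled))
            # Express the change so that "worse after shuffling" is positive.
            drops.append(baseline - score if higher_is_better else score - baseline)
        importances[j] = np.mean(drops)
    return importances

def normalize(importances):
    """Divide by the sum, clipping small negative values to zero first."""
    clipped = np.clip(importances, 0.0, None)
    total = clipped.sum()
    return clipped / total if total > 0 else clipped

# Toy data (an assumption for illustration): feature 0 matters most,
# feature 1 matters less, feature 2 is pure noise.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + data_rng.normal(scale=0.1, size=500)
X_train, y_train = X[:400], y[:400]
X_valid, y_valid = X[400:], y[400:]

# Stand-in model: ordinary least squares with an intercept.
A = np.column_stack([X_train, np.ones(len(X_train))])
coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)

def predict(M):
    return np.column_stack([M, np.ones(len(M))]) @ coef

fi_rmse = normalize(permutation_importance(predict, X_valid, y_valid,
                                           rmse, higher_is_better=False))
fi_r2 = normalize(permutation_importance(predict, X_valid, y_valid,
                                         r2, higher_is_better=True))
print("normalized importance (RMSE):", fi_rmse)
print("normalized importance (R^2): ", fi_r2)
```

Both runs rank the features the same way, but the normalized shares need not agree between metrics, so some mismatch against another implementation's numbers is expected even when the ranking is the same.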

kcturgutlu · December 11, 2017, 2:46am

For rf classifier for example: