Computing Feature Importance from Scratch

Hi, I am having trouble with normalizing the feature importance.

Here is the difference in RMSE between the “shuffled” column and the “original” column:

After normalizing it from 0 to 1, I get:

where fi (bottom) is the feature importance from sklearn and fi2 (top) is the one I computed myself. They are somewhat similar, but the numbers are a bit off.

Any pointers on what I could be doing wrong? My method of normalizing right now is just dividing by the sum of the feature importances.
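For reference, the divide-by-the-sum normalization described above can be sketched roughly like this. The numbers are made up for illustration, `normalize_importances` is a hypothetical helper name, and clipping negative differences to zero is an assumption added here, not something stated in the post:

```python
import numpy as np

def normalize_importances(raw):
    """Scale raw importance scores so that they sum to 1."""
    raw = np.asarray(raw, dtype=float)
    # Assumption: a negative difference (shuffling happened to lower the RMSE)
    # is treated as zero importance, so no normalized score goes below 0.
    clipped = np.clip(raw, 0.0, None)
    total = clipped.sum()
    if total == 0:
        # No feature changed the metric; avoid dividing by zero.
        return np.zeros_like(clipped)
    return clipped / total

# Made-up RMSE increases (shuffled minus original) for three hypothetical features.
rmse_increase = [0.30, 0.10, -0.02]
fi2 = normalize_importances(rmse_increase)
```

Without the clipping, a negative difference shrinks the sum and pushes the other scores up, which is one possible reason for numbers coming out slightly different.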


Maybe use R^2 instead of RMSE?

It depends on the feature importance implementation in scikit-learn: which metric it uses and how it calculates it. In general, I would focus on my metric of interest (business or Kaggle) and then calculate the difference in validation score for each shuffled feature. There is no perfect solution here, but that is also what makes this approach so flexible and powerful. It is metric- and model-agnostic :slight_smile:

So, as long as you are confident about what you measure I wouldn’t worry much about getting it “right” :wink:
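A minimal sketch of the shuffle-and-compare idea, assuming only a `predict` callable and a metric function. `shuffle_importance`, `rmse`, and the toy data are hypothetical names made up for this example, not the original poster's code or any scikit-learn API:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error (lower is better)."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def shuffle_importance(predict, X, y, metric=rmse, higher_is_better=False, seed=0):
    """Score each column by how much the metric worsens when that column is shuffled."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    baseline = metric(y, predict(X))
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        X_shuffled = X.copy()
        # Break the link between column j and the target; other columns stay intact.
        X_shuffled[:, j] = rng.permutation(X_shuffled[:, j])
        shuffled = metric(y, predict(X_shuffled))
        # Orient the difference so that a larger value means a more important feature.
        scores[j] = (baseline - shuffled) if higher_is_better else (shuffled - baseline)
    return scores

# Toy validation set: the target depends strongly on column 0, weakly on column 1,
# and not at all on column 2. The "model" is just the true function.
rng = np.random.default_rng(42)
X_valid = rng.normal(size=(500, 3))
y_valid = 3.0 * X_valid[:, 0] + 0.5 * X_valid[:, 1]
predict = lambda X: 3.0 * X[:, 0] + 0.5 * X[:, 1]

importances = shuffle_importance(predict, X_valid, y_valid)
```

Swapping in R^2 or another metric only means passing a different `metric` and setting `higher_is_better=True` where appropriate. Note that a random forest's `feature_importances_` in scikit-learn is impurity-based rather than shuffle-based, so the two sets of numbers are not expected to match exactly; scikit-learn also provides `sklearn.inspection.permutation_importance` for the shuffle-based variant.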


For rf classifier for example: