Updated Feature Importance for Tabular?

mindtrinket · March 24, 2019, 12:49pm

What is everyone preferred method for doing EDA for tabular with the updated library? Do you still use random forest and then move over to a NN? Or are there functions that I have overlooked?

During the machine learning courses, Jeremy discusses feature importance for the data and provides the famous bulldozer notebook. When going back through I noticed many of the functions are dependent on rf_feat_importance and with fast.ai now using tabular, the functionality broke.

After researching some hours on the forums, I saw a question for the Jeremy AMA, but it didn’t appear to get an answer. I also found Measuring Feature Importance in NNets for structured data however, it uses the old structured data format.

karthik.subraveti · March 24, 2019, 1:37pm

I think the tabular class has pretty much most of the functionality including normalization and which was done in separate functions calls such as proc_df, train_cats wrapped up. However i agree the compatibility with old API is broken for now. new API is all NN first.

github.com

karthiksubraveti/ml_notebooks/blob/master/msft_rf.py

#!/usr/bin/env python
# coding: utf-8

# In[ ]:


from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from fastai.tabular import *


# In[ ]:


PATH = 'data/msft/'


# In[ ]:


cols = ['SmartScreen',

This file has been truncated. show original

This is a sample of how i used the new API. However i didn’t try getting the feature importance.

mnpinto · March 25, 2019, 12:17pm

The idea to compute the feature importance is simple, after the model is trained we generate the predictions and compute the loss (let’s call it reference loss L_{ref}). Then we shuffle one column at the time and again compute the loss for each feature (L_i , where i is for each columns, categorical or continuous).

Then after having all the losses we find how shuffling column i affects the performance, how much the loss increases (assuming lower is better). If the loss increases by a large amount then the feature is very importance. If the change in loss is very small that feature is not important.

So the relative feature importance (F_i) is given by F_i = L_{ref} - L_i for each feature i.

You can try to set up an example, train the model and then iterate over the validation dataloader and compute all the losses. Maybe we can compute the feature importance over each mini-batch and return the list of all feature importances. Then we can plot feature importance with error bars; like using the mean and 2 times the standard deviation to get the 95% confidence interval.

mindtrinket · March 25, 2019, 1:57pm

That makes perfect sense and great explanation! Which is why I would think we would want to do it with the model we are using instead something different. But perhaps the differences don’t matter much.

Have you updated your code? Otherwise, this might be my mini-project this week.

mnpinto · March 25, 2019, 2:20pm

I haven’t had time to update my code. It sounds a good mini-project indeed

mindtrinket · March 31, 2019, 12:58am

Hit the first hiccup. Not sure what I should change m.crit to since I can’t find documentation. It looks like it might have been updated to loss_func?

Original Code:
loss0 = np.array([to_np(m.crit(m.model(x_cat, x_cont), y)) for x_cat, x_cont, y in iter(md.val_dl)]).mean()

Altered:
loss0= np.array([to_np(learn.crit(learn.model(x_cat, x_cont), y)) for x_cont, y in iter(data.valid_dl)]).mean()

Easy conversions:

m becomes learn
md becomes data
val_dl becomes valid_dl

mnpinto · March 31, 2019, 10:19am

Yes you are right, m.crit should be updated to learn.loss_func Although we could also use some metric function like accuracy instead of the loss function. In that case we could then display feature importance as a change in accuracy or whatever the metric we are using.

whamp · April 5, 2019, 1:49am

check out shap values implemented here:

They can apply to any type of model and have numerous examples to learn from in the repo. They are also really fast to calculate compared with drop_Col importance and permutation importance. If you’re not familiar with those, they are implemented here: