# Feature Importance Using Tree Interpreter

Hi @jeremy, based on our discussion in class on the tree interpreter, I tried to calculate the importance of each feature by taking the sum of absolute contributions over 4,000 data points in the validation set. I found that the results were quite similar to what we were getting with the rf_feature_importance function. I think this could also serve as an indicator of feature importance, and I'd like to dig deeper into it.

Here is my code:

```python
from treeinterpreter import treeinterpreter as ti
import numpy as np

def imp_ti(m, x):
    imp = []
    for i in range(x.shape[0]):  # x.shape is a tuple; iterate over rows
        prediction, bias, contributions = ti.predict(m, x.values[None, i])
        imp.append(contributions)
    return np.sum(np.abs(imp), axis=0)
```
```python
def imp_permutation(m, x, y):
    base_error = rmse(m.predict(x), y)  # compute before permuting anything
    error = []
    for col in x.columns:
        xnew = x.copy()  # copy so the original frame isn't mutated
        xnew[col] = np.random.permutation(xnew[col])
        error.append(rmse(m.predict(xnew), y))
    return [e - base_error for e in error]
```
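For anyone following along, here is a self-contained sketch of the permutation-importance idea on toy data. The synthetic features and the fixed linear `predict` stand-in are hypothetical (in the code above, `m.predict` plays that role); the point is just that permuting an influential column should raise the error, while permuting an ignored column should not:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y depends strongly on feature 0, weakly on feature 1, not on feature 2.
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)

def predict(X):
    # Stand-in "model" (hypothetical): a fixed linear predictor.
    return 3.0 * X[:, 0] + 0.5 * X[:, 1]

def rmse(pred, actual):
    return np.sqrt(np.mean((pred - actual) ** 2))

def imp_permutation(predict, X, y):
    base_error = rmse(predict(X), y)
    importances = []
    for col in range(X.shape[1]):
        Xnew = X.copy()  # copy so the original data is untouched
        Xnew[:, col] = rng.permutation(Xnew[:, col])
        importances.append(rmse(predict(Xnew), y) - base_error)
    return importances

imp = imp_permutation(predict, X, y)
print(imp)  # feature 0 dominates; feature 2 is exactly zero (the model ignores it)
```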

Top 10 predictors (Permutation, Rf_feature_importance, Tree_Interpreter)

Nice job! Yes, this is a good method, and in fact R provides both this and the permutation approach. More info here: https://stats.stackexchange.com/questions/12605/measures-of-variable-importance-in-random-forests

Hi @alvira, I like your idea, but I still haven't thought through why you take the absolute value of the contributions as a proxy for importance. Would you mind explaining your rationale? Thanks!


@lizchenym Thanks! My understanding is that the magnitude of the contribution is what matters, and when summing contributions across all rows, I didn't want positive values to cancel out negatives, hence the absolute value (by the same logic, we could also use the square function).
@jeremy please correct me if I am wrong. Thanks!
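To make the cancellation point concrete, here is a tiny numeric sketch (the contribution values are made up): a feature with large but opposite-signed contributions would look unimportant under a raw sum, while the absolute sum preserves its magnitude:

```python
import numpy as np

# Hypothetical per-row contributions of one feature: large but opposite-signed.
contrib_signed = np.array([2.0, -2.0, 1.5, -1.5])
# A feature with uniformly small contributions.
contrib_small = np.array([0.1, 0.1, 0.1, 0.1])

# Raw sum: signs cancel, so the first feature looks unimportant.
print(contrib_signed.sum())          # → 0.0
print(contrib_small.sum())           # ≈ 0.4

# Absolute sum: magnitude is preserved, so the first feature ranks first.
print(np.abs(contrib_signed).sum())  # → 7.0
print(np.abs(contrib_small).sum())   # ≈ 0.4
```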


Seems right to me!