Hi @jeremy, based on our discussion in class on the tree interpreter, I tried to calculate the importance of each feature by taking the absolute sum of its contributions over 4,000 data points in the validation set. I found that the results were quite similar to what we were getting from the rf_feature_importance function, so I think this could also serve as an indicator of feature importance; I'd like to dig deeper into it.
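The aggregation step itself is easy to see in isolation. Here is a minimal numpy-only sketch, where `contributions` is a made-up stand-in for the per-point, per-feature contribution matrix that treeinterpreter returns (stacked over the validation set):

```python
import numpy as np

# Hypothetical contributions matrix: one row per validation point,
# one signed contribution per feature (the kind of values ti.predict returns).
contributions = np.array([[ 0.5, -0.2,  0.1],
                          [-0.4,  0.3,  0.0],
                          [ 0.6, -0.1, -0.2]])

# Importance of each feature = sum of absolute contributions over all points.
importance = np.abs(contributions).sum(axis=0)
print(importance)  # [1.5 0.6 0.3]
```

Taking the absolute value first matters: a feature that pushes some predictions up and others down would otherwise cancel out and look unimportant.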
Here is my code (it assumes `rmse(pred, y)` is already defined, as in the course notebooks):

```python
import numpy as np
from treeinterpreter import treeinterpreter as ti

def imp_ti(m, x):
    # Sum the absolute treeinterpreter contributions over every row of x.
    imp = []
    for i in range(x.shape[0]):
        prediction, bias, contributions = ti.predict(m, x.values[None, i])
        imp.append(contributions)
    return np.sum(np.abs(imp), axis=0)

def imp_permutation(m, x, y):
    # Permutation importance: the rise in RMSE after shuffling each column.
    base_error = rmse(m.predict(x), y)
    error = []
    for col in x.columns:
        xnew = x.copy()  # copy, otherwise we permanently shuffle a column of x
        xnew[col] = np.random.permutation(xnew[col])
        error.append(rmse(m.predict(xnew), y) - base_error)
    return error
```

(One fix from my first attempt: `xnew = x` only aliases the DataFrame, so each iteration was shuffling `x` itself; `x.copy()` keeps the original intact.)
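As a sanity check on the permutation approach, here is a self-contained numpy-only sketch (the `ToyModel`, its weights, and the data are all invented for illustration) showing that shuffling a heavily-weighted column inflates RMSE far more than shuffling a nearly irrelevant one:

```python
import numpy as np

rng = np.random.default_rng(0)

class ToyModel:
    # Hypothetical stand-in for a fitted model: relies mostly on column 0.
    def predict(self, X):
        return 3.0 * X[:, 0] + 0.1 * X[:, 1]

def rmse(pred, y):
    return np.sqrt(np.mean((pred - y) ** 2))

def imp_permutation(m, X, y):
    # Same idea as above, on plain arrays: shuffle one column at a time
    # and record how much the RMSE rises over the baseline.
    base = rmse(m.predict(X), y)
    imps = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])
        imps.append(rmse(m.predict(Xp), y) - base)
    return imps

X = rng.normal(size=(1000, 2))
y = ToyModel().predict(X)          # noise-free target, for clarity
imp = imp_permutation(ToyModel(), X, y)
print(imp[0] > imp[1])             # column 0 should dominate
```

Since the model weights column 0 thirty times more heavily, its permutation importance comes out roughly thirty times larger as well, which is the kind of ordering I'd expect the three methods to agree on.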
Top 10 predictors (Permutation, Rf_feature_importance, Tree_Interpreter)