@Pak I wasn't able to get to it yesterday, but I ran it today, and my results differed from yours a bit. Here is my new function, which takes a test DataFrame, shuffles each column one at a time, and validates over it:
import copy
import numpy as np
import pandas as pd
from fastai.tabular import *  # fastai v1: Learner, TabularList, FloatList, DataFrame

def feature_importance(learn:Learner, cats:list, conts:list, dep_var:str, test:DataFrame):
    "Shuffle each column of `test` in turn and return the change in the first metric vs. the unshuffled baseline."
    procs = learn.data.train_ds.x.procs  # procs already fitted on the training set
    is_clas = 'CrossEntropyLoss' in str(learn.loss_func)

    def make_dl(df):
        # fresh copies each time: the procs may append names (e.g. the `_na` columns from FillMissing)
        cat, cont = copy.deepcopy(cats), copy.deepcopy(conts)
        src = (TabularList.from_df(df, path='', cat_names=cat, cont_names=cont, procs=procs)
               .split_none())
        if is_clas:
            src = src.label_from_df(cols=dep_var)
        else:
            # log=True assumes the model was trained on a log-transformed target
            src = src.label_from_df(cols=dep_var, label_cls=FloatList, log=True)
        # fix_dl covers the same rows as train_dl but without shuffling or dropping the last partial batch
        return src.databunch(bs=learn.data.batch_size).fix_dl

    # validate() returns [loss, metric1, ...]; passing dl= leaves learn.data untouched
    base_metric = float(learn.validate(dl=make_dl(test))[1])

    fi = dict()
    for c in cats + conts:
        print(c)
        shuffled = test.copy()
        # a true permutation of one column; `.values` stops pandas from re-aligning on the index
        shuffled[c] = shuffled[c].sample(frac=1).values
        fi[c] = float(learn.validate(dl=make_dl(shuffled))[1]) - base_metric

    d = sorted(fi.items(), key=lambda kv: kv[1], reverse=True)
    df = pd.DataFrame({'Variable': [l for l, v in d], 'Accuracy': [v for l, v in d]})
    df['Type'] = np.where(df['Variable'].isin(cats), 'categorical', 'continuous')
    return df
This handles both loss functions fastai picks by default: CrossEntropyLoss for a classification target and MSE for a FloatList regression target. My results were different from yours, though: anything negative had a negative impact on training, so those were the best.
╔════════╦══════════════════════════╦═══════════╦═════════════╗
║ Number ║ Variable                 ║ Accuracy  ║ Type        ║
╠════════╬══════════════════════════╬═══════════╬═════════════╣
║ 0      ║ SchoolHoliday            ║  0.001581 ║ continuous  ║
║ 1      ║ trend                    ║  0.001569 ║ continuous  ║
║ 2      ║ AfterStateHoliday        ║  0.001444 ║ continuous  ║
║ 3      ║ Month                    ║  0.001159 ║ categorical ║
║ 4      ║ StateHoliday_bw          ║  0.001103 ║ categorical ║
║ 5      ║ trend_DE                 ║  0.001090 ║ continuous  ║
║ 6      ║ Min_Humidity             ║  0.001085 ║ continuous  ║
║ 7      ║ Max_Wind_SpeedKm_h       ║  0.000958 ║ continuous  ║
║ 8      ║ Max_TemperatureC         ║  0.000871 ║ continuous  ║
║ 9      ║ StateHoliday             ║  0.000795 ║ categorical ║
║ 10     ║ Min_TemperatureC         ║  0.000791 ║ continuous  ║
║ 11     ║ Events                   ║  0.000748 ║ categorical ║
║ 12     ║ PromoInterval            ║  0.000531 ║ categorical ║
║ 13     ║ Promo2Weeks              ║  0.000477 ║ categorical ║
║ 14     ║ StoreType                ║  0.000465 ║ categorical ║
║ 15     ║ Promo2SinceYear          ║  0.000420 ║ categorical ║
║ 16     ║ Store                    ║  0.000397 ║ categorical ║
║ 17     ║ Year                     ║  0.000392 ║ categorical ║
║ 18     ║ CompetitionMonthsOpen    ║  0.000334 ║ categorical ║
║ 19     ║ BeforeStateHoliday       ║  0.000255 ║ continuous  ║
║ 20     ║ State                    ║  0.000107 ║ categorical ║
║ 21     ║ Assortment               ║ -0.000095 ║ categorical ║
║ 22     ║ Day                      ║ -0.000122 ║ categorical ║
║ 23     ║ Promo_bw                 ║ -0.000333 ║ categorical ║
║ 24     ║ CloudCover               ║ -0.000406 ║ continuous  ║
║ 25     ║ Mean_TemperatureC        ║ -0.000516 ║ continuous  ║
║ 26     ║ Promo                    ║ -0.001300 ║ continuous  ║
║ 27     ║ SchoolHoliday_bw         ║ -0.001309 ║ categorical ║
║ 28     ║ Mean_Humidity            ║ -0.001415 ║ continuous  ║
║ 29     ║ SchoolHoliday_fw         ║ -0.001569 ║ categorical ║
║ 30     ║ StateHoliday_fw          ║ -0.001817 ║ categorical ║
║ 31     ║ Week                     ║ -0.004419 ║ categorical ║
║ 32     ║ DayOfWeek                ║ -0.008283 ║ categorical ║
║ 33     ║ Max_Humidity             ║ -0.008312 ║ continuous  ║
║ 34     ║ CompetitionDistance      ║ -0.008432 ║ continuous  ║
║ 35     ║ CompetitionOpenSinceYear ║ -0.008464 ║ categorical ║
║ 36     ║ Mean_Wind_SpeedKm_h      ║ -0.008909 ║ continuous  ║
╚════════╩══════════════════════════╩═══════════╩═════════════╝
Store wound up being somewhere in the middle here, so perhaps I am doing something wrong?
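One thing worth ruling out first: a single shuffle per column is noisy, and many of the differences above are only in the fourth decimal place. Below is a minimal, framework-free sketch of permutation importance with repeats (plain NumPy, not fastai and not the function above) that shuffles each column several times and reports the mean and standard deviation of the score change. `score_fn`, `n_repeats` and the toy "model" are hypothetical placeholders; in the fastai version, `score_fn` would roughly correspond to building the DataBunch and calling `learn.validate()`.

```python
import numpy as np

def permutation_importance(score_fn, X, y, n_repeats=5, seed=0):
    """Return {column_index: (mean, std)} of score(shuffled) - score(baseline).

    score_fn(X, y) -> float is any metric (a hypothetical placeholder here).
    Each column gets a true permutation (no replacement), repeated
    n_repeats times, so the spread shows how much of the change is noise.
    """
    rng = np.random.default_rng(seed)
    baseline = score_fn(X, y)
    result = {}
    for j in range(X.shape[1]):
        deltas = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # shuffle this column only
            deltas.append(score_fn(X_perm, y) - baseline)
        result[j] = (float(np.mean(deltas)), float(np.std(deltas)))
    return result

# Toy check: the target depends only on column 0, so only column 0 should matter.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 3))
y = 3.0 * X[:, 0]

def rmse_of_fixed_model(X, y):
    pred = 3.0 * X[:, 0]  # a "model" that has learned the true relation
    return float(np.sqrt(np.mean((pred - y) ** 2)))

imp = permutation_importance(rmse_of_fixed_model, X, y)
for j, (mean, std) in imp.items():
    print(f"column {j}: {mean:+.4f} ± {std:.4f}")
```

Because RMSE is lower-is-better, a positive mean change marks a column the model relies on. A column whose mean change is within roughly one standard deviation of zero is hard to distinguish from noise, which may be why mid-ranked features like Store move around between runs.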
Here are the results from the old function in the earlier posts:
╔════════╦══════════════════════════╦═══════════╦═════════════╗
║ Number ║ Variable                 ║ Accuracy  ║ Type        ║
╠════════╬══════════════════════════╬═══════════╬═════════════╣
║ 0      ║ Mean_Wind_SpeedKm_h      ║  0.000946 ║ continuous  ║
║ 1      ║ Promo_bw                 ║  0.000924 ║ categorical ║
║ 2      ║ Promo                    ║  0.000844 ║ continuous  ║
║ 3      ║ Store                    ║  0.000747 ║ categorical ║
║ 4      ║ SchoolHoliday_fw         ║  0.000728 ║ categorical ║
║ 5      ║ Promo2SinceYear          ║  0.000717 ║ categorical ║
║ 6      ║ Assortment               ║  0.000653 ║ categorical ║
║ 7      ║ Promo_fw                 ║  0.000611 ║ categorical ║
║ 8      ║ StateHoliday_fw          ║  0.000428 ║ categorical ║
║ 9      ║ Day                      ║  0.000400 ║ categorical ║
║ 10     ║ Max_Wind_SpeedKm_h       ║  0.000358 ║ continuous  ║
║ 11     ║ CompetitionDistance_na   ║  0.000294 ║ categorical ║
║ 12     ║ Month                    ║  0.000185 ║ categorical ║
║ 13     ║ trend_DE                 ║  0.000050 ║ continuous  ║
║ 14     ║ BeforeStateHoliday       ║  0.000014 ║ continuous  ║
║ 15     ║ SchoolHoliday            ║ -0.000037 ║ continuous  ║
║ 16     ║ CompetitionMonthsOpen    ║ -0.000058 ║ categorical ║
║ 17     ║ StateHoliday             ║ -0.000058 ║ categorical ║
║ 18     ║ Max_Humidity             ║ -0.000077 ║ continuous  ║
║ 19     ║ Mean_TemperatureC        ║ -0.000136 ║ continuous  ║
║ 20     ║ StateHoliday_bw          ║ -0.000148 ║ categorical ║
║ 21     ║ StoreType                ║ -0.000163 ║ categorical ║
║ 22     ║ Mean_Humidity            ║ -0.000193 ║ continuous  ║
║ 23     ║ SchoolHoliday_bw         ║ -0.000246 ║ categorical ║
║ 24     ║ DayOfWeek                ║ -0.000286 ║ categorical ║
║ 25     ║ trend                    ║ -0.000390 ║ continuous  ║
║ 26     ║ Promo2Weeks              ║ -0.000517 ║ categorical ║
║ 27     ║ Min_Humidity             ║ -0.000906 ║ continuous  ║
║ 28     ║ PromoInterval            ║ -0.000937 ║ categorical ║
║ 29     ║ Min_TemperatureC         ║ -0.001001 ║ continuous  ║
║ 30     ║ CompetitionOpenSinceYear ║ -0.001064 ║ categorical ║
║ 31     ║ AfterStateHoliday        ║ -0.001515 ║ continuous  ║
║ 32     ║ CloudCover               ║ -0.001570 ║ continuous  ║
║ 33     ║ State                    ║ -0.002007 ║ categorical ║
║ 34     ║ Events                   ║ -0.002613 ║ categorical ║
║ 35     ║ Year                     ║ -0.003186 ║ categorical ║
║ 36     ║ CompetitionDistance      ║ -0.005161 ║ continuous  ║
╚════════╩══════════════════════════╩═══════════╩═════════════╝
In both tables, importance is calculated as shuffled_accuracy - baseline_accuracy.
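How to read the sign of shuffled - baseline depends on whether the metric at `learn.validate()[1]` is higher-is-better (accuracy) or lower-is-better (an error such as RMSE or RMSPE), and that depends on the metrics the Learner was built with. A small hypothetical helper (not part of fastai) that flips the sign so that positive always means "shuffling this column hurt the model":

```python
def signed_importance(baseline, shuffled, higher_is_better):
    """Positive = shuffling the column made the model worse (the feature is useful).

    Negative = the score improved with the column scrambled; usually noise
    around zero, or a feature the model uses in a way that does not generalize.
    """
    delta = shuffled - baseline
    return -delta if higher_is_better else delta

# Accuracy fell from 0.90 to 0.85 when shuffled: the feature matters.
print(signed_importance(0.90, 0.85, higher_is_better=True))
# Error rose from 0.10 to 0.12 when shuffled: the feature matters.
print(signed_importance(0.10, 0.12, higher_is_better=False))
```

With this convention, rankings computed from an accuracy-style metric and from an error-style metric can be compared directly, since the most useful features end up at the top in both cases.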
Let me know your thoughts.