Creating Readable Entity Encoding in Feature Importance

I have encoded my dataset with fastai's tabular module and trained a learner:

to_nn = TabularPandas(data, procs, cat, cont,
                      splits=splits, y_names='identified')
dls = to_nn.dataloaders(1024, device=device)
learn = tabular_learner(dls, layers=[500, 250], n_out=1)
learn.fit_one_cycle(12, 3e-3)

and then looped over the categorical columns to name the embedding outputs in `OriginalColumn_n` format:

for i, col in enumerate(learn.dls.cat_names[:5]):
    emb = learn.model.embeds[i]  # embedding layer for this categorical column
    emb_data = emb(tensor(to_nn.train.xs[col], dtype=torch.int64))
    # one name per embedding dimension, e.g. FRUIT_0, FRUIT_1, ...
    emb_names = [f'{col}_{j}' for j in range(emb_data.shape[1])]
    display(emb_names)

I am curious how (and whether) I could best create a dictionary to replace `j`, so that when I output feature importances I get FRUIT_APPLE instead of FRUIT_0.

This raises a prior question, of course: after embedding, does each column of `emb_data` still correspond to an initial value (APPLE), or is it just an n-dimensional encoding of all of FRUITS?
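For anyone finding this later: an embedding is a lookup table, so each *row* of the weight matrix belongs to one category code, while each *column* of `emb_data` is a learned latent dimension shared by all categories. A minimal NumPy sketch of the lookup (the names `vocab` and `weights` here are illustrative, not fastai internals):

```python
import numpy as np

# Toy stand-in for one fastai embedding layer: vocab of 3 codes, 4 latent dims.
vocab = ['#na#', 'APPLE', 'BANANA']          # fastai reserves code 0 for '#na#'
weights = np.arange(12.0).reshape(3, 4)      # one ROW of weights per category

codes = np.array([1, 2, 1])                  # APPLE, BANANA, APPLE
emb_data = weights[codes]                    # embedding = row lookup -> (3, 4)

# Each COLUMN of emb_data is the j-th latent coordinate across all data rows,
# so FRUIT_0 is a learned dimension shared by every fruit, not "the APPLE
# column" -- there is no per-category column to rename.
```

So FRUIT_0 cannot be relabeled FRUIT_APPLE; the category-to-name mapping lives in the vocab, not in the embedding columns.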

You could just use the feature importance module baked into fastinference, and peek at the source code to see how I did it:
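For context, fastinference's `feature_importance` is permutation-based: shuffle one column, re-score, and treat the rise in error as that column's importance. A self-contained sketch of the idea (my own function and metric names, not fastinference's actual code):

```python
import numpy as np

def permutation_importance(predict, X, y, metric, n_repeats=3, seed=0):
    """Rise in error when one column is shuffled = that column's importance."""
    rng = np.random.default_rng(seed)
    base = metric(y, predict(X))
    imps = {}
    for j in range(X.shape[1]):
        errs = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])            # break the column/target link
            errs.append(metric(y, predict(Xp)))
        imps[j] = float(np.mean(errs)) - base
    return imps

# Toy check: the "model" uses only column 0, so only column 0 matters.
X = np.random.default_rng(1).normal(size=(500, 2))
y = X[:, 0]
mse = lambda t, p: float(np.mean((t - p) ** 2))
imps = permutation_importance(lambda X: X[:, 0], X, y, mse)
```

Because importance is computed per input column, it reports one number per categorical feature (FRUIT), not per category value (APPLE).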

@muellerzr, Thank you for your response!

On importing

from fastinference.inference import *

to use fastinference's `feature_importance`, I received `NameError: name 'log_args' is not defined`. I saw you posted in another thread about these issues. Perhaps it is caused by my fastai import?

from fastai.tabular.all import * 

What are your versions of fastai and fastcore?

@muellerzr

fastai = 2.5.2
fastcore = 1.3.26

I am also happy to create the "dictionary" myself. I would just need to know whether tensor code 0 is alphabetical, `df.column.unique()[0]`, or arranged by some other rule.