XGBoost and embeddings

renato · July 20, 2017, 5:42pm

I’m trying to train something similar to what is shown in lesson 14 (Rossmann), and I have trained a network to learn all the embeddings. Now I want to view the feature importance of each variable using XGBoost, but in the Rossmann notebook I think the embeddings are not being used to train XGBoost. So, to get the feature importance of each embedded variable, I’m thinking about summing the fscore of every embedding column that belongs to that variable.

E.g., if storeID has an embedding of size 5, I will have the columns storeID_emb1, storeID_emb2, …, storeID_emb5. After training, I get an fscore for each column and then I sum them up:
storeID_fs = storeID_fs1 + storeID_fs2 + … + storeID_fs5
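
A minimal sketch of that summation in Python, assuming the per-column scores come as a dict of column name to score (the shape that XGBoost's `Booster.get_fscore()` returns) and that embedding columns follow the `<variable>_emb<k>` naming above. The helper name `aggregate_fscores`, the extra columns, and all the numbers are made up for illustration:

```python
import re
from collections import defaultdict

def aggregate_fscores(fscores, pattern=r"^(?P<var>.+)_emb\d+$"):
    """Sum per-column fscores into one score per original variable.

    Columns that match `pattern` are grouped under the variable name;
    any other column is kept under its own name.
    """
    emb_col = re.compile(pattern)
    totals = defaultdict(float)
    for column, score in fscores.items():
        match = emb_col.match(column)
        var = match.group("var") if match else column
        totals[var] += score
    return dict(totals)

# Hypothetical scores, not real results. With a trained booster this dict
# would come from something like booster.get_fscore().
example_fscores = {
    "storeID_emb1": 12, "storeID_emb2": 7, "storeID_emb3": 3,
    "storeID_emb4": 9, "storeID_emb5": 4,
    "dayOfWeek_emb1": 6, "dayOfWeek_emb2": 5,
    "promo": 10,
}

aggregated = aggregate_fscores(example_fscores)
for var, score in sorted(aggregated.items(), key=lambda kv: -kv[1]):
    print(var, score)
```

Columns that are never used in a split do not appear in the dict at all, so they simply contribute 0 to the sum. One thing I'm unsure about: since the fscore counts splits, a variable with a larger embedding has more columns to split on, so its sum may not be directly comparable with a variable that has a single column.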

What do you guys think of this approach?