Lookup table for embeddings or embedding layer activation

I want to use the embeddings learned by a structured model in a tree-based algorithm
(based on the Rossmann example from lesson 4).

I can extract the matrices with to_np(m.model.embs[ix].weight.data), but I also need the lookup table the learner uses to map each item in a category to the corresponding row of that matrix.
(I checked that the matrices are in the same order as the categories in the data frame.)

In other words, I need to compute the embedding layer activations on given data.

(I could not find this on the forum.)
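
For concreteness, here is a rough sketch of what I want to end up with (all names here are hypothetical):

import numpy as np

def embedding_activations(values, lookup, emb_matrix):
    # lookup maps each raw category value to its row index in emb_matrix --
    # that mapping is exactly the lookup table I am asking about
    idx = np.array([lookup[v] for v in values])
    return emb_matrix[idx]

# e.g. feats = embedding_activations(df['Store'], store_lookup, emb_matrices[0])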

I just realized that I do not understand the reason for adding 1 to the number of embedding categories:
cat_sz = [(c, len(cars[c].cat.categories)+1) for c in cat_vars]
The matrix is one row longer than the number of available categories. Does anyone know why we do this?

As far as I can tell, the following attribute should give you that: it is a mapping from categorical feature names to a list of all the categories in that feature, which is used to look up the embedding matrix:

learn.data.label_list.x.classes
(assuming your Learner object is called learn)
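
If that attribute behaves as described, a minimal sketch of using it could look like this (fastai v1; the column name 'Store' and its position 0 among the embedding layers are assumptions, and learn is the Learner above):

# classes is assumed to map column name -> array of categories, in the same
# order as the rows of the corresponding embedding matrix
classes = learn.data.label_list.x.classes
store_idx = {c: i for i, c in enumerate(classes['Store'])}

# in a fastai v1 TabularModel the embedding layers live in learn.model.embeds;
# index 0 is assumed to be the 'Store' column here
emb = learn.model.embeds[0].weight.data.cpu().numpy()
vec = emb[store_idx[42]]  # embedding vector for Store == 42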

Regarding the number of rows in the embedding matrix equalling cardinality + 1: it appears that an extra row is added representing ‘#na#’, which presumably is a catch-all to handle the case where a new category appears in the test data that wasn’t in the training data (and so wouldn’t otherwise have a row in the embedding matrix).

This even happens for the categorical columns created by the FillMissing preprocessor, where we end up with three rows in the embedding matrix corresponding to the following categories:

array(['#na#', False, True], dtype=object)
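
To make the catch-all behaviour concrete, here is a toy sketch (the embedding values are random stand-ins, not taken from a trained model):

import numpy as np

classes = ['#na#', False, True]         # the categories printed above
emb = np.random.randn(len(classes), 4)  # stand-in for a learned 3x4 matrix
idx = {c: i for i, c in enumerate(classes)}

code = idx.get('never-seen-before', 0)  # an unseen value falls back to 0
print(emb[code])                        # the '#na#' catch-all embedding row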

Thanks, David, this is a great way. I eventually approached it by using a hook, from lesson 6 (2018); notes can be found here: https://medium.com/@hiromi_suenaga/deep-learning-2-part-1-lesson-7-1b9503aff0c
An even simpler manual approach is to introduce a global variable by which the model class decides whether to return activations.
n+1 finally makes sense.
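
For anyone who lands here later, a minimal forward-hook sketch in plain PyTorch (toy layer size, hypothetical names):

import torch
import torch.nn as nn

emb = nn.Embedding(4, 3)  # stand-in for one of the model's embedding layers
captured = []

def hook(module, inputs, output):
    # store the layer's output (the embedding activations) for later use
    captured.append(output.detach().cpu())

handle = emb.register_forward_hook(hook)
_ = emb(torch.tensor([0, 2, 1]))  # any forward pass through the layer
handle.remove()

print(captured[0])  # the activations for that batch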