is there a way to extract the embedding layers/vectors from the final trained model when using StructuredLearner from column_data?
I have trained a net as in the lesson 4 - Rossmann example, but on a different dataset (used cars from Kaggle). Now I want to visualize the embeddings (after dimensionality reduction), in order to show for example which brands are close to each other etc.
The embeddings must be stored in model object in some way I assume, as is visible from learner_object.get_layer_groups(), but I haven’t found a way to get them.
Thank you very much! It was a mix between both suggestions. In my case
model.model.embs.parameters()
Do you by any chance also know, how to match each embedding vector to the corresponding category? The names do not seem to be included in the output of parameters()…
It will be in the same order as your cat_vars variable. If you want to check for yourself, first take a look at emb_szs, then do cars[cat_vars[0]].value_counts() and you should see that the number of unique values in that variable is equal to the first emb_szs (minus one).
Haven’t found an answer anywhere on the forum, so:
after getting embeddings from the model, how do I map them to values in categorical columns?
Assuming df is the dataframe I got from applying proc_df to train set(and df_test made from test set), do I just replace values there, which are all numerical now, say, 7 or 10 with embedding_matrix_of_the_column[7] or embedding_matrix_of_the_column[10] respectively?
I have a similar question regarding the embeddings. For each categorical variable, are the rows of the embeddings in the order 1, 2, 3, … to the last number?
Thank you for the reply. Just to be clear, if in your example I create an embedding matrix of dimension 2, and the tensor for ‘thing’ is
[ 0.2 0.4
0.1 0.1
0.7 0.2],
then the first row corresponds to Cake, the second to Fish, and the third to Mellon?