Entity Embeddings

Hello! I am retrieving the embeddings generated for categorical variables to use as input to another model. I know the order of the embeddings is the same as the order of the categorical variables we specify, but I’m a little confused about the order of the rows within each embedding.

For example, let’s say I have a DataFrame with a categorical variable (‘thing’) like the following:

	imp	thing
0	1	fish
1	2	cake
2	3	melon
3	7	melon

After converting it to categories, it gives me:
Categories (3, object): [cake < fish < melon]

Now, suppose I have the following embedding for the variable:
[0.2 0.4
 0.1 0.3
 0.9 0.7]

Is the row order the same as the order of the categories, i.e. does this mean that [0.2 0.4] is the embedding for cake, [0.1 0.3] for fish, and [0.9 0.7] for melon? Or does it follow the order in which the values appear in the DataFrame, i.e. [0.2 0.4] for fish, [0.1 0.3] for cake, and so on?

Thanks in advance!

The first row of your embedding matrix should align with category code 0 of thing. I believe that would be cake, but to make sure, I would compare df.thing.cat.codes to df.thing and check that cake is encoded as 0, fish as 1, and melon as 2.
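Here is a minimal sketch of that check in pandas, using the example data from the question. It assumes that row i of the embedding matrix corresponds to category code i; the names code_to_category, emb and per_row are just illustrative, and emb is the made-up matrix from the question rather than real learned weights. If your library reserves an extra row (for example for missing values), the lookup would need an offset.

```python
import numpy as np
import pandas as pd

# Example data from the question
df = pd.DataFrame({"imp": [1, 2, 3, 7],
                   "thing": ["fish", "cake", "melon", "melon"]})
df["thing"] = df["thing"].astype("category").cat.as_ordered()

# Mapping from integer code to category label
code_to_category = dict(enumerate(df["thing"].cat.categories))
print(code_to_category)

# Integer code assigned to each row of the DataFrame
codes = df["thing"].cat.codes.tolist()
print(list(zip(df["thing"], codes)))

# Illustrative embedding matrix from the question (assumed: row i <-> code i)
emb = np.array([[0.2, 0.4],
                [0.1, 0.3],
                [0.9, 0.7]])

# Look up the embedding vector for every row of the DataFrame
per_row = emb[codes]
print(per_row)
```

The codes follow the sorted category order (cake, fish, melon), not the order in which the values first appear in the DataFrame, so the first DataFrame row (fish) picks up the second row of the matrix.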

Hi, may I know how to retrieve the embeddings generated for the categorical variables? Thanks.