Assigning factor levels to the embedding tables



Hi there

I want to make sure that my approach to deriving and creating the embedding tables is on the right track.

To get the embeddings, I am doing the following:

# Collect the weight matrix of every embedding layer in the model
embs = list(learn.model.embs.parameters())

Then, let's say, take the embeddings for “sex”:

sex_e = pd.DataFrame(to_np(embs[0]))

Take the levels of the factor variable:

sex_names = pd.DataFrame(joined_samp.sex.unique(),columns=["sex"])

And lastly, do the assignment:

sex_emb = pd.concat([sex_names,sex_e], axis=1)

The above script assumes that the rows of the embedding table correspond to the levels of sex in the order in which they first appear in the data. In this case we have df.sex = [2,1,1,1,2,0,2,0,1,2,1,3], so sex_names = [2,1,0,3].
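To make the assumption concrete, here is a minimal sketch comparing the order of first appearance with the order pandas uses when a column is converted to a categorical. It assumes the column was encoded through a pandas categorical before training, which I have not confirmed for this model; the variable names are made up for illustration.

```python
import pandas as pd

# Same values as df.sex in the post
df = pd.DataFrame({"sex": [2, 1, 1, 1, 2, 0, 2, 0, 1, 2, 1, 3]})

# unique() returns the levels in order of first appearance
appearance_order = df["sex"].unique().tolist()

# A pandas categorical sorts its categories; the integer codes
# index into that sorted list, not into the appearance order
cat = df["sex"].astype("category")
category_order = cat.cat.categories.tolist()

print(appearance_order)  # [2, 1, 0, 3]
print(category_order)    # [0, 1, 2, 3]
```

If the model was fed the categorical codes, row i of the embedding table would line up with category_order[i] rather than with appearance_order[i].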

The embedding table is:

           0          1
0  -0.413479  -0.037152
1  -0.323445   0.062252
2   0.220362   0.248682
3   0.200637   0.368422

So after the merge we have a df like this:

   sex          0          1
0    2  -0.413479  -0.037152
1    1  -0.323445   0.062252
2    0   0.220362   0.248682
3    3   0.200637   0.368422

Is this notion correct? I am concerned, as there is no way (to my knowledge, at least) to confirm the assignment.
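One possible way to check the assignment is to label the embedding rows by integer code instead of by position in unique(). The sketch below is only that: it assumes row i of the embedding matrix belongs to categorical code i, with no extra row reserved for missing values, and neither assumption is confirmed for this model. The matrix values are copied from the table above.

```python
import numpy as np
import pandas as pd

# Embedding weights copied from the post (4 levels x 2 dimensions)
emb = np.array([
    [-0.413479, -0.037152],
    [-0.323445,  0.062252],
    [ 0.220362,  0.248682],
    [ 0.200637,  0.368422],
])

sex = pd.Series([2, 1, 1, 1, 2, 0, 2, 0, 1, 2, 1, 3]).astype("category")
levels = sex.cat.categories  # sorted levels: 0, 1, 2, 3

# Sanity check: one embedding row per level (assumption: no padding row)
assert emb.shape[0] == len(levels)

# Label each row with the level whose code equals the row number
sex_emb = pd.DataFrame(emb, index=pd.Index(levels, name="sex"))

# Cross-check: the vector found through the code of the first observation
# should equal the row labelled with that observation's level
first_level = sex.iloc[0]
via_code = emb[sex.cat.codes.iloc[0]]
via_label = sex_emb.loc[first_level].to_numpy()
print(np.allclose(via_code, via_label))
```

If the row count does not match the number of levels, that would be a hint that the encoding reserves an extra row or uses a different ordering.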

Any help would be greatly appreciated.