Hi there
I wanted to make sure that my approach to deriving and creating the embedding tables is on the right track.
To get the embeddings, I am doing the following:
embs = []
for param in learn.model.embs.parameters():
    #print(param)
    embs.append(param)
Then, let's say, take the embeddings for “sex”:
sex_e = pd.DataFrame(to_np(embs[0]))
Take the levels of the factor variable:
sex_names = pd.DataFrame(joined_samp.sex.unique(),columns=["sex"])
And lastly, do the assignment:
sex_emb = pd.concat([sex_names,sex_e], axis=1)
The above script assumes that the rows of the embedding table correspond to the levels of sex in the order in which they first appear in the data. In this case we have df.sex = [2,1,1,1,2,0,2,0,1,2,1,3],
so sex_names = [2,1,0,3].
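To make the ordering assumption concrete, here is a minimal pandas-only sketch using the sex values above. It compares the order returned by unique() (order of first appearance) with the sorted order that pandas uses for categorical codes. Which of the two the embedding rows follow depends on how the column was encoded before training, and this sketch does not check that; the variable names are only illustrative.

```python
import pandas as pd

# Values of df.sex as listed above
sex = pd.Series([2, 1, 1, 1, 2, 0, 2, 0, 1, 2, 1, 3], name="sex")

# unique() keeps the order of first appearance
appearance_order = sex.unique().tolist()

# A pandas categorical sorts its categories; each code indexes into that sorted list
sex_cat = sex.astype("category")
sorted_levels = sex_cat.cat.categories.tolist()
codes = sex_cat.cat.codes.tolist()

print(appearance_order)  # [2, 1, 0, 3]
print(sorted_levels)     # [0, 1, 2, 3]
```

The two orders differ here, so the row labels only line up with unique() if the model really indexed the levels by first appearance.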
The embedding table is:
          0         1
0 -0.413479 -0.037152
1 -0.323445  0.062252
2  0.220362  0.248682
3  0.200637  0.368422
So after the merge we have a DataFrame like this:
   sex         0         1
0    2 -0.413479 -0.037152
1    1 -0.323445  0.062252
2    0  0.220362  0.248682
3    3  0.200637  0.368422
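For comparison, here is a sketch of how the same table would be labelled if row i of the embedding matrix belonged to the level with category code i (sorted levels) rather than to the i-th value returned by unique(). This is only an assumption about the encoding, not something confirmed for this model. The matrix is copied from the table above, and emb is a stand-in for to_np(embs[0]).

```python
import numpy as np
import pandas as pd

# Stand-in for to_np(embs[0]); values copied from the table above
emb = np.array([
    [-0.413479, -0.037152],
    [-0.323445,  0.062252],
    [ 0.220362,  0.248682],
    [ 0.200637,  0.368422],
])
sex = pd.Series([2, 1, 1, 1, 2, 0, 2, 0, 1, 2, 1, 3], name="sex")

# Assumption: row i of emb belongs to the level whose category code is i
levels = sex.astype("category").cat.categories
sex_emb_by_code = pd.DataFrame(emb, index=pd.Index(levels, name="sex")).reset_index()

# Labelling used in the script above: row i belongs to the i-th value from unique()
sex_emb_by_appearance = pd.concat(
    [pd.DataFrame({"sex": sex.unique()}), pd.DataFrame(emb)], axis=1
)

print(sex_emb_by_code)
print(sex_emb_by_appearance)
```

Under these two labellings, levels 0 and 2 swap vectors while levels 1 and 3 stay the same, so the choice of ordering matters for this data.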
Is this approach correct? I am concerned because, as far as I know, there is no way to confirm the assignment.
Any help would be greatly appreciated.