Hi all, I was going through 09_tabular.ipynb and want to use neural network embeddings in a Random Forest. I tried various ways but was unable to get embeddings out of the NN.
The NN had layers [500, 250, 20], and I was trying to compute 20-dimensional embeddings.
I tried getting embeddings on a simple NN with only 1 cont and 0 cat variables with: embs = learn.model.layers[0:2](dls.train.one_batch())
(0:2 because I want the output from the 2nd layer)
The issue with computing embeddings is that there are both cont and cat variables, so dls.train.one_batch() gives 3 tensors rather than 2. I tried concatenating them, but that didn't work either. I tried a lot of ways but got stuck anyway.
Will be very thankful if someone could guide me or share a code snippet.
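For anyone landing here with the same problem, a rough sketch of why the call above fails and what tends to work instead. The batch is a tuple (cats, conts, targets), while model.layers expects a single tensor: the embedded categoricals concatenated with the continuous columns. TinyTabular below is a hypothetical stand-in written in plain PyTorch that only mimics the embeds / bn_cont / layers layout I understand fastai's TabularModel to have; embed_inputs is a helper made up for this example, not a fastai method.

```python
import torch
import torch.nn as nn

class TinyTabular(nn.Module):
    """Hypothetical stand-in mimicking the layout of a tabular model:
    embeddings for cats, batchnorm for conts, then a stack of linear blocks."""
    def __init__(self, emb_szs, n_cont, hidden, out_sz):
        super().__init__()
        self.embeds = nn.ModuleList([nn.Embedding(ni, nf) for ni, nf in emb_szs])
        self.bn_cont = nn.BatchNorm1d(n_cont)
        sizes = [sum(nf for _, nf in emb_szs) + n_cont] + hidden
        blocks = [nn.Sequential(nn.Linear(a, b), nn.ReLU())
                  for a, b in zip(sizes, sizes[1:])]
        blocks.append(nn.Linear(sizes[-1], out_sz))  # final output block
        self.layers = nn.Sequential(*blocks)

    def embed_inputs(self, x_cat, x_cont):
        # Look up one embedding per categorical column, then append the conts.
        x = torch.cat([e(x_cat[:, i]) for i, e in enumerate(self.embeds)], dim=1)
        return torch.cat([x, self.bn_cont(x_cont)], dim=1)

    def forward(self, x_cat, x_cont):
        return self.layers(self.embed_inputs(x_cat, x_cont))

torch.manual_seed(0)
model = TinyTabular(emb_szs=[(10, 4), (6, 3)], n_cont=2,
                    hidden=[500, 250, 20], out_sz=1).eval()

# Fake batch with the same (cats, conts, targets) shape as one_batch() returns.
x_cat = torch.stack([torch.randint(0, 10, (8,)), torch.randint(0, 6, (8,))], dim=1)
x_cont = torch.randn(8, 2)
batch = (x_cat, x_cont, torch.randn(8, 1))

with torch.no_grad():
    cats, conts, _ = batch                    # drop the targets
    x = model.embed_inputs(cats, conts)       # single tensor the layers can take
    feats = model.layers[:3](x)               # the three hidden blocks only

print(feats.shape)  # torch.Size([8, 20])
```

The point is that slicing layers is fine, but the input has to go through the embedding and concatenation step first, and the targets must not be passed in.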
Oh yes, so the chapter is talking about using "categorical embeddings" trained by a NN in place of the raw categorical columns. I was thinking that if, say, the NN has layers [500, 250, 10], then we would use the final layer's 10-dim output as an extra feature, along with the raw categorical columns. My bad, thanks.
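To make the distinction concrete, here is a minimal sketch of the "categorical embeddings" idea in plain PyTorch: each integer category code is swapped for the corresponding row of a trained embedding matrix, and those columns sit next to the continuous ones. The nn.Embedding here is randomly initialised and only stands in for a trained layer (in fastai I believe these live in learn.model.embeds); the sizes and values are made up.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for ONE trained embedding layer: 5 category levels -> 3 dims.
emb = nn.Embedding(5, 3)

codes = torch.tensor([0, 3, 3, 1, 4])   # integer-coded categorical column
conts = torch.randn(5, 2)               # two continuous columns

with torch.no_grad():
    emb_cols = emb(codes)                          # one 3-dim vector per row
    features = torch.cat([emb_cols, conts], dim=1) # replaces the raw code column

print(features.shape)  # torch.Size([5, 5])
```

Rows with the same category code get identical embedding columns, so the tree model sees a learned numeric description of the category instead of an arbitrary integer. Calling .numpy() on features would give an array that scikit-learn can take.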
Thank you so much, everything is clear now. My bad, I was passing a TabularPandas to it. A strange mistake: in that bunch of code I was sure these were tensors.
I have tried to get one big embedding matrix (412_698 rows in my case, all the training data) for all categorical and continuous columns and use it as the training dataset for the Random Forest model.
It remains to figure out how to get everything at once (not just the cat and cont tensors from one_batch(), which is limited to batch-size rows, but for all n_rows of data).
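One way to get past the batch-size limit, sketched in plain PyTorch: loop over an unshuffled loader, run the feature extractor on each batch, and concatenate the results. collect_features and extract are hypothetical names for this example, and the tiny embedding plus linear block only stand in for a trained model. With fastai the same loop should work over a loader that yields (cats, conts, targets), as long as it does not shuffle or drop the last partial batch (as far as I know the training loader does both by default).

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def collect_features(extract, loader):
    """Run `extract(cats, conts)` on every batch and stack the outputs row-wise."""
    outs = []
    with torch.no_grad():
        for cats, conts, _ in loader:        # targets are ignored
            outs.append(extract(cats, conts).cpu())
    return torch.cat(outs)

torch.manual_seed(0)

# Stand-ins for a trained model: one embedding and one hidden block.
emb = nn.Embedding(7, 4).eval()
body = nn.Sequential(nn.Linear(4 + 2, 20), nn.ReLU()).eval()

def extract(cats, conts):
    # Embed the single categorical column, append conts, run the hidden block.
    return body(torch.cat([emb(cats[:, 0]), conts], dim=1))

n_rows = 1000
ds = TensorDataset(torch.randint(0, 7, (n_rows, 1)),  # cats
                   torch.randn(n_rows, 2),             # conts
                   torch.randn(n_rows))                # targets
loader = DataLoader(ds, batch_size=64, shuffle=False)  # keep row order, keep last batch

all_feats = collect_features(extract, loader)
print(all_feats.shape)  # torch.Size([1000, 20])
```

Because the loader is not shuffled, row i of all_feats lines up with row i of the original data, which matters when pairing the features with the targets for the Random Forest.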