Predictions on pretrained entity embeddings

Hi,

I’m trying to use fast.ai’s training loop and all of its utilities on a problem where I have pretrained embeddings (trained with a StarSpace model) for some entities, and labels that I want to predict. The input is a table with two columns: embedding vector | label

I’m wondering what the most natural way is to fit such data into fast.ai. In particular, should I treat the embedding as an input (essentially a single feature) to the model, or should I incorporate the embeddings directly into the model (and freeze them) and pass only the embedding index as an input?

I tried feeding the embedding vectors as inputs (stored in a Pandas dataframe), but I ran into issues with the numpy -> tensor conversion somewhere deep in the ItemList/DataBunch code. I’m wondering if I’m climbing the wrong hill here.

Following up on this to share the solution I came up with. After trying both feeding the embeddings into the model directly and incorporating the embedding table into the model (passing the embedding’s index as the input), I think the latter is vastly superior. You end up with a smaller dataset, which is just (idx, label), and you don’t need to worry about threading the embedding’s floats through the DataBunch/ItemList APIs.
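
For reference, here is roughly what the model ends up looking like. This is only a sketch: `emb_matrix` (the pretrained StarSpace vectors as a float tensor), `n_classes`, and the hidden size are placeholders for your actual values.

```python
import torch
import torch.nn as nn

class EntityClassifier(nn.Module):
    """Classifier over pretrained entity embeddings; the input is the entity index."""
    def __init__(self, emb_matrix: torch.Tensor, n_classes: int, hidden: int = 64):
        super().__init__()
        # Load the pretrained vectors into an embedding layer and freeze it,
        # so the StarSpace representations are not updated during training.
        self.emb = nn.Embedding.from_pretrained(emb_matrix, freeze=True)
        self.head = nn.Sequential(
            nn.Linear(emb_matrix.size(1), hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, idx: torch.LongTensor) -> torch.Tensor:
        # idx has shape (batch,); the lookup returns (batch, emb_dim)
        return self.head(self.emb(idx))
```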

If you run into the same design choice, I recommend incorporating the embeddings directly into the model and passing their indices as the model’s inputs.
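
The data side then becomes trivial, since the inputs are just integer indices. Something along these lines worked for me; the column names (`entity_idx`, `label`) are made up, and the `DataBunch`/`Learner` calls assume the fastai v1 API, so adjust imports as needed.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import TensorDataset
from fastai.basic_data import DataBunch
from fastai.basic_train import Learner

# train_df / valid_df have two integer columns:
# 'entity_idx' (row in the embedding matrix) and 'label' (class id)
train_ds = TensorDataset(torch.tensor(train_df['entity_idx'].values),
                         torch.tensor(train_df['label'].values))
valid_ds = TensorDataset(torch.tensor(valid_df['entity_idx'].values),
                         torch.tensor(valid_df['label'].values))

data = DataBunch.create(train_ds, valid_ds, bs=256)
learn = Learner(data, EntityClassifier(emb_matrix, n_classes),
                loss_func=F.cross_entropy)
learn.fit_one_cycle(3)
```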