Embedding size for multiple predictions in Tabular regression

Hello!

I was wondering whether the default embedding sizes should be increased when we run multiple regressions in an FCNN.

In my use case, I have a tabular dataset containing features and 10 continuous dependent variables that my NN aims to estimate. For some categorical features, the default embedding size is 10.
I am afraid that if I don't increase the size, the embeddings will simply end up storing, for each factor level, one value per dependent variable, i.e. roughly the average of each target for that level.
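For context, fastai v2 picks those default sizes with a heuristic based only on the cardinality of each categorical variable, not on the number of targets. A minimal sketch of that rule (`emb_sz_rule` in `fastai.tabular.model`, as far as I know) looks like this:

```python
# fastai v2's default heuristic: embedding width grows with the number of
# levels of the categorical variable and is capped at 600.
def emb_sz_rule(n_cat: int) -> int:
    return min(600, round(1.6 * n_cat ** 0.56))

# A default width of 10 corresponds to a cardinality in the mid-20s.
for n_cat in (10, 25, 100, 1000):
    print(n_cat, emb_sz_rule(n_cat))  # 6, 10, 21, 77
```

So out of the box, nothing in the rule accounts for the network having to serve 10 outputs at once.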

Anyone faced a similar situation before?

You can always try increasing it by passing in a custom embedding size. It's as simple as building a dictionary and passing it to `tabular_learner`'s `emb_szs` parameter. Something like:

`{'var1': 15}`
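To make that concrete, here is a minimal sketch with the fastai tabular API; the DataFrame `df`, the column names, and the size of 15 are just placeholders for illustration:

```python
from fastai.tabular.all import *

# df is assumed to be a pandas DataFrame with a categorical column 'var1',
# two continuous features, and ten continuous targets y0..y9.
y_names = [f'y{i}' for i in range(10)]

dls = TabularDataLoaders.from_df(
    df,
    procs=[Categorify, Normalize],
    cat_names=['var1'],
    cont_names=['feat1', 'feat2'],
    y_names=y_names,
    y_block=RegressionBlock(n_out=len(y_names)),
)

# Only 'var1' gets the custom width; every other categorical variable keeps
# the size chosen by the default rule.
learn = tabular_learner(dls, layers=[200, 100], emb_szs={'var1': 15})
learn.fit_one_cycle(5)
```

Any variable left out of the dictionary falls back to the default, so you can tune sizes one column at a time.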


Oh yes, thanks, this is actually what I have been doing so far.

But this was more of a theoretical question. I was wondering whether anyone has experience with embeddings for multiple regression targets. Intuitively it seems reasonable to increase the size, but I haven't seen anyone discuss this, and since it is not implemented in the library, there might be a reason.
