Embedding for continuous variables?

meetnaren · June 20, 2018, 3:10pm

Why do we have embeddings for categorical variables only? Can we also have embeddings for continuous variables? Has anyone tried this?

tenoke · June 20, 2018, 7:21pm

You can treat continuous variables as categorical and use embeddings for them. You see this at several points in the fast.ai courses, where we treat dates (years, for example) as categorical and use an embedding: 2018 is one category, 2017 is another, and so on.
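Here is a minimal sketch of the "years as categories" idea in plain Python, without any deep learning library. It assumes a hypothetical list of years and an arbitrarily chosen embedding size (`EMB_DIM`). Each distinct year gets its own index, and each index points to its own vector. In a real model those vectors would be learned parameters (e.g. a PyTorch `nn.Embedding`), not the random placeholders used here.

```python
import random

# Hypothetical example data: the years seen in the training set.
years = [2015, 2016, 2017, 2018, 2017, 2018]

# Build a vocabulary: each distinct year becomes its own category index.
vocab = {year: idx for idx, year in enumerate(sorted(set(years)))}

EMB_DIM = 4  # assumed embedding size, chosen arbitrarily for illustration
random.seed(0)
# One vector per category; randomly initialised here, learned in a real model.
embedding_table = [[random.gauss(0.0, 0.01) for _ in range(EMB_DIM)] for _ in vocab]

def embed_year(year):
    """Look up the embedding vector for a year seen during training."""
    return embedding_table[vocab[year]]

print(vocab)                  # {2015: 0, 2016: 1, 2017: 2, 2018: 3}
print(len(embed_year(2018)))  # 4
```

Note that `embed_year(2019)` would raise a `KeyError`, because 2019 never appeared in the data. A real model would typically reserve an extra index for unseen values.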

This generally only makes sense in situations like the one above, where you have a limited number of values and/or can separate the values into their own clusters. For temperature, for example, you can bin 1-25 degrees as category 1, 25-50 as category 2, and so on, and then use an embedding.
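The binning approach for temperature can be sketched as follows, again in plain Python. The bin edges (`EDGES`) are an assumption that roughly follows the 25-degree buckets above. They use half-open intervals so that a boundary value like 25.0 falls into exactly one bin. Bins are 0-indexed here rather than starting at 1. `bisect.bisect_right` returns the bucket index, which is then used to look up a (placeholder) embedding vector.

```python
import bisect
import random

# Assumed bin edges in degrees: <25 -> bin 0, [25, 50) -> bin 1,
# [50, 75) -> bin 2, >=75 -> bin 3.
EDGES = [25, 50, 75]

def temperature_bin(t):
    """Map a continuous temperature to a discrete bucket index."""
    return bisect.bisect_right(EDGES, t)

n_bins = len(EDGES) + 1
EMB_DIM = 3  # assumed embedding size
random.seed(0)
# One vector per bin; learned parameters in a real model.
embedding_table = [[random.gauss(0.0, 0.01) for _ in range(EMB_DIM)] for _ in range(n_bins)]

def embed_temperature(t):
    """Embed a temperature by first binning it, then looking up its bin's vector."""
    return embedding_table[temperature_bin(t)]

for t in [3.2, 24.9, 25.0, 49.0, 80.0]:
    print(t, "-> bin", temperature_bin(t))
```

All values inside a bin share one vector. The model can therefore learn something about "warm" vs "cold", but it cannot distinguish 26 from 49 degrees.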

However, when a variable can take many different values and the differences between them carry information you don't want to lose (e.g. an amount that might be $13.50 or $24,000,000), you are unlikely to get enough examples of every possible value (which are, in theory, infinite) to even create an embedding for each one. In general, the model may be better off learning the simpler linear relationship (e.g. more money = more purchasing power) with some cut-offs, rather than treating, say, $13 and $13.50 as completely different values before it has seen enough examples of both.
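The alternative is to feed the amount to the model as a single number. Here is one possible sketch of that alternative. The `log1p` transform followed by standardisation is my own assumed choice, not something from this thread, and the prices are hypothetical. The point is that $13 and $13.50 end up as nearby numbers rather than unrelated categories, and no per-value embedding is needed.

```python
import math

# Hypothetical example prices spanning several orders of magnitude.
prices = [13.0, 13.5, 24.0, 1_500.0, 24_000_000.0]

# log1p compresses the huge range: $13 and $13.50 stay close together,
# while $24,000,000 is still clearly "much more".
logged = [math.log1p(p) for p in prices]

# Standardise to zero mean and unit variance, a common step before
# passing a continuous input to a neural network.
mean = sum(logged) / len(logged)
std = math.sqrt(sum((x - mean) ** 2 for x in logged) / len(logged))
normalised = [(x - mean) / std for x in logged]

# Each price is now one real number the model can weight directly.
for p, z in zip(prices, normalised):
    print(f"{p:>14,.2f} -> {z:+.3f}")
```

Because the transform is monotonic, the ordering of the amounts is preserved. A network with non-linear layers can still learn cut-offs on top of this single input.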