Entity embedding on multi-valued categorical attribute (Lesson 4 related)

kechan · April 25, 2019, 2:36am

In lesson 4, as well as in many examples on other blogs, a categorical column or attribute is assumed to take only one value per row. That value is mapped to a unique integer, and an embedding layer turns it into a feature vector.
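For reference, the single-valued case described above can be sketched roughly as follows. This is plain Python rather than the fastai or PyTorch code from the lesson, and the names `categories`, `cat_to_idx`, `embedding`, and `embed` are made up for illustration; the random weights stand in for values a model would learn.

```python
import random

random.seed(0)

# Hypothetical vocabulary for a single-valued categorical column.
categories = ["politics", "sports", "tech"]
cat_to_idx = {c: i for i, c in enumerate(categories)}

emb_dim = 4
# One row per category; in a real model these weights would be learned.
embedding = [[random.gauss(0.0, 1.0) for _ in range(emb_dim)]
             for _ in categories]

def embed(value):
    """Look up the feature vector for a single category value."""
    return embedding[cat_to_idx[value]]

vec = embed("sports")  # a list of emb_dim floats
```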

But if you run into a categorical attribute that can take multiple values at once, is there any guideline or “best practice” on how to apply entity embedding? For example, when you sign up on a news app/website, you can pick the “topics” that interest you. This topic attribute is categorical and multi-valued: a single user could have “politics, sports, tech, etc.”

I imagine one could go back to a one-hot-style encoding (1 for each chosen topic, 0 for all others). Again, this may not work well if the cardinality is very high. I've been trying to think of a sensible way to use entity embedding in this scenario. My first thought is to keep the same embedding matrix and, when the attribute has more than one value, just add the corresponding embedding vectors together. I haven’t tried this yet, so I don't know whether it's a bad idea.
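To make the two options concrete, here is a minimal plain-Python sketch, assuming a made-up four-topic vocabulary and random (untrained) weights. The helper names `multi_hot`, `sum_embeddings`, and `multi_hot_times_matrix` are hypothetical. It also shows that adding the chosen rows of the embedding matrix gives the same vector as multiplying the 0/1 encoding by that matrix, so the two ideas are closely related.

```python
import random

random.seed(0)

# Hypothetical topic vocabulary; a real one could be much larger.
topics = ["politics", "sports", "tech", "travel"]
topic_to_idx = {t: i for i, t in enumerate(topics)}

emb_dim = 3
# One embedding row per topic (random here, learned in practice).
emb_matrix = [[random.gauss(0.0, 1.0) for _ in range(emb_dim)]
              for _ in topics]

def multi_hot(chosen):
    """1 for each chosen topic, 0 for all others."""
    vec = [0] * len(topics)
    for t in chosen:
        vec[topic_to_idx[t]] = 1
    return vec

def sum_embeddings(chosen):
    """Add the embedding rows of every chosen topic (duplicates ignored)."""
    out = [0.0] * emb_dim
    for t in set(chosen):
        row = emb_matrix[topic_to_idx[t]]
        out = [o + r for o, r in zip(out, row)]
    return out

def multi_hot_times_matrix(chosen):
    """Multiply the multi-hot row vector by the embedding matrix."""
    mh = multi_hot(chosen)
    return [sum(mh[i] * emb_matrix[i][d] for i in range(len(topics)))
            for d in range(emb_dim)]

user_topics = ["politics", "tech"]
print(multi_hot(user_topics))  # [1, 0, 1, 0]

summed = sum_embeddings(user_topics)
projected = multi_hot_times_matrix(user_topics)
# Both routes give the same fixed-size vector, up to float rounding.
```

One thing to keep in mind with a plain sum is that the vector's magnitude grows with the number of selected topics; averaging the rows instead is a common variation. In PyTorch, `nn.EmbeddingBag` with `mode="sum"` or `mode="mean"` performs this kind of pooled lookup, though I can't say whether it suits this particular setup.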

Does anyone know of any good suggestions or references (papers, blogs, etc.)?