How do I build embeddings for a categorical variable when one observation can belong to several categories at the same time?

Let’s imagine we have a history of browsing behavior: user_1 visited page_1, page_3, and page_5, and user_2 visited page_2. The data frame would then look like this:

user_1 : [1; 0; 1; 0; 1]
user_2 : [0; 1; 0; 0; 0]

(There are 5 pages in total; a 1 indicates that the page was visited.)

This is not a one-hot encoding (a row can contain several 1s), so I am not sure how to build the embeddings.
For context, the end goal is multiclass classification, and it seemed to me that embeddings could be quite handy for reducing sparsity and might also improve the model.

I am new to embeddings, so it would be very helpful if someone could give me some hints or point me to posts about this. 🙂

Unless you have some other sub-features for each “page” here, I would think each page gets its own embedding.
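To make that concrete, here is a minimal NumPy sketch (not tied to any particular framework) of one way to use per-page embeddings with the multi-hot rows from the question: multiplying a multi-hot row by the embedding matrix sums the embeddings of the visited pages. The embedding size of 3 and the random weights are assumptions for illustration; in a real model the matrix would be learned.

```python
import numpy as np

rng = np.random.default_rng(0)

n_pages, emb_dim = 5, 3  # emb_dim is an arbitrary choice for illustration
# One vector per page; random values stand in for trained weights.
page_emb = rng.normal(size=(n_pages, emb_dim))

# Multi-hot rows from the question:
# user_1 visited pages 1, 3, 5; user_2 visited page 2.
visits = np.array([
    [1, 0, 1, 0, 1],
    [0, 1, 0, 0, 0],
], dtype=float)

# A multi-hot row times the embedding matrix is the sum of the
# embeddings of the visited pages.
user_sum = visits @ page_emb  # shape (2, emb_dim)

# Dividing by the number of visited pages gives mean pooling instead,
# which keeps heavy and light users on a comparable scale.
counts = visits.sum(axis=1, keepdims=True)
user_mean = user_sum / np.maximum(counts, 1)
```

Either pooled vector is dense and fixed-size, so it can be fed into the classifier in place of the sparse row. Whether sum or mean pooling works better is something to check on your own data.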

However, your question title doesn’t match the content of your question, i.m.o.
If each page has its own categorical variables, you would represent a page through those variables, giving each variable its own embedding space.

Ex: inputs for a model with embedding spaces for both page_category and page_author would look something like this:
page_1 = [page_category:1, page_author:4]
page_2 = [2,4]
page_3 = [1,3] …

user_1: [ [1,4], [0, 0], [1,3] …]

Then you need to concatenate, or otherwise combine, these two embeddings (category and author) into one “page representation” (for example, by passing them through a fully connected layer).
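A rough NumPy sketch of that idea, using the user_1 input from the example above. Everything numeric here is an assumption made for illustration: the vocabulary sizes, the embedding dimensions, the tanh activation, and treating index 0 as "page not visited" padding that is masked out before averaging. The random weights stand in for parameters a real model would learn.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; index 0 is reserved for "page not visited" (padding).
n_categories, n_authors = 3, 5            # ids 0..2 and 0..4
cat_dim, author_dim, page_dim = 2, 3, 4

# Two separate embedding spaces, one per categorical variable.
cat_emb = rng.normal(size=(n_categories, cat_dim))
author_emb = rng.normal(size=(n_authors, author_dim))
cat_emb[0] = 0.0     # padding rows map to zero vectors
author_emb[0] = 0.0

# Fully connected layer that merges both embeddings into one page vector.
W = rng.normal(size=(cat_dim + author_dim, page_dim))
b = np.zeros(page_dim)

# user_1 from the example: [page_category, page_author] per page.
user_1 = np.array([[1, 4], [0, 0], [1, 3]])

cat_vecs = cat_emb[user_1[:, 0]]          # (3, cat_dim)
author_vecs = author_emb[user_1[:, 1]]    # (3, author_dim)
page_in = np.concatenate([cat_vecs, author_vecs], axis=1)  # (3, 5)
page_repr = np.tanh(page_in @ W + b)      # (3, page_dim)

# Ignore padded (unvisited) pages, then average into one user vector.
mask = (user_1[:, 0] != 0).astype(float)[:, None]
user_repr = (page_repr * mask).sum(axis=0) / max(mask.sum(), 1.0)
```

In a deep learning framework the two lookup tables would typically be embedding layers trained jointly with the classifier, but the shapes and the concatenate-then-project flow stay the same.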

So, in summary, you just have several one-hot encoded inputs instead of one, since you have several categorical variables.