Since movie genre embeddings are multi-hot not 1-hot, use matrix mult not lookups?

(Geoffrey) #1

Let’s say we intend to incorporate “Genre” as a categorical variable. It’s not a single-choice category, however. Can fastai correctly process multi-hot embedding inputs? Or do we have to abandon fastai’s embedding facilities to build embeddings in this case?

Keep in mind that a single movie can have multiple genres at the same time, such as comedy, romance and action, which is 3 genres at once. Or maybe just horror, which is a single genre. So a fastai subsystem that assumes a 1-hot encoding of the genre input is incorrect for this application. A multi-hot encoding is required for genre, and in theory this should present no problem for the embeddings at training time.

However, a single lookup is not enough: you need one lookup per genre to collect the potentially many embeddings for one movie. If not lookups, then you could go back to matrix multiplication, i.e. multiplying the multi-hot vector by the embedding matrix.
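To make the lookup-versus-matmul point concrete, here is a minimal PyTorch sketch. It assumes a made-up vocabulary of 5 genres, an embedding size of 3, and a hypothetical movie tagged with genres 0, 2 and 4 (none of these numbers come from a real dataset). It only illustrates that multiplying a multi-hot vector by the embedding matrix gives the same result as doing one lookup per genre and summing.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Assumed sizes, purely for illustration
n_genres, emb_dim = 5, 3
emb = nn.Embedding(n_genres, emb_dim)

# Hypothetical movie tagged with genres 0, 2 and 4
genre_ids = torch.tensor([0, 2, 4])

# Multi-hot encoding of the same movie: 1.0 where a genre is present
multi_hot = torch.zeros(n_genres)
multi_hot[genre_ids] = 1.0

# Option 1: matrix multiply of the multi-hot vector with the embedding matrix
via_matmul = multi_hot @ emb.weight        # shape (emb_dim,)

# Option 2: one lookup per genre, then sum the looked-up vectors
via_lookup = emb(genre_ids).sum(dim=0)     # shape (emb_dim,)

same = torch.allclose(via_matmul, via_lookup)
```

Either way you end up with a single vector of size `emb_dim` for the movie, regardless of how many genres it has.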

It’s still a problem, however, because you will then naturally have multiple embedding vectors, not just one, to pass to the model. How would the model take these multiple embedding vectors as input? Would they be concatenated, and how many would you concatenate? A neural network generally expects a fixed-length input, so this is a problem when you don’t know in advance how many genres a movie has.
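One possible way around the variable-length issue, offered only as a sketch, is to pool the per-genre vectors (sum or mean) instead of concatenating them, so every movie yields one fixed-size vector. PyTorch’s `nn.EmbeddingBag` does this pooling directly. The sizes and the two example movies below are assumptions for illustration, and whether mean or sum pooling suits the movie problem is an open question.

```python
import torch
import torch.nn as nn

# Assumed sizes, purely for illustration
n_genres, emb_dim = 5, 3
bag = nn.EmbeddingBag(n_genres, emb_dim, mode="mean")

# Two hypothetical movies: the first has genres [0, 2, 4], the second only [1].
# All genre ids are flattened into one 1-D tensor ...
flat_ids = torch.tensor([0, 2, 4, 1])
# ... and offsets mark where each movie's ids start in that tensor.
offsets = torch.tensor([0, 3])

# One pooled vector per movie, whatever its number of genres
pooled = bag(flat_ids, offsets)            # shape (2, emb_dim)
```

The output has one row per movie, so it can be concatenated with the other embeddings and continuous features as usual.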

So to model the multiple genre(s) of a movie, what do you do? Create new columns, such that you get one binary column per genre value? That would be a bit sad because you’d not get all the nice embeddings benefits, at least not in the usual sense.
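For reference, the one-binary-column-per-genre encoding could be built along these lines. This is a small plain-Python sketch; the movie titles, the pipe-separated genre strings and the helper `to_multi_hot` are all hypothetical, not part of fastai.

```python
# Hypothetical movies with pipe-separated genre strings
movies = {
    "Movie A": "Comedy|Romance|Action",
    "Movie B": "Horror",
}

# Genre vocabulary: every distinct genre, in a fixed (sorted) order
genres = sorted({g for s in movies.values() for g in s.split("|")})

def to_multi_hot(genre_string, vocab):
    """Return one 0/1 entry per genre in vocab (hypothetical helper)."""
    present = set(genre_string.split("|"))
    return [1 if g in present else 0 for g in vocab]

# One fixed-length binary row per movie
rows = {title: to_multi_hot(s, genres) for title, s in movies.items()}
```

Each movie ends up with a row of the same length, one entry per genre in `genres`.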

However, as a solution I could imagine making an asymmetric neural network with a separate branch that all the binary genre columns connect to, but no other inputs connect to, and giving that branch multiple layers to mimic a conventional embedding. For example, if there are 10 binary columns for “Genre”, then you could have, say, 5 layers of 10 units each for the genre branch. Each of these is a standard dense layer, so it is computed with an ordinary matrix multiply.
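A rough sketch of that asymmetric design in PyTorch might look like the following. The class name `GenreBranchNet`, the number of non-genre features, the hidden size and the single-output head are all assumptions made for illustration; only the “10 genre columns, 5 layers of 10 units” part comes from the description above.

```python
import torch
import torch.nn as nn

class GenreBranchNet(nn.Module):
    """Hypothetical model: a genre-only branch feeding a shared head."""

    def __init__(self, n_genres=10, n_other=4, genre_dim=10,
                 n_genre_layers=5, hidden=16):
        super().__init__()
        # Genre-only branch: only the binary genre columns feed into it
        layers, in_dim = [], n_genres
        for _ in range(n_genre_layers):
            layers += [nn.Linear(in_dim, genre_dim), nn.ReLU()]
            in_dim = genre_dim
        self.genre_branch = nn.Sequential(*layers)
        # Shared head: sees the branch output plus all other features
        self.head = nn.Sequential(
            nn.Linear(genre_dim + n_other, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, genre_multi_hot, other):
        g = self.genre_branch(genre_multi_hot)
        return self.head(torch.cat([g, other], dim=1))

model = GenreBranchNet()
genre_batch = torch.randint(0, 2, (8, 10)).float()   # random multi-hot rows
other_batch = torch.randn(8, 4)                      # stand-in for other features
out = model(genre_batch, other_batch)                # shape (8, 1)
```

Note that a single bias-free linear layer applied to the multi-hot input is mathematically the same as summing the embeddings of the genres present, so the extra layers are what make this branch differ from a plain embedding.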

I think this latter design would effectively work as an embedding, albeit a nonstandard one.

What do you think?

(Theodoros Galanos) #2

Would NLP treatment help here?

I can imagine turning the genres into word vectors and then concatenating those with our embeddings?

(Sebastian Gonzalez Aseretto) #3

I’m also very interested in how this can be done. I found this paper where Maciej Kula (whose other work is pretty amazing) solved this problem, but I’m having a really hard time trying to figure out how to implement this solution in PyTorch.