Can't wrap my head around one hot encoding vs embeddings

vishak · July 13, 2019, 10:45am

In Lesson 5, Jeremy talks about how he converted user and movie IDs to one-hot encoded vectors and then multiplied them by the weight matrices.

I just missed the point of this. The one hot encoded matrix is just an identity matrix right? Multiplying it with the weight matrix just gives you the weight matrix again. What was the point of that?

And what is an embedding? Is it the action of looking up a row in the weight matrix (i.e., is it a verb)?

Things have been going great and I’ve been able to wrap my mind around everything until now, but I’m unable to get the point of this whole exercise and I’ve watched it a few times.

Some help would be great! Thank You!

vishak · July 14, 2019, 10:25am

Hi guys, no answers? Is the question too basic, or am I not communicating well enough? Help me out please!

DeepBlender · July 14, 2019, 10:32am

There is a comment below the YouTube video you might find useful:
ttps://www.youtube.com/watch?v=CJKnDu2dxOE&lc=UgwWaqWhuTUkEfssb9l4AaABAg
(Copy the text and add an ‘h’ at the beginning to see the comment I am referring to as the first one. If I add the actual link, it unfortunately only embeds the video.)

25:21, after spending far too much time as a beginner at matrix multiplication, I’d like to clarify for anyone else who’s confused about why this works: it only produces the output shown if the one-hot encoded matrix is multiplied by the weight matrix from the left. See it as One-Hot-Matrix (dot) Weight-Matrix, with the one-hot matrix to the left of the weight matrix (unlike the Excel document, where the one-hot matrix sits to the right). It is NOT Weight-Matrix (dot) One-Hot-Matrix, since that would not yield the result shown. In fact, a 15x5 (dot) 209x15 matrix multiplication doesn’t even work (which makes me feel sort of stupid for even trying to figure it out, in hindsight). Only a 209x15 (dot) 15x5 matrix multiplication gives this result, because matrix multiplication is not commutative.
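
To make the shape argument in that comment concrete, here is a minimal sketch in plain Python. The sizes 209x15 and 15x5 come from the comment; the `matmul` helper, the assignment of ratings to users, and the weight values are made up for illustration and are not taken from the lesson's spreadsheet.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows; raise if the shapes do not line up."""
    if len(a[0]) != len(b):
        raise ValueError(
            f"shape mismatch: {len(a)}x{len(a[0])} (dot) {len(b)}x{len(b[0])}"
        )
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


n_ratings, n_users, n_factors = 209, 15, 5

# Assumption for illustration: rating i was given by user i % n_users.
user_idx = [i % n_users for i in range(n_ratings)]

# One-hot matrix: one row per rating, with a single 1 in that user's column.
one_hot = [[1.0 if j == u else 0.0 for j in range(n_users)] for u in user_idx]

# Made-up weight matrix: one row of n_factors numbers per user.
weights = [[float(10 * u + f) for f in range(n_factors)] for u in range(n_users)]

# 209x15 (dot) 15x5 works and gives a 209x5 result.
out = matmul(one_hot, weights)
shape = (len(out), len(out[0]))

# 15x5 (dot) 209x15 fails because the inner sizes (5 and 209) differ.
try:
    matmul(weights, one_hot)
    reversed_ok = True
except ValueError:
    reversed_ok = False
```

Each row of `out` is simply the weight row of the user who gave that rating, which is why the order of the two matrices matters.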

vishak · July 14, 2019, 11:56am

Thanks for the reply. I understand the mechanics of the multiplication. Maybe my question is this:

In Scenario one, Jeremy took the dot product of the user and movie embedding matrices to get the output matrix.

In Scenario two, he multiplied each of the user and movie embedding matrices by a one-hot matrix, which is essentially an identity matrix, to get the same embedding matrices back, and then he multiplied them.

So my question is: what was achieved by introducing the one-hot matrix when it is just an identity matrix that doesn’t do anything?

DeepBlender · July 14, 2019, 12:50pm

It is basically a technicality. Those user and movie vectors need to be restructured so that they become trainable within a neural network. The mathematical way to do it is with one-hot encodings. The efficient way to do it is with lookups. Both give identical results. What we have at the end is a slightly different way to train those user and movie vectors, which is called embeddings.
So collaborative filtering can be expressed in terms of embeddings, and one-hot encoding is just a technicality on the way.
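
A small sketch of why the two ways give the same result, in plain Python. The embedding values and the list of user IDs are invented for illustration. Note that the one-hot matrix has one row per rating rather than one row per user, so in general it is not an identity matrix: it repeats and reorders rows of the weight matrix.

```python
# Hypothetical embedding (weight) matrix: one row of 2 factors per user.
user_emb = [
    [0.1, 0.2],  # user 0
    [0.3, 0.4],  # user 1
    [0.5, 0.6],  # user 2
]

# Hypothetical ratings data: the user ID attached to each of 4 ratings.
user_ids = [2, 0, 2, 1]

# The mathematical way: build a 4x3 one-hot matrix and multiply it by the 3x2 weights.
one_hot = [[1.0 if j == u else 0.0 for j in range(len(user_emb))] for u in user_ids]
via_matmul = [
    [sum(x * w for x, w in zip(row, col)) for col in zip(*user_emb)]
    for row in one_hot
]

# The efficient way: use each ID as an index and look up the row directly.
via_lookup = [user_emb[u] for u in user_ids]

print(via_matmul == via_lookup)
```

The lookup skips all the multiplications by zero. As far as I know, this indexing is what an embedding layer such as PyTorch's `nn.Embedding` does, with the rows of the weight matrix being the parameters that get trained.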