Lesson 7 - Official topic


The embedding holds the latent factors for each user/item. What we are trying to do is find these latent vectors by minimizing the loss between the model's predicted ratings and the actual ratings.
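
For concreteness, here is a minimal sketch of such a model in PyTorch (the class name, sizes, and sample ratings below are my own illustration, not from the course notebook):

```python
import torch
from torch import nn

class DotProduct(nn.Module):
    """Minimal collaborative-filtering model: one latent vector per user/item."""
    def __init__(self, n_users, n_items, n_factors):
        super().__init__()
        self.user_factors = nn.Embedding(n_users, n_factors)
        self.item_factors = nn.Embedding(n_items, n_factors)

    def forward(self, user_ids, item_ids):
        # predicted rating = dot product of the user and item latent vectors
        return (self.user_factors(user_ids) * self.item_factors(item_ids)).sum(dim=1)

model = DotProduct(n_users=100, n_items=50, n_factors=5)
pred = model(torch.tensor([1, 3, 4]), torch.tensor([0, 2, 7]))
loss = ((pred - torch.tensor([4.0, 3.0, 5.0]))**2).mean()
loss.backward()  # gradients flow back into both embedding weight matrices
```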

We can think of the Embedding layer as holding a weight matrix w. The input to the model is a batch of indices, e.g. x = [2, 3, ...]. The embedding layer's forward pass looks up the latent factors for these inputs by indexing into w. Since indexing is not a smooth function, we can't differentiate through it directly to get a gradient for gradient descent. Instead, the embedding layer computes its gradient as if the lookup had been a matrix multiplication between a one-hot encoding of the input and w, because the two operations produce the same result.

Let's say the model inputs are x = [1, 3, 4] and there are 5 users in total (indexed from 0). Each index becomes a one-hot row, so the batch is represented by the matrix x_one_hot = [[0, 1, 0, 0, 0], [0, 0, 0, 1, 0], [0, 0, 0, 0, 1]].

The embedding layer treats the indexing as:

out = x_one_hot @ w
# the embedding actually only does w[x],
# but it computes the gradient as if it did this matrix multiplication,
# since the two operations are equivalent.
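
You can sanity-check this equivalence directly; here is a quick sketch (the shapes and values are my own, using F.one_hot to build the encoding):

```python
import torch
import torch.nn.functional as F

w = torch.randn(5, 4)        # 5 users, 4 latent factors
x = torch.tensor([1, 3, 4])  # user ids in the batch

by_indexing = w[x]                             # what Embedding actually does
one_hot = F.one_hot(x, num_classes=5).float()  # shape (3, 5), one row per id
by_matmul = one_hot @ w                        # shape (3, 4)

assert torch.allclose(by_indexing, by_matmul)
```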

To show how the gradient is calculated, we need a loss function:

loss = ((out - targ)**2).sum()  # squared error loss
# the gradient of the loss with respect to the weights, dL/dw:
grad_out = 2 * (out - targ)         # dL/dout
grad_w = x_one_hot.T @ grad_out     # chain rule: dL/dw = dout/dw applied to dL/dout
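
To check that this hand-derived gradient matches what autograd produces, here is a small sketch (the summed squared-error loss and the shapes are my own choices):

```python
import torch
import torch.nn.functional as F

w = torch.randn(5, 4, requires_grad=True)
x = torch.tensor([1, 3, 4])
targ = torch.randn(3, 4)

out = w[x]                       # embedding lookup
loss = ((out - targ)**2).sum()
loss.backward()                  # autograd fills in w.grad

# hand-derived gradient via the one-hot matmul view
one_hot = F.one_hot(x, num_classes=5).float()
grad_out = 2 * (out - targ)          # dL/dout
grad_w = one_hot.t() @ grad_out      # scatters grad_out back into w's rows

assert torch.allclose(w.grad, grad_w)
```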

@mrfabulous1

Thanks a ton Mr. Fab!! :pray:

Your solution saved me from :exploding_head:!!

