The embedding represents the latent factors per user/item. What we are trying to do is learn these latent vectors by minimizing a loss against the known ratings.
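As a rough illustration (the names and sizes here are made up, not from the course), a dot-product model with user and item embeddings trained against ratings looks something like this:
import torch
import torch.nn as nn

class DotProductModel(nn.Module):
    def __init__(self, n_users, n_items, n_factors):
        super().__init__()
        # each user/item gets a row of latent factors in an Embedding weight matrix
        self.user_factors = nn.Embedding(n_users, n_factors)
        self.item_factors = nn.Embedding(n_items, n_factors)

    def forward(self, user_ids, item_ids):
        u = self.user_factors(user_ids)
        i = self.item_factors(item_ids)
        return (u * i).sum(dim=1)  # predicted rating = dot product of the latent vectors

model = DotProductModel(n_users=5, n_items=10, n_factors=3)
preds = model(torch.tensor([1, 3, 4]), torch.tensor([0, 2, 7]))
loss = ((preds - torch.tensor([4.0, 3.0, 5.0])) ** 2).mean()  # loss against the ratings
loss.backward()  # this is where the embedding rows receive their gradients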
We see the Embedding layer as containing parameters, a weight matrix w. The input to the model is x = [2, 3, ...]. The embedding layer's forward pass finds the latent factors of these inputs by indexing into w. Since indexing is not a smooth function, we can't differentiate it directly to get a gradient for gradient descent. Instead, the embedding layer generates a gradient as if the indexing had been done by a matrix multiplication between w and the one-hot representation of the input, since the two operations produce the same result.
Let's say the model inputs are x = [1, 3, 4] and there are 5 users in total. Each input index becomes its own one-hot row, so stacked together they form x_one_hot = [[0, 1, 0, 0, 0], [0, 0, 0, 1, 0], [0, 0, 0, 0, 1]].
The embedding layer thinks about the indexing as:
out = x_one_hot @ w
# the embedding actually only does w[x]
# but it assumes it does this matrix multiplication, as the two are equivalent.
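As a quick sanity check (illustrative numbers, 5 users and 3 latent factors), indexing and the one-hot matrix multiplication really do give the same output:
import numpy as np

n_users, n_factors = 5, 3
w = np.random.randn(n_users, n_factors)  # embedding weight matrix
x = np.array([1, 3, 4])                  # input user ids

x_one_hot = np.eye(n_users)[x]           # stack of one-hot rows, shape (3, 5)

out_index = w[x]             # what the embedding layer actually does
out_matmul = x_one_hot @ w   # the matrix-multiplication view of it
assert np.allclose(out_index, out_matmul)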
To show how the gradient is calculated we need a loss function:
loss = ((out - targ)**2).sum()  # squared error loss
# the gradient of the loss with respect to the weights, dl/dw
grad_out = 2 * (out - targ)        # dl/dout
grad_w = x_one_hot.T @ grad_out    # dl/dw, via dout/dw
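And to check that this hand-derived gradient matches what autograd produces for an embedding layer (again with made-up sizes and targets), a small PyTorch sketch:
import torch
import torch.nn.functional as F

n_users, n_factors = 5, 3
emb = torch.nn.Embedding(n_users, n_factors)
x = torch.tensor([1, 3, 4])
targ = torch.randn(3, n_factors)

out = emb(x)
loss = ((out - targ) ** 2).sum()
loss.backward()  # autograd's gradient for emb.weight

# hand-derived version: dl/dout = 2 * (out - targ), dl/dw = x_one_hot.T @ dl/dout
x_one_hot = F.one_hot(x, num_classes=n_users).float()
grad_w = x_one_hot.T @ (2 * (out.detach() - targ))
assert torch.allclose(emb.weight.grad, grad_w)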