In Chapter 8 (Collaborative Filtering) of fastbook, the authors discuss embedding layers and point out that you can’t really represent “look up at this index” very nicely in a deep learning model:
To calculate the result for a particular movie and user combination, we have to look up the index of the movie in our movie latent factor matrix and the index of the user in our user latent factor matrix; then we can do our dot product between the two latent factor vectors. But look up in an index is not an operation our deep learning models know how to do. They know how to do matrix products, and activation functions.
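The point in that passage can be made concrete with a small sketch. The factor matrices below are random and purely hypothetical. Looking up rows by index and taking the dot product gives exactly the same predictions as multiplying one-hot vectors by the factor matrices, which is the "matrix product" formulation the book has in mind.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_users, n_movies, n_factors = 4, 5, 3
# Hypothetical latent factor matrices (random values, for illustration only).
user_factors = torch.randn(n_users, n_factors)
movie_factors = torch.randn(n_movies, n_factors)

# A small batch of (user, movie) index pairs.
users = torch.tensor([0, 2, 2, 3])
movies = torch.tensor([1, 3, 0, 4])

# "Look up" rows by index, then take a row-wise dot product.
pred_lookup = (user_factors[users] * movie_factors[movies]).sum(dim=1)

# Same computation expressed as matrix products with one-hot vectors.
user_onehot = F.one_hot(users, n_users).float()     # shape (4, n_users)
movie_onehot = F.one_hot(movies, n_movies).float()  # shape (4, n_movies)
pred_onehot = ((user_onehot @ user_factors) * (movie_onehot @ movie_factors)).sum(dim=1)

print(torch.allclose(pred_lookup, pred_onehot))  # True
```

The one-hot version builds a `(batch, n_users)` matrix that is almost entirely zeros, which is where the wasted memory and compute discussed below comes from.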
I remember hearing this before and had sort of internalized it to mean we can either:

Instead of “looking something up” in a deep learning model, use a one-hot vector and a matrix multiplication. Gradients work the same way they always do, but we waste a lot of memory and compute on the zeros of the one-hot vector.

Alternatively, construct a custom layer (e.g. nn.Embedding) that performs a “Gather” during the forward pass and a “Scatter” of gradients during the backward pass. In PyTorch’s ONNX exporter, for example, the embedding op is translated into a Gather: https://github.com/pytorch/pytorch/blob/f2689b1e1302f784c822bbd6697f53e39df59bb1/torch/onnx/symbolic_opset9.py#L475
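The second option can be sketched as a hand-written autograd function. `GatherRows` is a hypothetical name, and this is not how PyTorch implements `nn.Embedding` internally. It only illustrates the gather/scatter idea: the forward pass selects rows with `index_select`, and the backward pass scatter-adds the incoming gradients back into a zero matrix with `index_add_`, so rows that were looked up twice receive two contributions. The sketch assumes a 2-D weight and a 1-D index tensor.

```python
import torch
from torch import nn


class GatherRows(torch.autograd.Function):
    """Hypothetical embedding lookup: gather rows forward, scatter-add grads backward."""

    @staticmethod
    def forward(ctx, weight, idx):
        # Remember which rows were read and how many rows the weight has.
        ctx.save_for_backward(idx)
        ctx.num_rows = weight.shape[0]
        return weight.index_select(0, idx)  # "Gather"

    @staticmethod
    def backward(ctx, grad_out):
        (idx,) = ctx.saved_tensors
        grad_weight = grad_out.new_zeros(ctx.num_rows, grad_out.shape[1])
        grad_weight.index_add_(0, idx, grad_out)  # "Scatter" (accumulating repeats)
        return grad_weight, None  # no gradient for the integer indices


torch.manual_seed(0)
emb = nn.Embedding(6, 3)
weight = emb.weight.detach().clone().requires_grad_(True)
idx = torch.tensor([1, 4, 1, 0])

GatherRows.apply(weight, idx).sum().backward()
emb(idx).sum().backward()

print(torch.allclose(weight.grad, emb.weight.grad))  # True
print(weight.grad[:, 0])  # tensor([1., 2., 0., 0., 1., 0.]): row 1 was used twice
```

Only the rows that were actually indexed get nonzero gradients, which is the behaviour that makes a dedicated lookup cheaper than the one-hot product.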
One thing that surprised me was that later in the chapter they index into the parameter tensors directly, which I thought we could not do!
```python
import torch
from torch import nn
from fastai.collab import *  # fastai's Module and sigmoid_range, as imported in fastbook

def create_params(size):
    # A plain learnable tensor initialised with small random values.
    return nn.Parameter(torch.zeros(*size).normal_(0, 0.01))

class DotProductBias(Module):  # fastai's Module calls nn.Module.__init__ for us
    def __init__(self, n_users, n_movies, n_factors, y_range=(0,5.5)):
        self.user_factors = create_params([n_users, n_factors])
        self.user_bias = create_params([n_users])
        self.movie_factors = create_params([n_movies, n_factors])
        self.movie_bias = create_params([n_movies])
        self.y_range = y_range

    def forward(self, x):
        users = self.user_factors[x[:,0]]    # <- looking up by index
        movies = self.movie_factors[x[:,1]]  # <- looking up by index
        res = (users*movies).sum(dim=1)
        res += self.user_bias[x[:,0]] + self.movie_bias[x[:,1]]
        return sigmoid_range(res, *self.y_range)
```
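To try the same model outside fastai, here is a plain-PyTorch sketch. Two things are assumptions, not fastbook code: `sigmoid_range` is re-implemented to mirror fastai's version (a sigmoid rescaled to `(low, high)`), and the ratings are synthetic random data. Because this uses a plain `nn.Module`, `super().__init__()` must be called explicitly. Users 10 and 11 deliberately never appear in the batch, so their factor rows receive zero gradient.

```python
import torch
from torch import nn


def sigmoid_range(x, low, high):
    # Assumed to mirror fastai's sigmoid_range: squash x into (low, high).
    return torch.sigmoid(x) * (high - low) + low


def create_params(size):
    return nn.Parameter(torch.zeros(*size).normal_(0, 0.01))


class DotProductBias(nn.Module):
    def __init__(self, n_users, n_movies, n_factors, y_range=(0, 5.5)):
        super().__init__()  # required for a plain nn.Module
        self.user_factors = create_params([n_users, n_factors])
        self.user_bias = create_params([n_users])
        self.movie_factors = create_params([n_movies, n_factors])
        self.movie_bias = create_params([n_movies])
        self.y_range = y_range

    def forward(self, x):
        users = self.user_factors[x[:, 0]]    # plain indexing, no nn.Embedding
        movies = self.movie_factors[x[:, 1]]
        res = (users * movies).sum(dim=1)
        res += self.user_bias[x[:, 0]] + self.movie_bias[x[:, 1]]
        return sigmoid_range(res, *self.y_range)


torch.manual_seed(42)
n_users, n_movies = 12, 8
# Synthetic (user, movie) pairs; only users 0..9 are sampled on purpose.
x = torch.stack([torch.randint(0, 10, (64,)), torch.randint(0, n_movies, (64,))], dim=1)
y = torch.randint(1, 6, (64,)).float()  # synthetic ratings in 1..5

model = DotProductBias(n_users, n_movies, n_factors=5)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
for step in range(5):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()  # gradients flow through the indexing ops
    opt.step()

with torch.no_grad():
    preds = model(x)
print(loss.item(), preds[:5])
print(model.user_factors.grad[10:])  # all zeros: these rows were never looked up
```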
This model trains without any problems and needs no special embedding layer; a plain nn.Parameter seems to handle the lookup just fine.
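One way to check what is happening is to compare gradients directly. In this sketch (random values, hypothetical sizes), indexing an `nn.Parameter` with an index tensor, as `DotProductBias` does, produces the same gradient as the one-hot matrix product on an identical copy of the parameter. Repeated indices accumulate, and rows that are never indexed get exactly zero gradient.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
n_rows, n_factors = 6, 3
idx = torch.tensor([1, 4, 1, 0])
upstream = torch.randn(len(idx), n_factors)  # arbitrary upstream gradient

# Option A: index the parameter directly, as DotProductBias does.
w_index = nn.Parameter(torch.randn(n_rows, n_factors))
(w_index[idx] * upstream).sum().backward()

# Option B: one-hot matrix product on an identical copy of the parameter.
w_onehot = nn.Parameter(w_index.detach().clone())
onehot = F.one_hot(idx, n_rows).float()
((onehot @ w_onehot) * upstream).sum().backward()

print(torch.allclose(w_index.grad, w_onehot.grad))  # True
print(w_index.grad[[2, 3, 5]])  # zeros: rows 2, 3 and 5 were never indexed
# The indexing op records its own backward node in the autograd graph;
# the exact class name varies across PyTorch versions.
print(type(w_index[idx].grad_fn).__name__)
```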
Was my understanding incorrect? Can “look up by index” actually be expressed natively in PyTorch?