Confused regarding embedding layers and "lookup by index"

In Chapter 8 (Collaborative Filtering) of the fastbook, the authors discuss embedding layers and the claim that you can’t really represent “look up at this index” in a deep learning model:

To calculate the result for a particular movie and user combination, we have to look up the index of the movie in our movie latent factor matrix and the index of the user in our user latent factor matrix; then we can do our dot product between the two latent factor vectors. But look up in an index is not an operation our deep learning models know how to do. They know how to do matrix products, and activation functions.

I remember hearing this before and had internalized it to mean that we have two options:

  1. Instead of “looking something up” in a deep learning model, we could use a one-hot vector and a matrix multiplication. Our gradients will work the same way they always do, but we’ll end up wasting a lot of memory and compute on the zeros of the one-hot vector.

  2. Alternatively, we could construct a custom layer (e.g. nn.Embedding) that performs a “Gather” during the forward pass and a “Scatter” of gradients during the backward pass. In PyTorch I see this happening here:
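As a sanity check on those two options, here is a tiny sketch (my own toy example, not from the book) showing that multiplying by a one-hot vector and plain indexing produce the same result:

```python
import torch

# Toy example: a 4-item, 3-factor latent factor matrix.
factors = torch.randn(4, 3)
idx = 2

# Option 1: one-hot vector times the matrix (wastes work on the zeros).
one_hot = torch.zeros(4)
one_hot[idx] = 1.0
via_matmul = one_hot @ factors

# Option 2: plain indexing, i.e. a "gather".
via_index = factors[idx]

assert torch.allclose(via_matmul, via_index)
```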

One thing that surprised me was that later in the chapter they index into the latent factor matrices directly, which I thought we could not do!

def create_params(size):
    return nn.Parameter(torch.zeros(*size).normal_(0, 0.01))

class DotProductBias(Module):
    def __init__(self, n_users, n_movies, n_factors, y_range=(0,5.5)):
        self.user_factors = create_params([n_users, n_factors])
        self.user_bias = create_params([n_users])
        self.movie_factors = create_params([n_movies, n_factors])
        self.movie_bias = create_params([n_movies])
        self.y_range = y_range
    def forward(self, x):
        users = self.user_factors[x[:,0]]  # <--- Looking up by index
        movies = self.movie_factors[x[:,1]] # <--- Looking up by index
        res = (users*movies).sum(dim=1)
        res += self.user_bias[x[:,0]] + self.movie_bias[x[:,1]]
        return sigmoid_range(res, *self.y_range)

This model trains without any problems: no special embedding layer is needed, and a plain nn.Parameter seems to handle this just fine.

Was my understanding incorrect? Can “look up by index” actually be expressed natively in PyTorch?


Hi Josh,

I feel a bit shy to reply to your question after seeing your extensive experience with fastai and machine learning. But here goes anyway and you can take what’s helpful.

Of course you already know that PyTorch can look up by index. PyTorch can index, slice, dice, gather, scatter, index by a matrix, etc., even on the GPU. So maybe I am misunderstanding your question.
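For instance, indexing into an nn.Parameter is fully differentiable, and the backward pass scatters gradients into only the rows that were actually used. A small sketch of my own (not from the book):

```python
import torch
import torch.nn as nn

# Small sketch: gradients flow through plain tensor indexing.
factors = nn.Parameter(torch.zeros(5, 3).normal_(0, 0.01))
idx = torch.tensor([0, 2, 2])  # a hypothetical mini-batch of row indices

loss = factors[idx].sum()
loss.backward()

# Rows 0 and 2 receive gradients (row 2 accumulates twice, since it was
# used twice); rows 1, 3, and 4 stay zero.
print(factors.grad)
```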

If you look at the implementation of nn.Embedding, the lookup table is held as a PyTorch tensor, and the actual indexing is done inside C++/CUDA. You won’t see the indexing step written out in Python, but indexing is certainly an operation PyTorch can do.
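As a quick check (a toy example of my own), the layer’s output matches plain indexing into its weight tensor:

```python
import torch
import torch.nn as nn

# nn.Embedding's forward pass is, conceptually, just weight[idx].
emb = nn.Embedding(10, 4)
idx = torch.tensor([3, 7])

out_forward = emb(idx)       # what the layer computes
out_index = emb.weight[idx]  # plain indexing into the lookup table

assert torch.equal(out_forward, out_index)
```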

I’ve not read Chapter 8, so this is only speculation. The fact that indexing is equivalent to a matrix operation lets you apply the extensive math of machine learning and linear algebra to embeddings: you can understand what an embedding does in terms of matrix algebra, even though ML theory has little to say about indexing in general. But that is not the same as saying PyTorch cannot do indexing.