The “Collaborative Filtering” section of Chapter 8 says the following:

To calculate the result for a particular movie and user combination, we have to look up the index of the movie in our movie latent factor matrix and the index of the user in our user latent factor matrix… But look up in an index is not an operation our deep learning models know how to do. They know how to do matrix products, and activation functions. Fortunately, it turns out that we can represent look up in an index as a matrix product.

To paraphrase, we need to look up the index of each set of latent factors, but we can’t “look these up” because that’s not something our model knows how to do. Instead, we can use matrix multiplication with a one-hot encoded vector to do something equivalent to an index lookup.
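
To check that I understand this first claim, here is a minimal sketch in plain Python (no PyTorch). The helper names `one_hot` and `vec_mat` and the toy values in `user_factors` are my own inventions for illustration, not anything from the chapter. It shows that multiplying a one-hot vector by a factor matrix returns the same row as indexing directly:

```python
# Toy latent-factor matrix: 4 users, 3 factors each (values are made up).
user_factors = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [1.0, 1.1, 1.2],
]


def one_hot(index, size):
    """Return a list of zeros with a single 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def vec_mat(vec, mat):
    """Row vector times matrix: result[j] = sum over i of vec[i] * mat[i][j]."""
    n_cols = len(mat[0])
    return [sum(vec[i] * mat[i][j] for i in range(len(mat))) for j in range(n_cols)]


user_idx = 2

# Path 1: "look up" the row by multiplying with a one-hot vector.
via_matmul = vec_mat(one_hot(user_idx, len(user_factors)), user_factors)

# Path 2: plain integer indexing.
via_index = user_factors[user_idx]

print(via_matmul == via_index)
```

The one-hot path touches every row of the matrix just to keep one of them, which I take to be the wasted memory and time that the second quote below complains about.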

Then, further down, we see the following:

This would be a perfectly acceptable way to build models using this kind of architecture, except that it would use a lot more memory and time than necessary. We know that there is no real underlying reason to store the one-hot-encoded vector, or to search through it to find the occurrence of the number one—we should just be able to index into an array directly with an integer. Therefore, most deep learning libraries, including PyTorch, include a special layer that does just this; it indexes into a vector using an integer, but has its derivative calculated in such a way that it is identical to what it would have been if it had done a matrix multiplication with a one-hot-encoded vector.

Again paraphrasing, this seems to say “But matrix-multiplying our latent factors with one-hot encoded vectors is too resource-intensive, so instead we index into a vector using an integer, except when we’re calculating the derivative (for the purpose of gradient descent).” This seems to partially contradict the previous paraphrase, where we said that our model can’t index into a vector using an integer because it doesn’t know how to do so.
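
Here is how I currently picture the derivative claim, again as a plain-Python sketch rather than actual PyTorch internals. I am assuming the lookup output feeds into some loss, and that `upstream` (a made-up name with made-up values) is the gradient of that loss with respect to the looked-up row. The function names are hypothetical too. If my reading is right, both paths should give the same gradient for the factor matrix:

```python
def one_hot(index, size):
    """Return a list of zeros with a single 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def grad_via_one_hot_matmul(index, n_rows, upstream):
    """Gradient w.r.t. W when out = one_hot @ W.

    By the product rule, dL/dW[i][j] = one_hot[i] * upstream[j],
    i.e. the outer product of the one-hot vector and the upstream gradient.
    """
    oh = one_hot(index, n_rows)
    return [[oh[i] * g for g in upstream] for i in range(n_rows)]


def grad_via_index(index, n_rows, upstream):
    """Gradient w.r.t. W when out = W[index].

    Only the selected row contributed to the output, so only that row
    receives the upstream gradient; every other row stays at zero.
    """
    grad = [[0.0] * len(upstream) for _ in range(n_rows)]
    grad[index] = list(upstream)
    return grad


n_users = 4
user_idx = 1
upstream = [0.5, 1.0, 2.0]  # assumed gradient flowing back from the loss

grad_one_hot = grad_via_one_hot_matmul(user_idx, n_users, upstream)
grad_index = grad_via_index(user_idx, n_users, upstream)

print(grad_one_hot == grad_index)
```

In this sketch neither the forward lookup nor the gradient in `grad_via_index` ever builds a one-hot vector, yet the result matches the one-hot version, which is part of why I am unsure where the matrix multiplication is supposed to happen.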

My question is, do we or don’t we index into a vector using an integer? And if so, when does this happen? Is the chapter saying that we use integer indexing on the forward pass, and one-hot encoding + matrix multiplication on the backward pass?