I confess I’m genuinely puzzled about how this non-standard DotProduct can do the job it’s supposed to do here. As others have pointed out, (a*b).sum(1) computes a row-by-row dot product of the two matrices a and b, yielding a 1-D vector; it is not a matrix multiply, which would yield a 2-D matrix. So this DotProduct only works in NumPy or PyTorch if the input matrices have identical shapes, which they do when they are batched up in Jeremy’s code for the EmbeddingDot class in the notebook.
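To make concrete what I mean, here’s a minimal sketch (the shapes are made up purely for illustration):

```python
import torch

# Hypothetical batch: 4 user embeddings and the 4 matching movie
# embeddings, 3 latent factors each (shapes chosen just for this example).
a = torch.randn(4, 3)  # user embeddings for the batch
b = torch.randn(4, 3)  # movie embeddings for the same batch

rowwise = (a * b).sum(1)  # one dot product per row -> 1-D vector
print(rowwise.shape)      # torch.Size([4])
```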
But the problem is that in Jeremy’s Excel model (“Why a matrix factorization and not a neural net?” at 00:12:15 in the lesson 5 video), what he does is precisely a matrix multiply of users and movies, not this row-by-row DotProduct. The loss is then calculated over the entire 2-D matrix, not over a 1-D vector, and that is as it should be. So that’s completely different from what the PyTorch EmbeddingDot class does, and I can’t figure out why the latter is correct, even though it seems to yield halfway decent results as measured by RMSE. The only explanation I could come up with is that because EmbeddingDot cycles through many batches over 3 epochs, it ends up comparing enough different user-movie pairs in its 1-D vectors to get decent results.
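For contrast, this is what I understand the Excel model to be doing: a full matrix multiply over every user-movie combination (again with made-up shapes, and with b transposed so the inner dimensions line up):

```python
import torch

# Hypothetical shapes: 4 users and 5 movies, 3 latent factors each.
a = torch.randn(4, 3)      # one row per user
b = torch.randn(5, 3)      # one row per movie

# Full matrix multiply: the 2-D grid of all 4x5 predicted ratings,
# which is what the Excel sheet computes the loss over.
full = torch.mm(a, b.t())
print(full.shape)          # torch.Size([4, 5])
```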
Can anyone explain where I’m going wrong here? Or should the EmbeddingDot code be revised to use an actual PyTorch matrix multiply, torch.mm(a, b), as others here have proposed?