For the math / linear algebra lovers among us: I wrote a blog post explaining matrix multiplication from scratch and applying a couple of refactorings to speed things up. It's based on lecture 1 of part 2 of the 2019 course, which also covers this, but it includes one last speed-up that is not covered in the lecture.
Spoiler alert: by doing so we get faster than torch.einsum
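The blog's exact code isn't reproduced here, but the kind of refactoring it describes looks roughly like this: start from a naive triple loop, then replace the inner loops with broadcasting. This sketch uses NumPy (the course material uses PyTorch, but the idea is the same); the function names are mine, not from the post.

```python
import numpy as np

def matmul_naive(a, b):
    # The "from scratch" starting point: three explicit Python loops.
    ar, ac = a.shape
    br, bc = b.shape
    assert ac == br
    c = np.zeros((ar, bc))
    for i in range(ar):
        for j in range(bc):
            for k in range(ac):
                c[i, j] += a[i, k] * b[k, j]
    return c

def matmul_broadcast(a, b):
    # One refactoring step: broadcasting eliminates the two inner loops.
    # a[i, :, None] has shape (ac, 1); multiplying by b (shape (ac, bc))
    # broadcasts to (ac, bc), and summing over axis 0 gives row i of the result.
    c = np.zeros((a.shape[0], b.shape[1]))
    for i in range(a.shape[0]):
        c[i] = (a[i, :, None] * b).sum(axis=0)
    return c

a = np.random.rand(16, 32)
b = np.random.rand(32, 8)
assert np.allclose(matmul_naive(a, b), matmul_broadcast(a, b))
assert np.allclose(matmul_broadcast(a, b), np.einsum('ik,kj->ij', a, b))
```

Each such step pushes more work out of the Python interpreter and into vectorized library code, which is where the speed-ups come from.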