While re-watching the video for Lesson 6, I heard Jeremy caution the viewer against confusing matrix multiplication with dot products. But elsewhere I’ve heard it said that matrix multiplication is, under the hood, tantamount to repeatedly taking dot products of rows with columns. And in fact, you can see that demonstrated through examples like the ones on matrixmultiplication.xyz.
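To make concrete what I mean by "rows dotted with columns," here's a small NumPy sketch (NumPy is just my choice for illustration; the values are arbitrary):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4],
              [5, 6]])        # shape (3, 2)
B = np.array([[7, 8, 9],
              [10, 11, 12]])  # shape (2, 3)

C = A @ B  # matrix multiplication, shape (3, 3)

# Each entry C[i, j] equals the dot product of row i of A and column j of B
for i in range(A.shape[0]):
    for j in range(B.shape[1]):
        assert C[i, j] == np.dot(A[i, :], B[:, j])

print(C)
```

So numerically the two pictures agree; my question is about what happens computationally.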

In particular, he emphasizes that dot products are calculated in an *element-wise* fashion. My best guess is that this means the dot products are computed one at a time, making the whole thing an O(n*m) operation, while (somehow) a matrix multiplication is parallelized so that all of the individual dot products are computed at once, in O(1) time? If so, this would certainly explain a lot, but it is just a guess, and I wanted to ask the forums to confirm or refute it.
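For reference, here is the "element-wise" picture I have in mind: a dot product as an element-wise multiply followed by a sum (again a NumPy sketch, just for illustration):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# Element-wise multiply, then sum the products
elementwise = a * b              # array([ 4., 10., 18.])
manual_dot = elementwise.sum()   # 32.0

# Should match NumPy's built-in dot product
assert manual_dot == np.dot(a, b)
print(manual_dot)
```

Is it this multiply-then-sum step that gets parallelized when the dot products are done as one matrix multiplication?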

If I’m right so far, is it also correct to say that what enables the O(1) performance of matrix multiplication is that the operation is carried out on the GPU? That is, is matrix multiplication impossible without the translation to CUDA and, by extension, to the GPU equivalent of assembly code?