Element-wise multiplication vs broadcasting

Hi, I am a bit confused about this timestamp where Jeremy introduces broadcasting. Isn't this an element-wise multiplication?

How are element-wise multiplication, broadcasting, the dot product, and matrix-vector multiplication the same or different?

I was a little confused with that as well and this is how I thought about it.

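It is the same as element-wise multiplication in this case (each element multiplied by 2). I think broadcasting here is just the mechanism that lets the library treat the single number 2 as if it were a tensor of the same shape as the other operand, so the multiplication can be applied element by element.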
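
To make that concrete, here is a minimal sketch in NumPy. The array `a` and its values are made up for illustration; as far as I know the `*` operator behaves the same way on PyTorch tensors.

```python
import numpy as np

# Hypothetical example data, not taken from the lesson.
a = np.array([[1, 2, 3],
              [4, 5, 6]])

# Broadcasting: the scalar 2 is treated as if it were stretched to a's shape.
broadcast_result = a * 2

# Explicit element-wise product with a same-shaped array full of 2s.
twos = np.full_like(a, 2)
elementwise_result = a * twos

print(np.array_equal(broadcast_result, elementwise_result))  # True
```

Both lines compute the same thing; broadcasting just saves you from building the array of 2s yourself.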

I got tripped up a few times by the difference between matrix-vector multiplication (the @ operator in Python) and broadcast element-wise multiplication (the * operator). I find it easier to think about the expected size of the output. In element-wise multiplication, the output has the same size as the inputs (or, when broadcasting is involved, the size the inputs are broadcast to, which is usually that of the larger input), whereas in matrix multiplication it follows the rule that matrix A (m rows x n columns) multiplied by matrix B (n rows x j columns) results in a matrix C (m rows x j columns).
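
A quick way to see the difference is to compare output shapes. This is only a sketch in NumPy with made-up arrays (`A`, `B`, `v`, `col`, `row` are names I picked for illustration):

```python
import numpy as np

A = np.ones((2, 3))   # m=2 rows, n=3 columns
B = np.ones((3, 4))   # n=3 rows, j=4 columns
v = np.ones(3)        # vector of length n=3

matmul_shape = (A @ B).shape        # (2, 4): m x j
matvec_shape = (A @ v).shape        # (2,): one value per row of A
elementwise_shape = (A * v).shape   # (2, 3): v is broadcast across each row of A

# With broadcasting, the element-wise output can be larger than either input.
col = np.ones((2, 1))
row = np.ones((1, 3))
outer_shape = (col * row).shape     # (2, 3)

print(matmul_shape, matvec_shape, elementwise_shape, outer_shape)
```

So `@` sums over the shared inner dimension and drops it, while `*` keeps every dimension and only stretches the ones of size 1.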

I hope this helps. Bear in mind that I’m also a beginner and what I said might not be completely correct; it’s just how I think of it and it seems to work.


Indeed, broadcasting is used internally to make element-wise operations possible. To do an element-wise operation, both tensors need to have the same shape. If they do not, PyTorch (and NumPy) use broadcasting to try to make them the same shape according to the so-called “broadcasting” rules.
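
As I understand the rules, shapes are compared from the last dimension backwards, and two dimensions are compatible if they are equal or one of them is 1 (a missing dimension counts as 1). Below is a rough pure-Python sketch of that shape check. `broadcast_shape` is a hypothetical helper written for this post, not a PyTorch or NumPy function.

```python
from itertools import zip_longest

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError if incompatible."""
    result = []
    # Walk both shapes from the trailing (rightmost) dimension; pad missing ones with 1.
    for dim_a, dim_b in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)
        elif dim_a == 1:
            result.append(dim_b)
        else:
            raise ValueError(f"shapes {shape_a} and {shape_b} cannot be broadcast")
    return tuple(reversed(result))

print(broadcast_shape((2, 3), (3,)))       # (2, 3)
print(broadcast_shape((2, 1), (1, 3)))     # (2, 3)
print(broadcast_shape((4, 2, 3), (2, 1)))  # (4, 2, 3)
```

A pair such as (2, 3) and (2,) fails, because the trailing dimensions 3 and 2 differ and neither is 1.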

Whenever the tensors involved aren’t the same shape, it’s important to check that the broadcasting rules produce the operation you actually intend.
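
One case where this matters, sketched in NumPy with made-up values (`preds` and `targets` are illustrative names): a column of shape (3, 1) multiplied by a vector of shape (3,) raises no error, but the result is probably not what you wanted.

```python
import numpy as np

preds = np.array([[1.0], [2.0], [3.0]])   # shape (3, 1), e.g. a column of model outputs
targets = np.array([1.0, 2.0, 3.0])       # shape (3,)

# No error is raised: (3, 1) and (3,) broadcast to (3, 3).
surprising = preds * targets
print(surprising.shape)   # (3, 3)

# Making the shapes match first gives the one-to-one product.
intended = preds.squeeze(1) * targets
print(intended.shape)     # (3,)
```

Printing `.shape` before and after an operation is a cheap way to catch this kind of silent broadcast.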