Matrix multiplication in lesson 4

vishak · July 9, 2019, 4:01pm

Hello people!

Around 1:34:00 of Lecture four, Jeremy talks about matrix vector multiplication.

There is a column vector of size 3 and a 3x5 matrix. Mathematically, these two can’t be multiplied unless the second matrix is transposed and the first is multiplied from the right, giving a 5x1 matrix as the output.

But Jeremy multiplies a 3x1 vector with a 3x5 matrix: he multiplies the column vector with every column of the second matrix and gets a 5x1 activation vector.

Is this the way multiplication is done by PyTorch? What am I missing here?

vishak · July 9, 2019, 4:10pm

OK, it seems that Jeremy’s way of doing it gives the same result as well, so there shouldn’t be any problem. It is just a different way of calculating the same thing.
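To make the equivalence concrete, here is a minimal pure-Python sketch. The numbers and the helper names `vec_times_mat` and `mat_times_col` are made up for illustration and are not taken from the lecture. It computes the result both ways: the lecture's way (dot product of the vector with each column of the 3x5 matrix) and the textbook way (transpose to 5x3, then multiply the 3x1 column from the right).

```python
# Hypothetical example values: one RGB pixel and a 3x5 weight matrix.
x = [1.0, 2.0, 3.0]
W = [
    [1.0, 0.0, 2.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0, 2.0],
    [1.0, 1.0, 0.0, 2.0, 1.0],
]

def vec_times_mat(x, W):
    """Lecture's way: dot the vector with every column of W."""
    n_cols = len(W[0])
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(n_cols)]

def mat_times_col(A, x):
    """Textbook way: dot every row of A with the column vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

W_T = [list(col) for col in zip(*W)]  # transpose: 5 rows, 3 columns

lecture_way = vec_times_mat(x, W)
textbook_way = mat_times_col(W_T, x)
print(lecture_way)   # [4.0, 5.0, 4.0, 7.0, 7.0]
print(textbook_way)  # [4.0, 5.0, 4.0, 7.0, 7.0]
```

Both calls compute the same five sums; only the bookkeeping about rows and columns differs.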

I’ll keep my question up just in case someone else has the same question.

Pomo · July 9, 2019, 5:34pm

Nothing is wrong with the math. In matrix multiplication the inner dimensions go away. Please also see

Mark_F · July 9, 2019, 6:06pm

I had exactly the same question when I was watching these videos. I think there is something wrong with the math, at least if you are doing conventional matrix multiplication (i.e. rank 2).

As the OP mentions, the first matrix (vector) he shows is 3 rows (RGB) and one column (single pixel) so it’s 3 x 1. The matrix that he multiplies this vector by is 3 x 5. So the “inner numbers” are 1 and 3, i.e. they do not match. I assume that this is similar to the zero-rank array problem I had earlier, where the shape of the pixel vector is actually (3,) not (3, 1). But I don’t have a heuristic to understand how these zero rank arrays work (i.e. why is the 3 treated as the “inner” number, not the “outer” number, and what is the presumed “outer” number?).

In the Coursera course, the 3 x 5 weight matrix would have been transposed to 5 x 3 and placed in front of the activation matrix of 3 x 1. You would then get a 5 x 1 vector, as shown. It’s true that Jeremy’s way of doing it gets the same result, but it’s not matrix multiplication as I’m familiar with it (but I’m no mathematician!).

Later on, Jeremy uses an Excel spreadsheet and does use conventional matrix multiplication where the inner numbers disappear, so I assume he is using some sort of unconventional shorthand in the OP’s example. But if I’m missing something then I’d love to have someone explain it to me. I found it very confusing.

This is a great course, Jeremy is brilliant, it’s all free – I’m not discounting any of that. But I will say that for simpler minds such as my own, these unconventional and unexplained math variances make it much more difficult and time-consuming for me to follow because I always feel like there’s something that I don’t understand.

Welcome anyone’s insight to set me straight.

Pomo · July 9, 2019, 10:16pm

In case it eases your mind, the convention used by PyTorch and APL is different from and more logically consistent than that used in math or apparently Coursera. In math, multiplication is defined only between two-dimensional matrices. In PyTorch, the arguments can be of any rank as long as the inner dimensions match.

[rank 2] @ [rank 2] -> [rank 2]
[rank 1] @ [rank 2] -> [rank 1]
[rank 1] @ [rank 1] -> [rank 0]

So forget math - this is how it actually works.
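The three rules above can be checked with a short PyTorch sketch. The tensor sizes are arbitrary choices for illustration; only `torch.ones`, the `@` operator, `.shape`, and `.dim()` are used.

```python
import torch

M = torch.ones(3, 5)   # rank 2
N = torch.ones(5, 2)   # rank 2
v = torch.ones(3)      # rank 1, length 3

rank2_result = M @ N   # (3, 5) @ (5, 2) -> (3, 2)
rank1_result = v @ M   # (3,)   @ (3, 5) -> (5,)
rank0_result = v @ v   # (3,)   @ (3,)   -> scalar tensor

for name, t in [("rank2", rank2_result),
                ("rank1", rank1_result),
                ("rank0", rank0_result)]:
    # .dim() is the rank; .shape lists the size of each dimension
    print(name, tuple(t.shape), t.dim())
```

In each case the dimensions that touch (5 and 5, 3 and 3, 3 and 3) disappear and whatever is left over forms the result.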

Mark_F · July 9, 2019, 11:09pm

That’s really helpful, thanks.

So in Jeremy’s example above, is it rank [0] x rank [2], shape (3,) x shape (3, 5) with 3 as the “inner” number? I assume it can’t be shape (3, 1) by (3, 5) or PyTorch would broadcast, right?

Pomo · July 10, 2019, 12:07am

The left parameter is a vector, i.e., a rank 1 tensor of length 3. A rank 0 tensor would be a scalar. The inner dimensions, the ones that touch, are both 3.

I don’t know about broadcasting in theory. I would need to test.

A suggestion: make a chart of a scalar, a vector, and a matrix vs. their dimensions and rank, for your own understanding. Then try various combinations with matrix multiplication and broadcasting.
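One possible way to start such a chart in PyTorch, as a rough sketch (the example values and the names `examples` and `chart` are made up):

```python
import torch

examples = {
    "scalar": torch.tensor(7.0),
    "vector": torch.tensor([1.0, 2.0, 3.0]),
    "matrix": torch.zeros(3, 5),
}

# Map each name to (rank, shape): rank is the number of dimensions.
chart = {name: (t.dim(), tuple(t.shape)) for name, t in examples.items()}

for name, (rank, shape) in chart.items():
    print(f"{name:7s} rank={rank} shape={shape}")
```

From there, trying `a @ b` for each pair and printing the resulting shape (or the error) fills in the rest of the table.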

Rho, rho, rho of X
Always equals 1
Rho is dimension, rho rho rank.
APL is fun!

Richard M. Stallman (1969), GNU APL, 27 Sep 2013.

Mark_F · July 10, 2019, 1:12am

Thanks. I don’t expect you to reply, but I wanted to make one more post in case anyone else reads this so that I can correct something I said.

I looked at my notes, and Coursera does, as you suggest, call vectors of shape (n,) rank 1 arrays. Andrew Ng suggests avoiding them because they are glitchy, i.e. he suggests representing vectors as rank 2 arrays of shape (n, 1). (He effectively predicted my confusion on the topic.)
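For comparison, a minimal sketch of that explicit-column convention in PyTorch, with made-up values. `unsqueeze(1)` turns the rank 1 vector into a rank 2 column, and the weight matrix is transposed and placed in front, as in the Coursera notation:

```python
import torch

v = torch.tensor([1.0, 2.0, 3.0])  # rank 1, shape (3,)
col = v.unsqueeze(1)               # rank 2, shape (3, 1)
W = torch.ones(3, 5)               # hypothetical 3x5 weight matrix

out = W.T @ col                    # (5, 3) @ (3, 1) -> (5, 1)
print(tuple(col.shape), tuple(out.shape))
```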

I will play with it, as you suggest.

Edit: After playing with it a bit, it seems evident that rank 1 arrays are printed in PyTorch as row vectors of shape (1, 10), but are treated as column vectors of shape (10, 1). This corresponds to the idea that a rank 1 array of shape (n,) treats n as the “inner” number. Also, it does not appear that the “@” function broadcasts.
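A small sketch for checking these observations, with made-up values. As I understand the `torch.matmul` documentation, a rank 1 tensor is temporarily treated as a row when it is on the left of `@` and as a column when it is on the right, so n ends up as the inner number either way. The two matrix dimensions are never broadcast, but leading (batch) dimensions are.

```python
import torch

v = torch.tensor([1.0, 2.0, 3.0])  # shape (3,)
W = torch.ones(3, 5)               # shape (3, 5)

left = v @ W      # v acts as a 1x3 row:    result shape (5,)
right = W.T @ v   # v acts as a 3x1 column: result shape (5,)

# An explicit (3, 1) column does not fit (3, 5): the inner sizes are 1 and 3.
try:
    v.view(3, 1) @ W
    explicit_column_ok = True
except RuntimeError:
    explicit_column_ok = False

# Leading (batch) dimensions are broadcast: (2, 1, 3) @ (3, 5) -> (2, 1, 5).
batched = torch.ones(2, 1, 3) @ W

print(tuple(left.shape), tuple(right.shape))
print(explicit_column_ok, tuple(batched.shape))
```

So the shape (3,) vector works on either side, while the shape (3, 1) version raises an error instead of being broadcast.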

dries · October 28, 2019, 9:42pm

I ran into the same problem as originally posted here, but in the answers provided in this thread I do not find an explanation of how Jeremy does these multiplications. Tutorials on matrix multiplication only show the conventional way of multiplying, the same way I learned as a mechanical engineer. When I Google tensor multiplication, I get lost in topics about the physical meaning of tensors, and though interesting, they do not answer how Jeremy multiplies his matrices. Can anyone point me to a tutorial or an explanation of Jeremy’s way of doing this?

To add, in an earlier lesson Jeremy referred to http://matrixmultiplication.xyz/ for matrix multiplication, which is just the conventional way and not the way that he is multiplying in lesson 4.

satyartha · May 21, 2020, 12:58pm

Hi Awesome folks, I hope you are all staying safe. Did anyone find a proper explanation to this question? Could the original poster please elaborate if he already found the answer?
Appreciate it @vishak.
Thanks,
Satyarth