Help: I can't re-create the matrix multiplication in lesson 5 - MNIST SGD

3301x · September 30, 2019, 5:22am

Hi

Sorry to bother you guys with what is probably a stupid question, but I have spent a couple of days searching the internet and digging through the source code of nn.Linear, F.linear and torch.matmul, and I just can’t figure out why I am getting an error.

Any help would be greatly appreciated.

Question

In Lesson 5 - SGD MNIST, I am trying to understand how the matrix multiplication done by nn.Linear can be valid. It multiplies the input, a batch of shape [64, 784], by the weight from model.parameters(), which has shape [10, 784].

When I multiply these matrices using either x @ a or torch.matmul(x, a), I get a size mismatch error.

My understanding of matrix multiplication and broadcasting

Matrix Multiplication

It was my understanding that a matrix multiplication is only valid when the number of columns of the first matrix equals the number of rows of the second matrix, e.g. [2, 3] * [3, 2] would be valid.
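A minimal PyTorch sketch of this rule (the names p, q, r and mismatch_raised are only for illustration): the @ operator, like torch.matmul on 2-D tensors, needs the inner dimensions to agree and raises a RuntimeError otherwise.

```python
import torch

# Inner dimensions match (3 == 3): [2, 3] @ [3, 2] -> [2, 2]
p = torch.ones(2, 3)
q = torch.ones(3, 2)
print((p @ q).shape)  # torch.Size([2, 2])

# Inner dimensions differ (3 != 2), so the product is undefined
r = torch.ones(2, 2)
mismatch_raised = False
try:
    p @ r
except RuntimeError as err:
    mismatch_raised = True
    print("RuntimeError:", err)
```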

Broadcasting

It was my understanding that broadcasting can only occur when the number of columns of the first matrix equals the number of columns of the second matrix, e.g. [2, 3] * [1, 3].
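For comparison, PyTorch's broadcasting semantics compare shapes starting from the trailing dimension: each pair of sizes must be equal, or one of them must be 1 (or missing). Broadcasting applies to element-wise operations such as * and +; torch.matmul only broadcasts leading batch dimensions, never the two matrix dimensions themselves. A small sketch with made-up values (m, row, col and broadcast_failed are illustrative names):

```python
import torch

m = torch.ones(2, 3)

# [2, 3] * [1, 3]: trailing sizes 3 vs 3 are equal, 2 vs 1 has a 1 -> OK
row = torch.tensor([[10.0, 20.0, 30.0]])
print((m * row).shape)  # torch.Size([2, 3])

# [2, 3] * [2, 1]: also broadcastable, even though the column counts differ
col = torch.tensor([[2.0], [3.0]])
print((m * col).shape)  # torch.Size([2, 3])

# [2, 3] * [2, 2]: trailing sizes 3 vs 2, neither is 1 -> error
broadcast_failed = False
try:
    m * torch.ones(2, 2)
except RuntimeError as err:
    broadcast_failed = True
    print("RuntimeError:", err)
```

The [2, 1] case shows that the rule is about each trailing size being equal or 1, not strictly about column counts.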

Code

Test code

import torch

# creating the x matrix: one row of 784 inputs
x = torch.ones(1, 784)

# creating the a matrix: same shape as the weight in model.parameters()
a = torch.ones(10, 784)

print('x', x.shape, 'a', a.shape)

x torch.Size([1, 784]) a torch.Size([10, 784])

y_hat = torch.matmul(x, a)

RuntimeError: size mismatch, m1: [1 x 784], m2: [10 x 784]
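One sketch of how the test code can be made to run, assuming a is meant to play the role of the [10, 784] weight: transposing it with a.t() gives a [784, 10] matrix, so the inner dimensions agree.

```python
import torch

x = torch.ones(1, 784)   # one row of 784 inputs
a = torch.ones(10, 784)  # laid out like a weight: [outputs, inputs]

# a.t() has shape [784, 10], so [1, 784] @ [784, 10] -> [1, 10]
y_hat = torch.matmul(x, a.t())
print(y_hat.shape)         # torch.Size([1, 10])
print(y_hat[0, 0].item())  # 784.0, the sum of 784 products 1 * 1
```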

Extracting shapes from Lesson 5 - SGD MNIST

# input shape
x,y = next(iter(data.train_dl))
x.shape

torch.Size([1, 784])

# Weight shape
[p.shape for p in model.parameters()]

[torch.Size([10, 784]), torch.Size([10])]
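A sketch that checks these shapes numerically, assuming the model is equivalent to a single nn.Linear(784, 10) layer. The names lin and xb are hypothetical, and xb is random data rather than real MNIST images. PyTorch documents nn.Linear as computing y = xA^T + b, with the weight A stored as [out_features, in_features], which matches the [10, 784] weight listed above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-in for the lesson's model: one linear layer, 784 -> 10
lin = nn.Linear(784, 10, bias=True)
print([p.shape for p in lin.parameters()])  # [torch.Size([10, 784]), torch.Size([10])]

xb = torch.randn(64, 784)  # a batch shaped like a [64, 784] input

out_module = lin(xb)
out_functional = F.linear(xb, lin.weight, lin.bias)
# The weight is transposed to [784, 10]; the [10] bias broadcasts over the 64 rows
out_manual = xb @ lin.weight.t() + lin.bias

print(out_module.shape)  # torch.Size([64, 10])
print(torch.allclose(out_module, out_manual, atol=1e-5))  # True
```

So the layer never multiplies [64, 784] by [10, 784] directly; it uses the transposed weight, which is why the plain x @ a in the test code fails.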