Chapter 4 - Why are the MNIST weights a rank-2 tensor instead of a rank-1 tensor?

In the SGD explanation in Chapter 4 of fastbook, the score used to decide whether an image is a 3, a 7, etc. is calculated as follows:
(x * w).sum()

Here “x” is a rank-1 tensor of length 28*28 = 784, and “w” is a rank-1 tensor of the same length.

However, when creating the weights we do:

def init_params(size, std=1.0): return (torch.randn(size)*std).requires_grad_()
weights = init_params((28*28,1))

So weights is a rank-2 tensor of shape [784, 1] rather than a rank-1 tensor of shape [784].
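To make the shape difference concrete, here is a quick check. It reuses init_params from above; the names weights_2d and weights_1d are just mine for illustration, not from the book.

```python
import torch

def init_params(size, std=1.0):
    return (torch.randn(size) * std).requires_grad_()

weights_2d = init_params((28 * 28, 1))  # how the book creates the weights
weights_1d = init_params(28 * 28)       # the rank-1 alternative I'm asking about

print(weights_2d.shape, weights_2d.ndim)  # torch.Size([784, 1]) 2
print(weights_1d.shape, weights_1d.ndim)  # torch.Size([784]) 1
```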

Then, to get the prediction we do:

(train_x[0] * weights.T).sum() + bias

So we have to take the transpose of the weights before multiplying.

Why don’t we just do:
weights = init_params(28 * 28)

Then we could do:
(train_x[0] * weights).sum() + bias

without any of the transpose stuff.
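As far as I can tell, the two versions give the same number for a single image. Below is a minimal sketch of that comparison. Note that x is a random stand-in for train_x[0] (an assumption, so the snippet runs without the MNIST data), and weights_flat is a name I made up for the same weight values flattened to rank 1.

```python
import torch

torch.manual_seed(0)

def init_params(size, std=1.0):
    return (torch.randn(size) * std).requires_grad_()

# Stand-in for train_x[0]: a flattened 28x28 image with values in [0, 1).
x = torch.rand(28 * 28)

weights = init_params((28 * 28, 1))  # rank-2, shape [784, 1], as in the book
bias = init_params(1)

# Same weight values viewed as rank-1, shape [784] (hypothetical alternative).
weights_flat = weights.detach().squeeze(1)

# Book version: x broadcasts against weights.T, which has shape [1, 784].
pred_2d = (x * weights.T).sum() + bias

# Rank-1 version: plain elementwise product, no transpose needed.
pred_1d = (x * weights_flat).sum() + bias

print(pred_2d.shape, pred_1d.shape)
print(torch.allclose(pred_2d, pred_1d, atol=1e-4))
```

This only covers the single-image case from the question; it doesn't show how either shape behaves once a whole batch of images is involved.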
