In Chapter 4 of fastbook, in the explanation of SGD, the score of an image (used to decide whether it’s a 3, a 7, etc.) is calculated as follows:
(x * w).sum()
Where “x” is a rank-1 tensor of length 784 (28 * 28) and “w” is a rank-1 tensor, also of length 784.
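To make the shapes concrete, here’s a minimal sketch of that score calculation with random toy data standing in for a real image and real weights:

```python
import torch

x = torch.randn(28 * 28)  # a flattened 28x28 "image" (toy data, not real MNIST)
w = torch.randn(28 * 28)  # one weight per pixel

score = (x * w).sum()     # element-wise product, then sum: a dot product
print(score.shape)        # torch.Size([]) -- a single scalar score
```

With both tensors rank-1 and the same length, the element-wise multiply lines up one weight per pixel and the sum collapses everything to a scalar.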
However, when creating the weights we do:
def init_params(size, std=1.0): return (torch.randn(size)*std).requires_grad_()
weights = init_params((28*28,1))
So the weights array is a rank-2 tensor of shape [784, 1] instead of a rank-1 tensor of shape 784.
Then, to get the prediction we do:
(train_x * weights.T).sum() + bias
We have to take the transpose of the weights before multiplying.
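Here’s a sketch of what that broadcasting looks like for a single toy image (a random tensor standing in for one row of train_x):

```python
import torch

def init_params(size, std=1.0):
    return (torch.randn(size) * std).requires_grad_()

weights = init_params((28 * 28, 1))   # rank-2: shape [784, 1]
bias = init_params(1)
x = torch.randn(28 * 28)              # one flattened image (toy data)

# weights.T has shape [1, 784]; broadcasting it against x (shape [784])
# gives a [1, 784] product, which then sums down to a scalar
pred = (x * weights.T).sum() + bias
print(weights.shape, weights.T.shape, pred.shape)
```

Without the transpose, multiplying a [784] tensor by a [784, 1] tensor would broadcast to a [784, 784] result, which is not what we want.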
Why don’t we just do:
weights = init_params(28 * 28)
Then we could do:
(train_x * weights).sum() + bias
without any of the transpose stuff.
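As a sanity check, the two versions do give the same prediction. Seeding the RNG identically before each init makes both tensors draw the same 784 values, so only the shape differs:

```python
import torch

def init_params(size, std=1.0):
    return (torch.randn(size) * std).requires_grad_()

torch.manual_seed(42)
w2 = init_params((28 * 28, 1))   # the book's rank-2 version, shape [784, 1]
torch.manual_seed(42)
w1 = init_params(28 * 28)        # proposed rank-1 version: same values, shape [784]

x = torch.randn(28 * 28)         # one flattened image (toy data)
print(torch.allclose((x * w2.T).sum(), (x * w1).sum()))  # True
```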