In Chapter 4 of fastbook’s explanation of SGD, the score used to decide whether an image is a 3, a 7, etc. is calculated as follows:

`(x * w).sum()`

Where `x` is a rank-1 tensor of length 784 (28 × 28) and `w` is a rank-1 tensor, also of length 784.

However, when creating the weights we do:

```
def init_params(size, std=1.0): return (torch.randn(size)*std).requires_grad_()
weights = init_params((28*28,1))
```

So the weights are a rank-2 tensor of shape [784, 1] instead of a rank-1 tensor of shape [784].
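To make the shapes concrete, here is a minimal sketch of the book’s `init_params` (this just inspects the shapes; it assumes PyTorch is installed):

```python
import torch

def init_params(size, std=1.0):
    # Same as the book: random weights scaled by std, tracked for gradients
    return (torch.randn(size) * std).requires_grad_()

weights = init_params((28*28, 1))
print(weights.shape)    # torch.Size([784, 1]) -- rank 2
print(weights.T.shape)  # torch.Size([1, 784]) -- rank 2, transposed
```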

Then, to get the prediction we do:

`(train_x[0] * weights.T).sum() + bias`

We have to take the transpose of the weights before multiplying.
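And the transpose isn’t optional here: a sketch of what broadcasting does to each shape combination (my own illustration, not the book’s code):

```python
import torch

x = torch.randn(28*28)     # rank-1 image, shape [784]
w = torch.randn(28*28, 1)  # rank-2 weights, shape [784, 1]

# With the transpose: [784] * [1, 784] broadcasts to [1, 784],
# so .sum() gives the intended dot product.
print((x * w.T).shape)  # torch.Size([1, 784])

# Without it: [784] * [784, 1] broadcasts to [784, 784] --
# an outer product, whose sum is not the prediction we want.
print((x * w).shape)    # torch.Size([784, 784])
```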

Why don’t we just do:

`weights = init_params(28 * 28)`

Then we could do:

`(train_x[0] * weights).sum() + bias`

without any of the transpose stuff.
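To check that the two formulations really agree, a quick sketch (the rank-1 `w1` here is my hypothetical alternative, derived from the book’s rank-2 weights):

```python
import torch

torch.manual_seed(0)
x = torch.randn(28*28)
w2 = torch.randn(28*28, 1)  # the book's rank-2 weights, shape [784, 1]
w1 = w2.squeeze()           # hypothetical rank-1 version, shape [784]
bias = torch.randn(1)

pred_rank2 = (x * w2.T).sum() + bias  # the book's formulation
pred_rank1 = (x * w1).sum() + bias    # the rank-1 alternative
print(torch.allclose(pred_rank2, pred_rank1))  # True
```

So numerically they match; my guess is the [784, 1] shape matters later, but that is exactly what I’m asking about.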