In the MNIST Loss Function section, the authors use this to calculate the prediction:

`(train_x[0]*weights.T).sum() + bias`

Why are the weights transposed in this case?

In the previous section on SGD, the weights were never transposed.


I was just trying to figure this out too. I am very confused by what’s happening here.

OK, I think I figured it out. The object returned by `init_params` is a two-dimensional tensor: a column vector. We can see its shape by calling `weights.size()`, which returns:

```
torch.Size([784, 1])
```

which means it has 1 column and 784 rows. But the rest of the lesson uses the vector stacked horizontally, meaning 1 row with 784 columns. If we call `weights.T.size()`, we get the correct shape, with all the parameters ‘side by side’:

```
weights.T.size()
# returns:
torch.Size([1, 784])
```

You can also observe this yourself by looking at the first few parameters in `weights`:

```
weights[0:5]
# returns:
tensor([[ 0.8505],
        [-0.8496],
        [-1.4020],
        [ 0.1723],
        [-0.2206]], grad_fn=<SliceBackward>)
weights[0:5].T
# returns:
tensor([[ 0.8505, -0.8496, -1.4020, 0.1723, -0.2206]], grad_fn=<PermuteBackward>)
```
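
To convince yourself why the transpose matters, here’s a minimal shape check (just a sketch: `train_x0` below is a random stand-in for `train_x[0]`, and the weights are built the same way `init_params((28*28,1))` would build them):

```
import torch

train_x0 = torch.rand(28*28)                      # stand-in for train_x[0], shape [784]
weights = torch.randn(28*28, 1).requires_grad_()  # same shape as init_params((28*28,1))
bias = torch.randn(1).requires_grad_()

print(weights.size())                 # torch.Size([784, 1])
print((train_x0 * weights.T).size())  # torch.Size([1, 784]) -- elementwise product

# Without the transpose, broadcasting pairs the [784] row with the
# [784, 1] column and produces a [784, 784] matrix -- not what we want:
print((train_x0 * weights).size())    # torch.Size([784, 784])

pred = (train_x0 * weights.T).sum() + bias
print(pred.size())                    # torch.Size([1]) -- a single prediction
```

So the transpose just lines the parameters up ‘side by side’ so that elementwise multiplication with the 784 pixel values works as intended.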


```
def init_params(size, std=1.0): return (torch.randn(size)*std).requires_grad_()
```

Hi, thanks for that answer. Why do we need a `std` parameter in the definition of `init_params`?

Thank you

In this example it’s not actually doing anything. In this chapter, when we call `init_params`, we don’t use that part of the function.

In the ‘Putting It All Together’ section, this is called:

```
weights = init_params((28*28,1))
bias = init_params(1)
```
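
A quick check of what those calls produce (a sketch, assuming `torch` is imported and `init_params` is defined as above):

```
weights = init_params((28*28,1))
bias = init_params(1)

print(weights.size())         # torch.Size([784, 1])
print(bias.size())            # torch.Size([1])
print(weights.requires_grad)  # True -- gradients will be tracked for SGD
```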

Since `std` isn’t passed in as a keyword argument, `init_params` uses the default value of 1.0 (`std=1.0`), and then multiplies `torch.randn(size)` by 1.0, which of course does nothing.
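
You can verify that multiplying by the default is a no-op with a tiny sketch (arbitrary seed, purely illustrative):

```
import torch

torch.manual_seed(0)
x = torch.randn(10_000)

print(torch.equal(x, x * 1.0))  # True -- multiplying by std=1.0 changes nothing
print(x.std())                  # roughly 1.0, since randn samples from N(0, 1)
```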

But if for some reason you wanted more variance in the initial parameters, you could easily do that like this:

```
weights = init_params((28*28,1), std=10)
bias = init_params(1, std=10)
```

Now when `init_params` is called, it uses 10 instead of 1 as the `std` value, meaning the values returned will be multiplied by 10.
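
For example, measuring the spread directly (again assuming `init_params` as defined above; the exact number varies run to run):

```
weights = init_params((28*28,1), std=10)
print(weights.std())  # roughly 10 instead of roughly 1
```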

As far as I could tell, this functionality isn’t used in this chapter, but perhaps it will be used later in the course.

How do we get a prediction based on this formula: `(train_x[0]*weights.T).sum() + bias`?