# MNIST Loss function and Prediction

In the MNIST Loss Function section, the authors use this to calculate the prediction:

(train_x*weights.T).sum() + bias

Why are the weights transposed in this case?

In the previous section on SGD, the weights were never transposed.


I was just trying to figure this out too. I am very confused by what’s happening here.

OK, I think I figured it out. The object returned by `init_params` is a 2-dimensional tensor, not a flat vector. We can see its shape by calling `weights.size()`, which returns:

```
torch.Size([784, 1])
```

which means it has 1 column and 784 rows. But the rest of the lesson uses a vector laid out horizontally, meaning 1 row with 784 columns. If we call `weights.T.size()` we get the shape we want, with all the parameters ‘side by side’:

```
weights.T.size()

# returns:

torch.Size([1, 784])
```

You can also observe this yourself by looking at the first few parameters in `weights`:

```
weights[0:5]

# returns:

tensor([[ 0.8505],
        [-0.8496],
        [-1.4020],
        [ 0.1723],
        [-0.2206]], grad_fn=<SliceBackward>)

weights[0:5].T

# returns:

tensor([[ 0.8505, -0.8496, -1.4020,  0.1723, -0.2206]], grad_fn=<PermuteBackward>)
```
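To see why the shape matters for the multiplication itself, here is a minimal sketch using random stand-ins for one image and for `weights` (the names `x`, `wrong`, `right` and `dot` are made up for illustration, and the values are not real MNIST data). It relies on PyTorch broadcasting: a `(784,)` tensor times a `(784, 1)` tensor expands to `(784, 784)`, while the transposed `(1, 784)` version lines each pixel up with its own weight.

```python
import torch

torch.manual_seed(0)

# Stand-ins for the notebook's objects (assumed shapes, random values)
x = torch.rand(784)                             # one flattened 28x28 image
weights = torch.randn(784, 1).requires_grad_()  # column shape, like init_params((28*28,1))

# Without the transpose, broadcasting pairs every pixel with every weight
wrong = x * weights       # (784,) * (784, 1) -> (784, 784)

# With the transpose, pixel i is multiplied by weight i only
right = x * weights.T     # (784,) * (1, 784) -> (1, 784)

print(wrong.shape)  # torch.Size([784, 784])
print(right.shape)  # torch.Size([1, 784])

# The transposed version sums to the ordinary dot product of image and weights
dot = x @ weights.squeeze(1)
print(torch.allclose(right.sum(), dot, atol=1e-3))
```

So the un-transposed product would still run without an error, but its sum would not be the weighted sum of pixels that the lesson is after.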

Hi, thanks for that answer. Why do we need a `std` parameter in the definition of `init_params`?
Thank you

In this example it’s not actually doing anything. When `init_params` is called in this chapter, that part of the function is never used.

In the ‘putting it all together’ section this is called:

```
weights = init_params((28*28,1))
bias = init_params(1)
```

Since `std` isn’t passed in as a keyword argument, `init_params` uses its default value of 1 (`std=1`) and multiplies `torch.randn(size)` by 1, which of course does nothing.

But if for some reason you wanted more variance in the initial parameters you could now do that easily like this:

```
weights = init_params((28*28,1), std=10)
bias = init_params(1, std=10)
```

Now when `init_params` is called it uses 10 instead of 1 as the `std` value, meaning the values returned will be multiplied by 10.

As far as I could tell this functionality wasn’t used in this chapter, but perhaps it will be used later in the course.
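For anyone who wants to check the effect of `std` directly, here is a small sketch. The function body is not quoted in this thread, so the definition below is an assumption reconstructed from the description above (`torch.randn(size)` multiplied by `std`); the names `default_weights` and `wide_weights` are made up for illustration.

```python
import torch

def init_params(size, std=1.0):
    # Assumed definition: random normal values scaled by std,
    # with gradient tracking switched on
    return (torch.randn(size) * std).requires_grad_()

torch.manual_seed(0)
default_weights = init_params((28*28, 1))        # std left at its default
wide_weights = init_params((28*28, 1), std=10)   # same call, 10x the spread

print(default_weights.std().item())  # close to 1
print(wide_weights.std().item())     # close to 10
```

The printed values are sample standard deviations over 784 random numbers, so they land near 1 and 10 rather than exactly on them.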

How do we get a prediction based on this formula? `(train_x*weights.T).sum() + bias`
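One way to read the formula, sketched with random stand-ins rather than the real MNIST tensors: each pixel of a single image is multiplied by its weight, the products are added up, and the bias is added, which leaves one score for that image. Note that applied to a whole `(N, 784)` batch, `.sum()` would collapse every image into a single number, so the sketch assumes one image is picked out with `train_x[0]`. The names `pred_single` and `preds` are made up for illustration.

```python
import torch

torch.manual_seed(0)

# Stand-ins with the notebook's shapes; the values are random, not MNIST
train_x = torch.rand(5, 784)          # 5 flattened images
weights = torch.randn(784, 1)
bias = torch.randn(1)

# One image: multiply pixel by weight, add everything up, add the bias
pred_single = (train_x[0] * weights.T).sum() + bias   # shape (1,): one score

# All images at once: a matrix multiply gives one score per row
preds = train_x @ weights + bias                      # shape (5, 1)

print(pred_single.shape, preds.shape)
print(torch.allclose(pred_single, preds[0], atol=1e-3))
```

The score on its own is just a number; turning it into a label (for example by comparing it to a threshold) is a separate step that this thread does not cover.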