MNIST Loss function and Prediction

In the MNIST Loss function section, the authors use this to calculate the prediction:

(train_x[0]*weights.T).sum() + bias

Why are the weights transposed in this case?

In the previous section on SGD, the weights were never transposed.


I was just trying to figure this out too. I am very confused by what’s happening here.


OK, I think I figured it out. The object returned by init_params is a 2-dimensional tensor. We can see its shape by calling weights.size(), which returns:

torch.Size([784, 1])

which means it has 1 column with 784 rows. But the rest of the lesson is using a vector laid out horizontally, meaning 1 row with 784 columns. If we call weights.T.size() we get the matching shape, with all the parameters ‘side by side’:

weights.T.size()
torch.Size([1, 784])

You can also observe this yourself by looking at the first few parameters in weights:

weights[0:5]

# returns:

tensor([[ 0.8505],
        [-0.8496],
        [-1.4020],
        [ 0.1723],
        [-0.2206]], grad_fn=<SliceBackward>)


weights[0:5].T

# returns:

tensor([[ 0.8505, -0.8496, -1.4020,  0.1723, -0.2206]], grad_fn=<PermuteBackward>)
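
To make the shape argument above concrete, here is a minimal sketch of what the transpose changes. The tensor x is a random stand-in for train_x[0] (assumed to be a flat tensor of 784 values), and weights and bias are created directly rather than through init_params, so the numbers are illustrative only:

```python
import torch

torch.manual_seed(0)

# Stand-ins for the notebook's objects (assumed shapes, random data).
x = torch.rand(28*28)                              # plays the role of train_x[0], shape [784]
weights = torch.randn(28*28, 1).requires_grad_()   # shape [784, 1], like init_params((28*28,1))
bias = torch.randn(1).requires_grad_()

# Without the transpose, broadcasting [784] against [784, 1]
# pairs every pixel with every weight: a 784 x 784 grid.
wrong = x * weights
print(wrong.shape)   # torch.Size([784, 784])

# With the transpose, [784] against [1, 784] lines up pixel i with weight i.
right = x * weights.T
print(right.shape)   # torch.Size([1, 784])

# Summing the element-wise products and adding the bias gives one number.
pred = right.sum() + bias
print(pred.shape)    # torch.Size([1])

# The same value written as a matrix product, which needs no transpose.
pred_mm = x @ weights + bias
```

So the transpose is there only to make the element-wise multiplication line up. Written as a matrix product, the [784, 1] shape works as it is.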

def init_params(size, std=1.0): return (torch.randn(size)*std).requires_grad_()

Hi, thanks for that answer. Why do we need a std parameter in the definition of init_params?
Thank you


In this example it’s not actually doing anything. When init_params is called in this chapter, that part of the function isn’t used.

In the ‘putting it all together’ section this is called:

weights = init_params((28*28,1))
bias = init_params(1)

Since std isn’t passed in, init_params uses its default value of 1.0 (std=1.0) and multiplies torch.randn(size) by 1.0, which of course changes nothing.

But if for some reason you wanted a wider spread in the initial parameters, you could now do that easily like this:

weights = init_params((28*28,1), std=10)
bias = init_params(1, std=10)

Now when init_params is called it uses 10 instead of 1 as the std value, meaning the values returned are multiplied by 10 and so have a standard deviation of roughly 10.
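
A quick way to check this, as a sketch: init_params is copied from the definition quoted above, and the seed value and the names w_default and w_scaled are just for illustration. Resetting the seed before each call makes torch.randn draw the same numbers, so only the std multiplier differs between the two tensors:

```python
import torch

def init_params(size, std=1.0):
    return (torch.randn(size)*std).requires_grad_()

# Same seed before each call, so both calls draw the same random numbers.
torch.manual_seed(42)
w_default = init_params((28*28, 1))          # std left at its default of 1.0

torch.manual_seed(42)
w_scaled = init_params((28*28, 1), std=10)   # same draws, multiplied by 10

print(w_default.std())   # roughly 1
print(w_scaled.std())    # roughly 10

# Every scaled value is 10 times the corresponding default value.
print(torch.allclose(w_scaled, w_default * 10))
```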

As far as I could tell, this functionality wasn’t used in this chapter, but perhaps it will be used later in the course.


How do we get a prediction based on this formula? (train_x[0]*weights.T).sum() + bias
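
One way to read the formula is as a weighted sum: each pixel is multiplied by its own weight, the products are added up, and the bias is added on top. Here is a tiny sketch with three made-up "pixels" instead of 784; all the values are invented for illustration, and the final "greater than 0" check is an assumed decision rule, not something stated in this thread:

```python
import torch

# Toy stand-ins: 3 "pixels" instead of 784, with hand-picked values.
x = torch.tensor([1.0, 2.0, 3.0])                # plays the role of train_x[0]
weights = torch.tensor([[0.5], [-1.0], [2.0]])   # shape [3, 1], one weight per pixel
bias = torch.tensor([0.1])

# Pixel i times weight i: [[1*0.5, 2*(-1.0), 3*2.0]] = [[0.5, -2.0, 6.0]]
products = x * weights.T

# Add the products together, then add the bias: 4.5 + 0.1, about 4.6
score = products.sum() + bias
print(score)

# Assumed rule for illustration: treat a score above 0 as one class.
is_positive = score > 0
print(is_positive)
```

The single number that comes out is the raw prediction for that image. Turning it into a label or a probability is a separate step.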