In the MNIST Loss Function section, the authors use this to calculate the prediction:
(train_x[0]*weights.T).sum() + bias
Why are the weights transposed in this case?
In the previous section on SGD, the weights were never transposed.
I was just trying to figure this out too. I am very confused by what’s happening here.
OK, I think I figured it out. The object returned by init_params is a 2-dimensional tensor. We can see its shape by calling weights.size(), which returns:
torch.Size([784, 1])
This means it has 784 rows and 1 column. But the rest of the lesson uses a vector laid out horizontally, meaning 1 row with 784 columns. If we call weights.T.size() we get the matching shape, with all the parameters ‘side by side’:
weights.T.size()
torch.Size([1, 784])
You can also observe this yourself by looking at the first few parameters in weights:
weights[0:5]
# returns:
tensor([[ 0.8505],
[-0.8496],
[-1.4020],
[ 0.1723],
[-0.2206]], grad_fn=<SliceBackward>)
weights[0:5].T
# returns:
tensor([[ 0.8505, -0.8496, -1.4020, 0.1723, -0.2206]], grad_fn=<PermuteBackward>)
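To see why the shape matters for the multiplication itself, here is a minimal sketch. It uses random stand-ins for train_x[0] and weights (the names x and the seed are my own assumptions, not from the book), and only checks the shapes that broadcasting produces with and without the transpose:

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only so the run is repeatable

x = torch.randn(784)           # stand-in for train_x[0], a flattened 28*28 image
weights = torch.randn(784, 1)  # same shape that init_params((28*28, 1)) returns

# Without the transpose, broadcasting expands both operands to a full grid.
no_transpose = x * weights      # (784,) * (784, 1) -> (784, 784)

# With the transpose, each pixel is paired with exactly one weight.
with_transpose = x * weights.T  # (784,) * (1, 784) -> (1, 784)

print(no_transpose.shape)
print(with_transpose.shape)
```

So without .T the product pairs every pixel with every weight, and summing it would not give the per-pixel weighted sum the lesson is after.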
def init_params(size, std=1.0): return (torch.randn(size)*std).requires_grad_()
Hi, thanks for that answer. Why do we need a std parameter in the definition of init_params?
Thank you
In this example it’s not actually doing anything. When init_params is called in this chapter, we don’t use that part of the function.
In the ‘Putting it all together’ section this is called:
weights = init_params((28*28,1))
bias = init_params(1)
Since std isn’t passed in, init_params uses its default value of 1.0 (std=1.0) and multiplies torch.randn(size) by 1.0, which of course changes nothing.
But if for some reason you wanted more variance in the initial parameters you could now do that easily like this:
weights = init_params((28*28,1), std=10)
bias = init_params(1, std=10)
Now when init_params is called it uses 10 instead of 1 as the std value, meaning the values returned will be multiplied by 10.
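A quick way to check this for yourself is sketched below. The reseeding with torch.manual_seed and the names w_default and w_wide are my own additions for the comparison, not something the chapter does:

```python
import torch

def init_params(size, std=1.0):
    # Same definition as in the chapter: scale standard-normal noise by std.
    return (torch.randn(size) * std).requires_grad_()

# Reseed before each call so both calls draw the same random numbers.
torch.manual_seed(0)
w_default = init_params((28*28, 1))        # std falls back to 1.0

torch.manual_seed(0)
w_wide = init_params((28*28, 1), std=10)   # same draws, scaled by 10

# The measured spread should be roughly 1 and roughly 10 respectively.
print(w_default.std().item(), w_wide.std().item())

# With identical draws, the second tensor is just the first times 10.
print(torch.allclose(w_wide, 10 * w_default))
```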
As far as I could tell, this functionality wasn’t used in this chapter, but perhaps it will be used later in the course.
How do we get a prediction based on this formula? (train_x[0]*weights.T).sum() + bias
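One way to read the formula, as a rough sketch rather than the book's own explanation: each pixel is multiplied by its own weight, the 784 products are added up, and the bias is added to that total. The code below uses random stand-ins for train_x[0], weights and bias (all assumed here), and compares the formula against an explicit loop and a matrix product:

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only so the run is repeatable

x = torch.rand(784)            # stand-in for train_x[0], pixel values in [0, 1)
weights = torch.randn(784, 1)  # same shape as init_params((28*28, 1))
bias = torch.randn(1)          # same shape as init_params(1)

# The formula from the question: element-wise products, summed, plus bias.
pred = (x * weights.T).sum() + bias

# The same weighted sum written out one pixel at a time.
manual = sum(x[i] * weights[i, 0] for i in range(784)) + bias

# The same weighted sum as a matrix product, which needs no transpose.
matmul = x @ weights + bias

print(pred, manual, matmul)
```

All three should agree up to floating-point rounding, and each gives a single number: the prediction for that one image.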