Hi there, how are you doing?

I was going through Lesson 4 with the MNIST database and got a bit confused about how the model learns through backpropagation.

I understand that we calculate the gradient of the loss function and then update the parameters accordingly. But here’s what I don’t get yet.

In Lesson 4, a weight is initialized for every pixel. The model multiplies each pixel value by its own weight, sums those products, and then adds the bias.

```python
def linear1(xb):
    return xb@weights + bias

preds = linear1(train_x)
```
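A minimal sketch of the shapes involved, assuming a small random batch in place of the real `train_x` (the batch size of 5 and the random pixel values are made up for illustration). It shows that `xb@weights + bias` produces one prediction per image, and that this matches multiplying each pixel by its weight, summing, and adding the bias:

```python
import torch

# Hypothetical stand-in for train_x: 5 flattened 28x28 images
torch.manual_seed(0)
train_x = torch.rand(5, 28 * 28)                     # shape (5, 784): one row per image

# One weight per pixel, plus a single bias
weights = torch.randn(28 * 28, 1).requires_grad_()   # shape (784, 1)
bias = torch.randn(1).requires_grad_()               # shape (1,)

def linear1(xb):
    # (n, 784) @ (784, 1) -> (n, 1): each row is sum_j x_j * w_j, plus the bias
    return xb @ weights + bias

preds = linear1(train_x)
print(preds.shape)  # torch.Size([5, 1])

# Same result for the first image, written out as multiply, sum, then add bias
manual = (train_x[0] * weights[:, 0]).sum() + bias
print(torch.allclose(preds[0], manual, atol=1e-4))
```

Each image still ends up with a single prediction, even though 784 weights took part in computing it.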

We assign a target value of 1 to images of 3s and 0 to images of 7s. The loss function measures how far each prediction is from its target (target value minus predicted value), and the gradient is then calculated from that loss. So, for each image, both the target value and the predicted value are a tensor holding a single value.
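To see where a single loss value comes from, here is a sketch assuming a "distance from target" loss: the absolute difference between the target and the sigmoid of the prediction, averaged over the batch. The six random images, their labels, and the name `distance_loss` are hypothetical, and the lesson's own loss function may differ in detail:

```python
import torch

# Hypothetical batch: 6 random "images", the first 3 labelled as 3s, the rest as 7s
torch.manual_seed(0)
train_x = torch.rand(6, 28 * 28)
train_y = torch.tensor([1., 1., 1., 0., 0., 0.]).unsqueeze(1)   # 1 = "3", 0 = "7"; shape (6, 1)

weights = torch.randn(28 * 28, 1).requires_grad_()
bias = torch.randn(1).requires_grad_()

def linear1(xb):
    return xb @ weights + bias

def distance_loss(predictions, targets):
    # Hypothetical loss: squash scores into (0, 1), then measure |target - prediction|
    predictions = predictions.sigmoid()
    per_image = (targets - predictions).abs()   # one value per image, shape (6, 1)
    return per_image.mean()                     # averaged into a single scalar

preds = linear1(train_x)
loss = distance_loss(preds, train_y)
print(preds.shape, loss.shape)  # torch.Size([6, 1]) torch.Size([])
```

Each image contributes its own target and prediction, but backpropagation starts from the batch's single averaged loss.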

But with this loss function, the loss comes from subtracting one single value (the predicted value) from another single value (the target value). So I would expect the gradient to be a single value as well (a tensor of size 1). Yet when PyTorch finishes the calculation, `weights.grad` has shape (784, 1), where 784 is the total number of pixels (28×28).
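One way to see why the gradient has 784 entries: the loss is a single number, but it depends on all 784 weights, so backpropagation computes one partial derivative per weight, `dloss/dw_j = sum_i dloss/dz_i * x_ij`, where `z_i` is the raw output for image `i` and `x_ij` is pixel `j` of image `i`. As a result, `weights.grad` has the same shape as `weights`. The sketch below assumes a hypothetical absolute-difference loss on random data (the 0.1 weight scale is also an assumption) and checks PyTorch's gradient against the chain rule worked out by hand:

```python
import torch

# Hypothetical data: 6 random "images" of 784 pixels and 0/1 targets
torch.manual_seed(0)
n = 6
train_x = torch.rand(n, 28 * 28)
train_y = torch.tensor([1., 1., 1., 0., 0., 0.]).unsqueeze(1)   # shape (6, 1)

# One weight per pixel; the 0.1 scale (an assumption) keeps the sigmoid unsaturated
weights = (torch.randn(28 * 28, 1) * 0.1).requires_grad_()      # shape (784, 1)
bias = torch.zeros(1).requires_grad_()                           # shape (1,)

z = train_x @ weights + bias               # raw outputs, shape (6, 1)
preds = z.sigmoid()                        # shape (6, 1)
loss = (train_y - preds).abs().mean()      # hypothetical loss: one scalar for the batch
loss.backward()

# A single loss value, yet one partial derivative per parameter element
print(loss.shape, weights.grad.shape, bias.grad.shape)
# torch.Size([]) torch.Size([784, 1]) torch.Size([1])

# Chain rule by hand: dloss/dz_i for each image, then dloss/dw_j = sum_i dloss/dz_i * x_ij
with torch.no_grad():
    dloss_dpreds = -torch.sign(train_y - preds) / n   # derivative of mean |t - p| w.r.t. p
    dloss_dz = dloss_dpreds * preds * (1 - preds)     # sigmoid'(z) = s * (1 - s)
    manual_grad = train_x.T @ dloss_dz                # (784, 6) @ (6, 1) -> (784, 1)
print(torch.allclose(weights.grad, manual_grad, atol=1e-6))
```

Because each weight gets its own derivative, an update such as `weights.data -= lr * weights.grad` (with `lr` a learning rate) adjusts every pixel's weight individually.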

That is what I’m stuck at understanding.

I hope you can help me with it.

Thank you