Confusion Regarding Calculating Gradients in Gradient Descent


I know what gradients are, but I'm confused about how many gradients we calculate during gradient descent.

In chapter 04_mnist_basics, we use a quadratic as our loss function:

f(x) = ax^{2} + bx + c

Now the minimum loss for this function would be at the vertex of the quadratic (assuming a > 0), since the gradient there is zero. The derivative of the quadratic above is:

f'(x) = 2ax + b
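As a sanity check on my understanding, here is a quick numeric test I wrote (plain Python, with made-up values for a, b, c, and the point x) that the slope of f at a point really matches 2ax + b:

```python
# Central finite difference as a numeric stand-in for the derivative.
# The coefficients and the point x are arbitrary illustration values.
def f(x, a=3.0, b=2.0, c=1.0):
    return a * x**2 + b * x + c

def numeric_slope(func, x, eps=1e-6):
    # (f(x+eps) - f(x-eps)) / (2*eps) approximates f'(x)
    return (func(x + eps) - func(x - eps)) / (2 * eps)

x = 1.5
print(numeric_slope(f, x))      # close to 2*3.0*1.5 + 2.0 = 11.0
```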

So we would want to find the weights that drive the gradient to 0.
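This is how I currently picture gradient descent, as a single-variable process (my own sketch, not the lesson's code): repeatedly step x against the slope f'(x) = 2ax + b until the gradient is near zero, which lands at the vertex x = -b/(2a). The coefficients and learning rate below are made up.

```python
# Gradient descent on the quadratic itself, in one variable.
a, b = 3.0, 2.0          # arbitrary coefficients, a > 0
x, lr = 5.0, 0.1         # arbitrary start point and learning rate

for _ in range(200):
    grad = 2 * a * x + b  # the derivative f'(x)
    x -= lr * grad        # step downhill

print(x, -b / (2 * a))   # x ends up near the vertex, -1/3
```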

However, the lesson calculates multiple gradients: one each for a, b, and c, and that is where I get confused.

The concept of a gradient as I understand it is a single value representing the slope of a function at a point, given by evaluating the derivative function there.

What do the gradients of a, b, and c represent? Each weight having its own gradient doesn't make sense to me.
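For reference, here is a minimal sketch of what I understand the lesson to be doing (PyTorch assumed; the data values are made up by me): the loss depends on all three parameters at once, so .backward() produces one partial derivative per parameter rather than a single slope.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])        # made-up inputs
y = torch.tensor([6.0, 11.0, 18.0])      # made-up targets

params = torch.tensor([1.0, 1.0, 1.0], requires_grad=True)  # a, b, c

a, b, c = params
pred = a * x**2 + b * x + c              # the quadratic model
loss = ((pred - y) ** 2).mean()          # mean squared error
loss.backward()

print(params.grad)   # three numbers: dL/da, dL/db, dL/dc
```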

I would appreciate clarification on this! The relevant section in the lesson, if you want to have a look, is Stochastic Gradient Descent (SGD).