Confusion Regarding Calculating Gradients in Gradient Descent

ForBo7 · August 6, 2022, 7:47am

Hello!

I know what gradients are, but I’m confused about how many gradients we calculate during gradient descent.

In the chapter 04_mnist_basics, we use a quadratic as our loss function.

f(x) = ax^{2} + bx + c

Now the minimum loss for this function would be the vertex of the quadratic, since the gradient there is zero. The derivative function for the quadratic above is below.

f'(x) = 2ax + b

So we would want to find the weights that drive the gradient to 0.

However, the lesson calculates multiple gradients, and I’m getting confused over that. What multiple gradients are we calculating? Apparently each weight has its own gradient, but that doesn’t quite make sense. Isn’t there only a single gradient, which is the gradient of the quadratic?

I would appreciate clarification on this! The relevant section in the lesson, if you want to have a look, is Stochastic Gradient Descent (SGD).

hsaldera · August 6, 2022, 9:44am

You are confusing two very different concepts, the gradient function and the gradient itself.

The gradient function, f'(x), is just a fancy way of saying how f(x) changes when we change x. On a curve, the gradient (slope) is different at each point.

If you put a value of x into f'(x), it tells you what the gradient of the curve is at that point. This is pretty straightforward for such a simple equation: take the derivative function, plug in the value of x at each point, and it gives you the gradient at that point.
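As a rough illustration of plugging x into f'(x), here is a minimal sketch. The coefficient values a=3, b=2, c=1 and the sample points are made up for the example; they are not from the lesson.

```python
# Hypothetical coefficients, chosen only for illustration.
A, B, C = 3.0, 2.0, 1.0

def f(x, a=A, b=B, c=C):
    """The quadratic f(x) = ax^2 + bx + c."""
    return a * x**2 + b * x + c

def f_prime(x, a=A, b=B):
    """Its derivative f'(x) = 2ax + b, worked out by hand."""
    return 2 * a * x + b

# One derivative function, but a different slope at each x.
slopes = {x: f_prime(x) for x in (-2.0, 0.0, 1.0)}
for x, slope in slopes.items():
    print(f"x = {x:5.1f}  slope = {slope:6.1f}")

# At the vertex x = -b / (2a) the slope is zero (up to rounding).
vertex_x = -B / (2 * A)
print("slope at vertex:", f_prime(vertex_x))
```

The same function f'(x) gives a negative slope to the left of the vertex, a positive one to the right, and roughly zero at the vertex itself.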

The problem is that the equations that represent real-life problems aren’t so straightforward. They might have hundreds or even thousands of variables. It would be impractical to work out the derivative function for such equations by hand.

What we need is a mechanism to calculate the derivative automatically, without actually having to do it by hand. Computers are so fast that, with such a mechanism, we could find derivatives for incredibly complex functions at mind-bending speed.

That mechanism is what is discussed in the lesson.
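To give a feel for how a derivative can be computed without writing f'(x) by hand, here is a toy sketch using "dual numbers" (forward-mode automatic differentiation). This is not the lesson's mechanism, which relies on the library's built-in automatic differentiation; the `Dual` class, the `derivative` helper, and the coefficient values are all hypothetical names and numbers invented for this example.

```python
class Dual:
    """A number that carries its value and its derivative with respect to x."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def _lift(other):
        # Plain numbers are constants, so their derivative is 0.
        return other if isinstance(other, Dual) else Dual(float(other))

    def __add__(self, other):
        other = Dual._lift(other)
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual._lift(other)
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def derivative(func, x):
    """Evaluate func at x while tracking d/dx; the seed derivative dx/dx is 1."""
    return func(Dual(x, 1.0)).deriv


def f(x, a=3.0, b=2.0, c=1.0):
    # Written with only + and *, so it works on floats and on Dual numbers.
    return a * x * x + b * x + c


slope_at_1 = derivative(f, 1.0)
print(slope_at_1)  # matches the hand-derived 2ax + b = 2*3*1 + 2 = 8.0
```

Nothing in `f` mentions the derivative; the bookkeeping happens inside the arithmetic operators, which is the general idea behind automatic differentiation.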

ForBo7 · August 6, 2022, 12:13pm

Thank you for the response!

I understand what you are getting at. What confuses me is that, within that section of the lesson, we computed gradients for a, b, and c in the quadratic above. I know that we can use those gradient values to adjust the constants/weights, but what do those gradients represent? As I understand it, the gradient is a single value for a given function that represents its slope, and its value changes depending on which x you input to the corresponding derivative function.
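For reference, one way to read "a gradient for each of a, b, and c" is as the slope of a loss with respect to each parameter, holding the data fixed. The sketch below assumes a mean-squared-error loss and made-up data points generated from a=3, b=2, c=1; the function names, the starting guess, and the learning rate are hypothetical, and the slopes are estimated with finite differences rather than with the lesson's tooling.

```python
def predict(x, params):
    """The quadratic ax^2 + bx + c with params = [a, b, c]."""
    a, b, c = params
    return a * x**2 + b * x + c

def mse_loss(params, xs, ys):
    """Mean squared error between predictions and targets."""
    return sum((predict(x, params) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient(params, xs, ys, eps=1e-6):
    """Estimate d(loss)/d(param) for each parameter with central differences."""
    grads = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += eps      # nudge only parameter i upward
        down[i] -= eps    # and downward
        grads.append((mse_loss(up, xs, ys) - mse_loss(down, xs, ys)) / (2 * eps))
    return grads

# Made-up data generated from the "true" quadratic 3x^2 + 2x + 1.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [3 * x**2 + 2 * x + 1 for x in xs]

params = [1.0, 1.0, 1.0]           # current guess for a, b, c
grads = gradient(params, xs, ys)   # one slope per parameter
print("gradients for a, b, c:", grads)

# One gradient-descent step: move each parameter against its own slope.
lr = 0.01
new_params = [p - lr * g for p, g in zip(params, grads)]
print("loss before:", mse_loss(params, xs, ys))
print("loss after: ", mse_loss(new_params, xs, ys))
```

Here the inputs x are fixed data and the loss is treated as a function of a, b, and c, so there is one slope per parameter: each says how the loss would change if only that parameter were nudged.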