Theoretical Questions About Gradient Descent

lizhuofeng · June 10, 2022, 7:42am

When performing gradient descent, why is the step size multiplied by the raw gradient rather than by the normalized gradient? The normalized gradient already gives the direction of steepest descent.
Thanks very much.

Pomo · June 12, 2022, 6:39pm

The normalized gradient does give you the direction of steepest descent. But we care about both the direction and the magnitude of the gradient. When the magnitude is large, we want to take a larger step in that direction, and as the gradient shrinks near a minimum, the steps shrink with it.
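
To make the difference concrete, here is a minimal sketch comparing the two update rules. The toy objective f(x) = x**2, the starting point, the learning rate, and the helper names `grad` and `descend` are all assumptions chosen for illustration, not anything from a particular library.

```python
def grad(x):
    # Gradient of the toy objective f(x) = x**2 (an assumed example function)
    return 2.0 * x


def descend(x0, lr, steps, normalize):
    """Run `steps` gradient-descent updates from x0 and return the final x."""
    x = x0
    for _ in range(steps):
        g = grad(x)
        if normalize and g != 0.0:
            # Keep only the direction: in 1-D the unit-length gradient is its sign
            g = g / abs(g)
        x = x - lr * g
    return x


# Same start, same learning rate, same number of steps for both variants
plain = descend(x0=5.05, lr=0.1, steps=100, normalize=False)
normalized = descend(x0=5.05, lr=0.1, steps=100, normalize=True)

print("raw gradient:       ", plain)
print("normalized gradient:", normalized)
```

With the raw gradient, each update multiplies x by 0.8, so the iterate keeps closing in on the minimum at 0. With the normalized gradient, every step has length exactly `lr`, so once the iterate is within one step of the minimum it hops back and forth across it instead of settling. Normalized steps can still work, but under this setup they would need a decaying learning rate to converge.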