Theoretical Questions About Gradient Descent

When I'm performing gradient descent, why does the update step use the gradient itself rather than the normalized gradient? The normalized gradient already gives the direction of steepest descent.
Thanks very much

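The normalized gradient will certainly give you the direction of steepest descent. But we care about both the direction and the magnitude of the gradient. When the magnitude is large, we want to take a larger step in that direction. When it is small, as it is near a minimum, the steps shrink automatically, which helps the iterates settle down instead of overshooting. With a normalized gradient and a fixed learning rate, every step has the same length, so the iterates tend to bounce around the minimum unless the learning rate is decayed.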
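
To make the difference concrete, here is a minimal sketch comparing the two update rules on a toy problem. The function f(x) = x**2, the starting point, the learning rate, and the helper name `descend` are all assumptions chosen for illustration; they do not come from the original question.

```python
def descend(x0, lr, steps, normalize):
    """Run 1-D gradient descent on f(x) = x**2 and return the final x.

    Hypothetical helper for illustration only.
    """
    x = x0
    for _ in range(steps):
        grad = 2.0 * x  # derivative of x**2
        if normalize and grad != 0.0:
            # Keep only the direction: in 1-D this is just the sign.
            grad = grad / abs(grad)
        x = x - lr * grad
    return x


# Same starting point, learning rate, and step count for both runs.
raw = descend(x0=5.05, lr=0.1, steps=100, normalize=False)
unit = descend(x0=5.05, lr=0.1, steps=100, normalize=True)

print("raw gradient:       ", raw)
print("normalized gradient:", unit)
```

With the raw gradient, each update multiplies x by 0.8, so the iterate keeps shrinking toward the minimum at 0. With the normalized gradient, every update moves x by exactly the learning rate, so once the iterate is within one step of the minimum it jumps back and forth across it and cannot get closer without a smaller learning rate.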
:slightly_smiling_face:
