When I'm performing gradient descent, why is the learning rate multiplied by the raw gradient instead of the normalized gradient? The normalized gradient already represents the direction of fastest descent.
Thanks very much
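The normalized gradient will certainly give you the direction of steepest descent (once you negate it). But we care about both the direction and the magnitude of the gradient. When the magnitude is large, the loss is changing quickly, so we want to take a larger step in that direction. As we approach a minimum the gradient shrinks, and the steps shrink with it, which helps the iterates settle rather than overshoot.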
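To make this concrete, here is a minimal sketch comparing the two update rules on a toy problem. The function f(x) = x**2, the starting point, the learning rate, and the helper names `grad` and `descend` are all assumptions chosen for illustration, not anything from a particular library.

```python
def grad(x):
    # Gradient of the toy loss f(x) = x**2 (an assumed example function).
    return 2.0 * x


def descend(x0, lr, steps, normalize):
    """Run plain gradient descent, optionally using only the gradient's direction."""
    x = x0
    for _ in range(steps):
        g = grad(x)
        if normalize and g != 0.0:
            # In 1-D, normalizing keeps only the sign, so every step has length lr.
            g = g / abs(g)
        x = x - lr * g
    return x


# Same start, learning rate, and step count for both variants.
raw = descend(x0=1.05, lr=0.1, steps=50, normalize=False)
unit = descend(x0=1.05, lr=0.1, steps=50, normalize=True)

print("distance from minimum, raw gradient:       ", abs(raw))
print("distance from minimum, normalized gradient:", abs(unit))
```

With the raw gradient the steps shrink as x approaches 0, so the iterate ends up very close to the minimum. With the normalized gradient every step has the same length, so once the iterate gets within one step of the minimum it keeps jumping back and forth across it and stays at a distance on the order of the step size, unless you also decay the learning rate.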