How can a high learning rate cause divergence?

We have seen something similar to this, even in class


This is from DeepLearning-LecNotes2

Is this graph accurate?

I’m assuming we are talking about a constant learning rate, but in these graphs it looks like the learning rate is actually growing?

I’m more convinced by this one


from here

But if the second graph is the truth, then a high learning rate will not cause divergence, right? It will just stabilize at a relatively higher error rate?

Or is there some math behind it, and graph one is the real deal?

Imagine it’s a basketball hoop: if you just slam the ball in, it will bounce off somewhere else instead of settling in the basket.


It’s not that the learning rate is growing. The size of each step equals the learning rate multiplied by the gradient, and for this curve the gradient increases (the curve gets steeper) as you get further from the minimum, so the steps (the red lines in the diagram) keep getting bigger.
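You can see this with a tiny sketch (my own toy example, not from the lecture notes): gradient descent on f(x) = x², where the gradient is 2x. The update x ← x − lr·2x = x·(1 − 2·lr) shrinks the error when lr is small but multiplies it by a factor bigger than 1 in magnitude when lr > 1, so the iterate overshoots further every step:

```python
# Toy example: gradient descent on f(x) = x**2, whose gradient is 2*x.
# Each update is x <- x - lr * 2*x = x * (1 - 2*lr), so the iterate
# converges when |1 - 2*lr| < 1 and blows up when lr > 1.

def descend(lr, x0=1.0, steps=10):
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x  # step size = learning rate * gradient
    return x

print(abs(descend(lr=0.1)))  # shrinks toward the minimum at 0
print(abs(descend(lr=1.1)))  # each step overshoots; |x| grows every iteration
```

With lr = 0.1 the distance to the minimum shrinks by a factor of 0.8 per step; with lr = 1.1 it grows by a factor of 1.2 per step, which is exactly the "steps keep getting bigger" picture in the first graph.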


Also note that this is just a primitive (one-dimensional) sketch of what happens in deep learning. Instead of one parameter, you are optimizing millions.

I hope this helps.