How can a high learning rate cause divergence?

We have seen something similar to this, even in class


This is from DeepLearning-LecNotes2

Is this graph accurate?

I’m assuming we are talking about a constant learning rate, but in these graphs it looks like the learning rate is actually growing?

I’m more convinced by this one


from here

But if the second graph is the truth, then a high learning rate will not cause divergence, right? It will just stabilize at a relatively higher error rate?

Or is there some math behind it, and graph one is the real deal?

Imagine it’s a basketball hoop: if you just slam the ball in, it will bounce off somewhere else instead of settling in the basket.


It’s not that the learning rate is growing. The size of each step equals the learning rate multiplied by the gradient, and for this curve the gradient increases (the curve gets steeper) as you get further from the minimum, so the steps (the red lines in the diagram) keep getting bigger.
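You can see this with a tiny sketch (my own toy example, not from the lecture notes): gradient descent on f(x) = x², where the gradient is 2x. The update x ← x − lr·2x = x·(1 − 2·lr) shrinks the error when lr is small but multiplies it by a factor bigger than 1 in magnitude when lr > 1, so the iterate overshoots further every step:

```python
# Toy example: gradient descent on f(x) = x**2, whose gradient is 2*x.
# Each update is x <- x - lr * 2*x = x * (1 - 2*lr), so the iterate
# converges when |1 - 2*lr| < 1 and blows up when lr > 1.

def descend(lr, x0=1.0, steps=10):
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x  # step size = learning rate * gradient
    return x

print(abs(descend(lr=0.1)))  # shrinks toward the minimum at 0
print(abs(descend(lr=1.1)))  # each step overshoots; |x| grows every iteration
```

With lr = 0.1 the distance to the minimum shrinks by a factor of 0.8 per step; with lr = 1.1 it grows by a factor of 1.2 per step, which is exactly the "steps keep getting bigger" picture in the first graph.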


Also note that this is just a primitive (one-dimensional) sketch of what happens in deep learning. Instead of one parameter, you are optimizing millions.

I hope this helps.