How come too high a learning rate affects validation loss more?

In lesson 2, Jeremy explained some common problems, one of which was “too high a learning rate”, and its effect was a massively increased validation loss. I understand that when we increase the learning rate too much, gradient descent can’t converge to the minimum and instead diverges. What I don’t really get is: how come our training loss increases only a bit, but our validation loss increases massively? If the model has diverged, isn’t it supposed to perform equally poorly on the training set and the validation set? I’d be glad if anyone can help.
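
To make the divergence part concrete, here is a tiny toy sketch (nothing fast.ai-specific; the quadratic loss and function name are just illustrative) showing how plain gradient descent diverges once the learning rate is too large:

```python
def gradient_descent(lr, steps=20):
    """Minimise the toy loss f(w) = w**2 with plain gradient descent."""
    w = 1.0
    losses = []
    for _ in range(steps):
        grad = 2 * w       # derivative of w**2
        w = w - lr * grad  # gradient descent update
        losses.append(w ** 2)
    return losses

# A sensible learning rate shrinks the loss; an overly large one blows it up:
print(gradient_descent(lr=0.1)[:5])  # loss heads towards 0
print(gradient_descent(lr=1.1)[:5])  # loss grows every step (diverges)
```

With this toy loss the update is w ← (1 − 2·lr)·w, so as soon as lr > 1 the factor has magnitude greater than 1 and the iterates oscillate and blow up, which is the divergence I mean above.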
Thanks in advance :slight_smile: