'strange pattern' in error rate

Let me try to explain what I think happens.
When you train it for the first time, the weights are random, so the error rate will go down as long as the lr is not too high; you are still far from any global or local minimum.
But once you have trained the model, you have found some minimum (fit_one_cycle finishes with a low lr). If you increase the lr again (even if only momentarily), the error rate may go up, because you move away from that minimum.
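As a toy illustration (plain Python, not fastai code, and a 1-D quadratic instead of a real loss surface): with a small lr, gradient descent settles into the minimum, but restarting with a large lr (like the ramp-up at the start of a new one-cycle schedule) pushes the parameter away from it and the loss rises again.

```python
# Toy sketch: gradient descent on f(x) = x^2, whose gradient is 2x.
# This is NOT fastai code, just an illustration of the lr effect.

def step(x, lr):
    """One gradient-descent step on f(x) = x^2."""
    return x - lr * 2 * x

x = 5.0  # start far from the minimum at x = 0 (like random weights)

# Phase 1: small lr -> we converge toward the minimum.
for _ in range(50):
    x = step(x, lr=0.1)
loss_after_training = x ** 2  # essentially zero

# Phase 2: restart with a high lr -> the step overshoots the minimum
# and the loss goes back up.
x_high = step(x, lr=1.2)
loss_after_high_lr = x_high ** 2

print(loss_after_training, loss_after_high_lr)
```

Here `lr=1.2` is deliberately too big for this function, so one step multiplies the distance to the minimum by 1.4 and the loss increases, which is the same effect you see in the error rate.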
Maybe the excellent visualization created by joshfp can help (link).