How do I interpret this LR graph (wikitext103)?

sgugger · June 19, 2018, 1:14pm

Yeah, the first graphs aren’t really helpful since I didn’t put in an option to plot them with log scales (for such a range you need it).
For the following ones, remember that the LR Finder is there to give you the order of magnitude of your LR; it will never give you an exact best value. By nature, when you get near the point of divergence, training gets very shaky, and depending on your luck (or lack thereof) you will sometimes diverge sooner. It’s the same during real training if you use a high LR too close to the instability: sometimes it goes wide, sometimes it stays contained.

My advice would be to plot the graphs of the wider range on a log scale, spot the minimum and go one order of magnitude under it.
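
To make that concrete, here is a rough sketch of the rule of thumb in plain Python. It assumes you already have the learning rates and the matching losses recorded by the LR Finder as two lists; the names `suggest_lr` and `plot_lr_curve` are made up for illustration and are not part of the fastai library. The plotting helper assumes matplotlib is installed.

```python
import math


def suggest_lr(lrs, losses):
    """Return (lr_at_min_loss, lr_one_order_of_magnitude_below).

    Hypothetical helper: `lrs` and `losses` are assumed to be equal-length
    sequences recorded during an LR Finder run.
    """
    if len(lrs) != len(losses) or len(lrs) == 0:
        raise ValueError("lrs and losses must be non-empty and the same length")
    # Drop inf/nan losses, which show up once training has diverged.
    finite = [(lr, loss) for lr, loss in zip(lrs, losses) if math.isfinite(loss)]
    if not finite:
        raise ValueError("no finite losses to inspect")
    # Spot the minimum of the curve...
    lr_at_min, _ = min(finite, key=lambda pair: pair[1])
    # ...and go one order of magnitude under it.
    return lr_at_min, lr_at_min / 10


def plot_lr_curve(lrs, losses):
    """Plot loss against learning rate with a log-scaled x axis."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig, ax = plt.subplots()
    ax.plot(lrs, losses)
    ax.set_xscale("log")
    ax.set_xlabel("learning rate (log scale)")
    ax.set_ylabel("loss")
    return ax
```

For example, `suggest_lr([1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0], [7.0, 6.8, 6.1, 5.2, 6.5, float("inf")])` picks `1e-2` as the minimum and suggests roughly `1e-3`. Since the curve is noisy near divergence, treat the result as an order of magnitude, not an exact value.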