I see that in Jeremy’s lecture notes he has a (somewhat) similar graph and selects the area in which it’s flat, but my graph doesn’t drop like his does at 1e-06.
Yes, an even lower learning rate might help. You can run learn.lr_find() a couple of times to get slightly different graphs (it uses random batches).
There is also a feature that gives you a suggested value for the learning rate: just pass suggestion=True to learn.recorder.plot(), as in learn.recorder.plot(suggestion=True).
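
For intuition, here is a rough sketch of what a suggestion like that can boil down to: take the (learning rate, loss) pairs the finder recorded and pick the learning rate where the loss is falling fastest. This is not fastai's actual code. suggest_lr is a hypothetical helper and the numbers are made up for illustration; I'm only assuming you have the recorded learning rates and losses as two lists.

```python
def suggest_lr(lrs, losses):
    """Return the learning rate where the loss curve descends most steeply.

    Hypothetical helper, not part of fastai. The slope is a central
    difference of the loss against position in the sweep, which is a
    reasonable proxy because the finder spaces learning rates geometrically.
    """
    if len(lrs) != len(losses) or len(lrs) < 3:
        raise ValueError("need at least 3 matching (lr, loss) points")

    best_i, best_slope = None, float("inf")
    # Skip the two endpoints: a central difference needs both neighbours.
    for i in range(1, len(losses) - 1):
        slope = (losses[i + 1] - losses[i - 1]) / 2.0
        if slope < best_slope:
            best_i, best_slope = i, slope
    return lrs[best_i]


# Made-up sweep: flat at first, then dropping, then diverging.
lrs = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0]
losses = [2.30, 2.29, 2.20, 1.60, 1.20, 1.50, 4.00]

suggested = suggest_lr(lrs, losses)
print(suggested)
```

Because the finder uses random batches, the recorded losses (and so any suggestion derived from them) will shift a bit between runs, so treat the suggested value as a starting point and still look at the plot.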