I’m trying to figure out what to do with a learn.recorder.plot() that looks like this:
I see that in Jeremy’s lecture notes he has a (somewhat) similar graph and selects the area in which it’s flat, but my graph doesn’t drop like his does at 1e-06.
I’m wondering if I need to tell it to start with an even lower learning rate? Any suggestions/help would be appreciated. Thanks!
Yes, an even lower learning rate might help. You can run learn.lr_find() a couple of times to get slightly different graphs (it uses random batches).
There is a feature where you can get a suggested value for the learning rate: just pass suggestion=True to learn.recorder.plot().
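For anyone wanting to try both suggestions together, here is a minimal sketch. It assumes fastai v1, where lr_find accepts start_lr, end_lr and num_it, and where the recorder stores the suggested value in min_grad_lr after plotting. The helper name rerun_lr_finder and the default values are hypothetical, so adjust them to your setup.

```python
def rerun_lr_finder(learn, start_lr=1e-8, end_lr=1.0, num_it=200):
    """Re-run the LR range test from a lower starting rate and plot a suggestion.

    `learn` is assumed to be a fastai v1 Learner. The helper name and the
    default values are illustrative, not part of fastai.
    """
    # Sweep learning rates from start_lr up to end_lr over num_it mini-batches.
    learn.lr_find(start_lr=start_lr, end_lr=end_lr, num_it=num_it)
    # Plot loss against learning rate and mark the suggested point.
    learn.recorder.plot(suggestion=True)
    # Assumption: fastai v1 stores the suggestion in `min_grad_lr`.
    # Fall back to None if the attribute is missing in your version.
    return getattr(learn.recorder, "min_grad_lr", None)
```

Calling rerun_lr_finder(learn) a few times should give slightly different curves, since the batches are sampled randomly.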
I remember Jeremy H. saying to choose a value about 10x lower than the point where the loss starts rising.
So for your first graph I would choose 1e-02.
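As a rough illustration of that rule of thumb, here is a small sketch in plain Python. It approximates "where the loss starts rising" by the point of lowest recorded loss, which is a simplification. The function name pick_lr and the numbers are made up for the example and are not read off the plot in this thread.

```python
def pick_lr(lrs, losses, factor=10.0):
    """Return the learning rate `factor` times below the one with the lowest loss."""
    if not lrs or len(lrs) != len(losses):
        raise ValueError("lrs and losses must be non-empty and of equal length")
    # Index of the lowest recorded loss; the loss climbs after this point.
    best = min(range(len(losses)), key=losses.__getitem__)
    return lrs[best] / factor


# Made-up values for illustration only.
lrs = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0]
losses = [2.30, 2.28, 2.10, 1.60, 1.20, 4.50]
print(pick_lr(lrs, losses))
```

With a noisy curve it is probably worth smoothing the losses first, or just reading the value off the plot by eye.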