Lesson 2 gives a quick overview of interpreting lr_find, and I believe Lesson 3 goes into more depth. The short answer is that you look for different areas of the plot depending on whether the model has already been trained or not.
Part of it is the extreme range of learning rates you are using. You are having it traverse from 1e-30 to 1e-2 (28 orders of magnitude) in 100 batches, which is the default. That is a lot of learning rates to test and very little data to test each one on.
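To put numbers on that, here is a rough sketch that assumes the finder steps through learning rates geometrically (evenly spaced on a log scale) over its 100 batches. It is not fastai's own code; `geometric_sweep` and `batches_in_decade` are hypothetical helpers, and the 1e-7 to 10 range is just an illustrative narrower sweep for comparison.

```python
import math

def geometric_sweep(start_lr: float, end_lr: float, num_it: int) -> list[float]:
    """Learning rates evenly spaced on a log scale, from start_lr to end_lr inclusive.

    Assumption: this approximates the exponential schedule an LR finder uses.
    """
    ratio = end_lr / start_lr
    # i / (num_it - 1) runs from 0 to 1, so the first value is start_lr and the last is end_lr
    return [start_lr * ratio ** (i / (num_it - 1)) for i in range(num_it)]

def batches_in_decade(lrs: list[float], low: float) -> int:
    """Count how many tested learning rates fall in [low, 10 * low)."""
    return sum(1 for lr in lrs if low <= lr < 10 * low)

wide = geometric_sweep(1e-30, 1e-2, 100)  # the range from the question
narrow = geometric_sweep(1e-7, 10, 100)   # a narrower, illustrative range

print(round(math.log10(1e-2 / 1e-30)))    # orders of magnitude covered: 28
print(batches_in_decade(wide, 1e-4))      # batches spent between 1e-4 and 1e-3: 4
print(batches_in_decade(narrow, 1e-4))    # same window, narrower sweep: 12
```

So with the wide range, each order of magnitude only gets about 3 to 4 batches, and each of those losses is measured on a single batch of data, which makes the curve noisy and hard to read.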
The other part is that lr_find isn't deterministic, because there will be differences in the augmentations applied, etc. See this post for a longer answer.
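A toy simulation can show why the curve shifts between runs. This is not fastai code: `toy_loss` is a made-up loss curve, `toy_lr_find` is a hypothetical stand-in for the finder, and the random scaling factor stands in for whatever random augmentation each batch happens to get.

```python
import math
import random
from typing import Optional

def toy_loss(lr: float) -> float:
    """Hypothetical smooth loss curve with its minimum at lr = 1e-3 (made up for illustration)."""
    return (math.log10(lr) + 3) ** 2 + 1.0

def toy_lr_find(lrs: list[float], seed: Optional[int] = None) -> list[float]:
    """Simulated LR-finder run: each batch's loss is scaled by random 'augmentation' noise."""
    rng = random.Random(seed)  # seed=None draws from system entropy, like an unseeded run
    losses = []
    for lr in lrs:
        aug_factor = rng.uniform(0.9, 1.1)  # stand-in for this batch's random augmentation
        losses.append(toy_loss(lr) * aug_factor)
    return losses

lrs = [10 ** (-7 + 8 * i / 99) for i in range(100)]  # illustrative log-spaced sweep
run_a = toy_lr_find(lrs, seed=0)
run_b = toy_lr_find(lrs, seed=1)
run_c = toy_lr_find(lrs, seed=0)

print(run_a == run_b)  # different seeds give different curves: False
print(run_a == run_c)  # the same seed repeats the curve exactly: True
```

In a real training setup the randomness comes from the data pipeline's own random number generators, so getting a repeatable lr_find plot means seeding those as well.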