I’m trying to wrap my head around the learning rate finder and its behaviour. Broadly speaking, I’ve seen three different behaviours:
- The loss starts low and gradually increases, until it reaches a breaking point and “explodes”.
- The loss starts high, decreases, and reaches its minimum in a fairly flat region before rising again.
- The same as 2), but instead of a flat area around the minimum, there’s just a sharp spike.
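To make the shapes concrete: the basic mechanism of an LR finder (grow the learning rate exponentially while taking gradient steps and record the loss at each step) can be reproduced on a one-parameter toy problem. This is just an illustrative sketch, not fastai's implementation; all names here are made up. On this toy quadratic it produces shape 2: the loss starts high, bottoms out, then explodes once the LR crosses the stability threshold.

```python
def lr_finder_sweep(start_lr=1e-4, end_lr=10.0, steps=100):
    """Toy LR-finder sweep: gradient descent on loss(w) = (w - 3)^2,
    starting from w = 0, with the LR grown exponentially each step.
    Returns the recorded (lr, loss) pairs."""
    w = 0.0
    mult = (end_lr / start_lr) ** (1.0 / (steps - 1))
    lr = start_lr
    history = []
    for _ in range(steps):
        loss = (w - 3.0) ** 2
        history.append((lr, loss))   # record loss at the current LR
        grad = 2.0 * (w - 3.0)
        w -= lr * grad               # plain SGD step
        lr *= mult                   # exponential LR schedule
    return history

history = lr_finder_sweep()
```

Plotting `history` on a log-x axis gives the familiar curve: high loss at tiny LRs, a flat minimum, then divergence.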
What are the factors that contribute to these different behaviours? I would imagine that cases 2 and 3 would happen after unfreezing the full network, but I’ve also seen them without unfreezing.
Finally, what’s the recommended learning rate in case 3? So far I’ve been trying slice(None, x, None), where x is the LR at the minimum point, but it isn’t giving good results.
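For comparison, the heuristic I’ve seen suggested most often is to pick a value roughly an order of magnitude *below* the loss minimum, rather than the minimum itself (at the minimum the loss is already about to rise). A hypothetical helper for that, assuming the sweep is available as (lr, loss) pairs — the names and the `margin=10` factor are my own, not from any library:

```python
def suggest_lr(history, margin=10.0):
    """Given (lr, loss) pairs from an LR sweep, suggest an LR one
    order of magnitude below the LR where the loss is lowest."""
    lrs, losses = zip(*history)
    min_idx = min(range(len(losses)), key=losses.__getitem__)
    return lrs[min_idx] / margin

# Illustrative sweep whose loss bottoms out at lr = 0.1:
history = [(1e-4, 9.0), (1e-3, 5.0), (1e-2, 2.0), (1e-1, 0.5), (1.0, 50.0)]
lr = suggest_lr(history)  # ≈ 0.01
```

That would mean passing slice(x / 10) (or picking it off the plot by eye) instead of slice at the exact minimum.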