'strange pattern' in error rate

NathanHub · November 9, 2018, 10:15am

Look at the behaviour of the learning rate when you use the one_cycle policy:

When the cycle ends, the learning rate is lower than its initial value and close to 0. This means the steps you take when updating the weights are very small, which lets your model really pinpoint the minimum of your loss.
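To make the shape concrete, here is a minimal Python sketch of a one-cycle schedule. It is not fastai's implementation: the cosine warm-up and anneal, `max_lr=1e-3`, `div_factor=25`, `final_div=1e4` and `pct_start=0.3` are all assumptions chosen for illustration. The point is only that the last value ends up orders of magnitude below the first.

```python
import math

def one_cycle_lr(step, total_steps, max_lr=1e-3, div_factor=25.0,
                 final_div=1e4, pct_start=0.3):
    """LR at `step` of a one-cycle schedule (hypothetical defaults)."""
    start_lr = max_lr / div_factor       # LR at the start of the cycle
    end_lr = start_lr / final_div        # LR at the end: far below start_lr
    warmup = max(1, round(total_steps * pct_start))  # steps spent going up
    if step < warmup:
        # Cosine ramp from start_lr up to max_lr.
        frac = step / warmup
        return start_lr + (max_lr - start_lr) * (1 - math.cos(math.pi * frac)) / 2
    # Cosine anneal from max_lr down to end_lr over the remaining steps.
    frac = (step - warmup) / max(1, total_steps - 1 - warmup)
    return end_lr + (max_lr - end_lr) * (1 + math.cos(math.pi * frac)) / 2

total = 100  # e.g. 10 epochs of 10 batches each (hypothetical)
lrs = [one_cycle_lr(s, total) for s in range(total)]
print(f"start {lrs[0]:.2e}, peak {max(lrs):.2e}, end {lrs[-1]:.2e}")
# start 4.00e-05, peak 1.00e-03, end 4.00e-09
```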

When you retrain your model, the learning rate goes back to its initial value, which is much higher than where your previous training ended, so your model can climb out of the minimum it found earlier.

As you have noticed, training 3 times for 10 epochs leads to a better minimum (lower error rate), because raising the learning rate again at the start of each run lets your model “explore” more minima.
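Repeating a one-cycle sketch shows what three separate 10-epoch runs do to the learning rate: it jumps from near 0 back to its starting value at the start of each run, while a single 30-epoch cycle never jumps. This resembles the warm-restart idea from SGDR. The schedule function, its defaults and `steps_per_epoch = 10` are hypothetical, and the code only illustrates the learning-rate curve, not the effect on the error rate.

```python
import math

def one_cycle_lr(step, total, max_lr=1e-3, div=25.0, final_div=1e4, pct=0.3):
    # Hypothetical one-cycle shape: cosine warm-up, then cosine anneal.
    start, end = max_lr / div, max_lr / div / final_div
    warm = max(1, round(total * pct))
    if step < warm:
        return start + (max_lr - start) * (1 - math.cos(math.pi * step / warm)) / 2
    frac = (step - warm) / max(1, total - 1 - warm)
    return end + (max_lr - end) * (1 + math.cos(math.pi * frac)) / 2

def count_restarts(lrs, factor=10.0):
    # A "restart" here means the LR jumping up by more than `factor` in one step.
    return sum(1 for a, b in zip(lrs, lrs[1:]) if b > factor * a)

steps_per_epoch = 10  # hypothetical
run = 10 * steps_per_epoch
# Three separate 10-epoch runs: the schedule starts over every time.
three_runs = [one_cycle_lr(s, run) for _ in range(3) for s in range(run)]
# One 30-epoch run: a single cycle over the same number of steps.
one_run = [one_cycle_lr(s, 3 * run) for s in range(3 * run)]

for k in (1, 2):
    before, after = three_runs[k * run - 1], three_runs[k * run]
    print(f"restart {k}: lr {before:.1e} -> {after:.1e}")
# restart 1: lr 4.0e-09 -> 4.0e-05
# restart 2: lr 4.0e-09 -> 4.0e-05
print(count_restarts(three_runs), count_restarts(one_run))  # 2 0
```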