For resnet18 with pretrained=False I get a ‘strange pattern’ in the error rate when I run several 10-epoch runs back to back. I set 10 epochs, and when that finishes I start another 10 epochs, and then 10 more a third time. I was expecting the error either to keep decreasing or to start increasing (over-fitting). But I didn’t expect the error to start higher than where the last run ended, increase at first, then decrease, and then do the same in the 3rd run. I’m not sure what I am doing wrong here?
I think it is because of the cosine annealing of the learning rate. https://www.google.fi/amp/s/www.jeremyjordan.me/nn-learning-rate/amp/
I’m not familiar with that so I can’t say for sure but that is what I expect.
I would say that if you want to fine-tune your model several times, you might want to run lr_find after each training cycle, as the lr you’ll need will be different (most likely lower). If you fine-tune a 2nd or 3rd time with the same lr as the 1st time, it’s not surprising to see the error rate go up, due to an lr that is too high.
thanks @Lankinen & @oguiza
I did some more experiments and ran just one fit_one_cycle with 50 epochs. It hasn’t finished yet, but up to the 39th epoch it doesn’t show this ‘strange pattern’…
I would expect that if the learning rate were too high at any epoch, the error rate would go up there too, as it does in the 10-epoch runs. I am not sure why I get different error rate behavior when I run three 10-epoch runs versus one 50-epoch run?
Let me try to explain what I think happens.
When you train it for the first time, the weights are random, and the error rate will go down as long as the lr is not too high. You are far from any global or local minimum.
But once you have trained the model once, you have found some minimum (where your fit_one_cycle finished with a low lr). If you increase the lr again (even if it’s only momentarily), the error rate may go up, because you move further away from that minimum.
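To make that intuition concrete, here is a toy sketch. It is not the resnet18 setup from this thread, just noisy gradient descent on a one-dimensional quadratic loss, and the function name `noisy_sgd`, the noise level and the lr values are all made up for illustration. It settles near the minimum with a small lr, then compares continuing with the small lr against bumping the lr back up.

```python
import random

def noisy_sgd(w, lr, steps, rng, noise_std=1.0):
    """Run SGD on loss(w) = 0.5 * w**2 with noisy gradients.

    Returns the final weight and the mean loss over the run.
    """
    total_loss = 0.0
    for _ in range(steps):
        # The true gradient is w; the Gaussian noise stands in for mini-batch noise.
        grad = w + rng.gauss(0.0, noise_std)
        w -= lr * grad
        total_loss += 0.5 * w * w
    return w, total_loss / steps

rng = random.Random(0)  # fixed seed so the run is repeatable

# First "training run": start far from the minimum and settle with a small lr.
w_settled, _ = noisy_sgd(w=5.0, lr=0.01, steps=2000, rng=rng)

# Continue from the settled weight, once with the small lr and once with a larger lr.
_, loss_small_lr = noisy_sgd(w_settled, lr=0.01, steps=2000, rng=rng)
_, loss_large_lr = noisy_sgd(w_settled, lr=0.5, steps=2000, rng=rng)

print(f"mean loss with lr=0.01: {loss_small_lr:.4f}")
print(f"mean loss with lr=0.5:  {loss_large_lr:.4f}")
```

In this toy setting the larger lr keeps the weight bouncing around the minimum at a greater distance, so the average loss is higher even though training started right next to the minimum. A real network has a far more complicated loss surface, so take this only as a picture of the mechanism.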
I don’t know if the excellent visualization created by joshfp can help (link)
It’s a great explanation, thank you, and the visualization is perfect.
I don’t specifically change the learning rate when re-running fit_one_cycle (I keep it at the default value). Or does it change in the background by some fastai magic?
thanks! i see that fit_one_cycle changes learning rate and momentum. it makes sense now.
Look at the behaviour of the learning rate when you use the one_cycle policy:
When the cycle ends, the learning rate is lower than its initial value and goes towards 0. That means the steps you take when updating the weights are very small, which lets your model really pinpoint the minimum of your loss.
When you retrain your model, the learning rate goes back to its initial value, which is much higher than where your previous training ended, so your model comes out of the minimum you found earlier.
As you have noticed, training 3x for 10 epochs leads to a better minimum (lower error rate) because you allowed your model to “explore” more minima by re-increasing the learning rate several times.
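The restart described above can be sketched in plain Python. This is only an approximation of a cosine-annealed one-cycle schedule, not fastai’s actual code: the helper names `one_cycle_lr` and `schedule` are hypothetical, the values `div=25`, `div_final=1e5` and `pct_start=0.25` are assumptions loosely modelled on fastai’s defaults (they differ between versions), and `steps_per_epoch=100` is made up.

```python
import math

def one_cycle_lr(step, total_steps, lr_max=1e-3, div=25.0, div_final=1e5, pct_start=0.25):
    """Learning rate at `step` of an illustrative cosine one-cycle schedule."""
    warm_steps = pct_start * total_steps
    if step < warm_steps:
        # Warm-up phase: rise from lr_max / div up to lr_max.
        pos = step / warm_steps
        start, end = lr_max / div, lr_max
    else:
        # Annealing phase: fall from lr_max down to lr_max / div_final.
        pos = (step - warm_steps) / (total_steps - warm_steps)
        start, end = lr_max, lr_max / div_final
    # Cosine interpolation: equals `start` at pos=0 and `end` at pos=1.
    return end + (start - end) * (1 + math.cos(math.pi * pos)) / 2

def schedule(n_epochs, steps_per_epoch=100):
    """Per-step learning rates for one call that trains for `n_epochs`."""
    total = n_epochs * steps_per_epoch
    return [one_cycle_lr(s, total) for s in range(total)]

# Three separate 10-epoch calls: each call starts a fresh cycle.
three_runs = schedule(10) + schedule(10) + schedule(10)
# One 50-epoch call: a single cycle stretched over all epochs.
one_run = schedule(50)

print("last step of run 1: ", three_runs[999])
print("first step of run 2:", three_runs[1000])
print("peaks in three runs:", sum(1 for lr in three_runs if abs(lr - 1e-3) < 1e-12))
print("peaks in one run:   ", sum(1 for lr in one_run if abs(lr - 1e-3) < 1e-12))
```

Under these assumptions, the lr at the start of the second call is orders of magnitude higher than at the end of the first call, and it then climbs all the way back to lr_max. The single 50-epoch call has only one such climb, which would be consistent with the error rate not showing the repeated bump.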