I have had a lot of success with `.fit_one_cycle()` and find it helps my models train quickly. One issue I see is that the number of epochs I choose changes my learning rate schedule. For example, if I use 2 epochs, the learning rate varies over the course of both epochs; if I use 1 epoch, it varies over that single epoch. See the image below for an example:

[image: one-cycle learning rate schedules for a 2-epoch run vs. a 1-epoch run]
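To make the difference concrete, here is a minimal sketch of the schedule shape. It approximates one-cycle as a cosine warmup followed by a cosine anneal; the defaults (`pct_start=0.25`, `div=25`, `div_final=1e5`) mirror what I understand fastai v2 to use, but treat the whole function as an assumption rather than the library's actual implementation:

```python
import math

def one_cycle_lr(step, total_steps, lr_max=1e-3,
                 div=25.0, div_final=1e5, pct_start=0.25):
    """Approximate one-cycle LR at a given step.

    Assumed shape: cosine warmup from lr_max/div to lr_max over the
    first pct_start of training, then cosine anneal down to
    lr_max/div_final.
    """
    pos = step / total_steps
    if pos < pct_start:
        p = pos / pct_start                      # progress through warmup
        start, end = lr_max / div, lr_max
    else:
        p = (pos - pct_start) / (1 - pct_start)  # progress through anneal
        start, end = lr_max, lr_max / div_final
    # cosine interpolation between start and end
    return end + (start - end) * (1 + math.cos(math.pi * p)) / 2

steps_per_epoch = 100

# One cycle spanning 2 epochs: a single warmup/anneal over 200 steps.
two_epoch_cycle = [one_cycle_lr(s, 2 * steps_per_epoch)
                   for s in range(2 * steps_per_epoch)]

# Two separate 1-epoch calls: the cycle restarts at the epoch boundary.
two_separate = [one_cycle_lr(s % steps_per_epoch, steps_per_epoch)
                for s in range(2 * steps_per_epoch)]

print(max(two_epoch_cycle), two_separate[steps_per_epoch])  # restart visible
```

Calling it once over 2 epochs gives a single smooth cycle; calling it twice resets the warmup at the epoch boundary, which is exactly what the image shows.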
I’m wondering if anyone knows whether it is better to use `.fit_one_cycle()` over the course of, say, 20 epochs, or if it is better to call it 20 times for a single epoch each.
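In code, the two options look like this (a sketch assuming fastai v2; the MNIST_SAMPLE setup is only there to make it self-contained, and you would pick one option, not run both):

```python
from fastai.vision.all import *

# Toy setup so the sketch is runnable
path = untar_data(URLs.MNIST_SAMPLE)
dls = ImageDataLoaders.from_folder(path)
learn = vision_learner(dls, resnet18, metrics=accuracy)

# Option A: one long cycle, a single warmup/anneal spanning all 20 epochs
learn.fit_one_cycle(20)

# Option B: 20 short cycles, the schedule restarts every epoch
# (closer to warm restarts than to one-cycle training)
for _ in range(20):
    learn.fit_one_cycle(1)
```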
The reason I ask is that it is often hard to know in advance how many epochs to train for. One strategy I see a lot of people use on Kaggle (typically without cyclical learning rates) is learning rate decay with periodic resets, saving the best network along the way, which is essentially a form of early stopping. This appeals to me because I rarely know how many epochs to use when training a variety of neural network architectures.
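For what it's worth, fastai has direct support for that save-the-best strategy via its tracker callbacks. A minimal sketch, assuming fastai v2's `SaveModelCallback` and `EarlyStoppingCallback` and the `learn` object from above (the epoch count and `patience` here are arbitrary choices, not recommendations):

```python
from fastai.callback.tracker import SaveModelCallback, EarlyStoppingCallback

# Over-provision the epochs, checkpoint the best weights, and stop
# once the validation loss stops improving.
learn.fit_one_cycle(
    40,
    cbs=[
        SaveModelCallback(monitor='valid_loss', fname='best'),
        EarlyStoppingCallback(monitor='valid_loss', patience=5),
    ],
)
learn.load('best')  # restore the best checkpoint
```

One caveat: stopping a one-cycle run early cuts off the annealing phase, so the learning rate never decays to its intended final value.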