Let’s say I call fit_one_cycle with 100 epochs — do the learning rate and momentum cycles take into account the number of epochs I specify here, or does the cycle span only ONE epoch?
I am wondering about the best way to use fit_one_cycle in combination with EarlyStoppingCallback.
If one cycle == one epoch then this is fine. But let’s say I set up my learner with EarlyStoppingCallback like so:
learn.callback_fns +=[partial(EarlyStoppingCallback, monitor='accuracy', min_delta=0.005, patience=3)]
Then I want to fit my learner overnight and have it stop when it no longer improves. So I call fit_one_cycle with 100 epochs, hoping it will trigger EarlyStoppingCallback before reaching the 100th epoch. But if the cycle takes the number of epochs into account, I guess this is not optimal.
You can use pct_start=0.1, which will increase the learning rate for the first 10% of the iterations and spend the remaining 90% decreasing it with annealing. I’m not sure if it’s the correct way to do it, but I think it helps you avoid stopping while the learning rate is still increasing.
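To see why early stopping mid-cycle is awkward, here is a rough sketch of the shape of the one-cycle learning-rate schedule: it spans the whole fit_one_cycle call, not one epoch. This is an approximation written from scratch (fastai v1 actually anneals toward a tiny non-zero final LR; here it is simplified to 0), with pct_start, max_lr and div_factor as illustrative values:

```python
import math

def one_cycle_lr(pct, lr_max=1e-3, div_factor=25.0, pct_start=0.1):
    """Approximate one-cycle LR at training progress pct in [0, 1].

    The schedule covers ALL iterations of the fit_one_cycle call:
    cosine warm-up for the first pct_start fraction, then cosine
    annealing down for the remaining (1 - pct_start).
    """
    lr_start = lr_max / div_factor
    if pct < pct_start:  # warm-up phase
        p = pct / pct_start
        return lr_start + (lr_max - lr_start) * (1 - math.cos(math.pi * p)) / 2
    p = (pct - pct_start) / (1 - pct_start)  # annealing phase
    return lr_max * (1 + math.cos(math.pi * p)) / 2
```

So if EarlyStoppingCallback fires at, say, 40% of a 100-epoch call, training ends while the LR is still partway through its annealing, which is why stopping during the warm-up (the first pct_start of training) is the part you most want to avoid.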
Can someone explain why EarlyStoppingCallback doesn’t work as expected?
Here is my code:

epoch = 25
learn = cnn_learner(data, arch, pretrained=True,
                    metrics=[accuracy, error_rate, top_k_accuracy],
                    callback_fns=[partial(CSVLogger, filename=aName),
                                  partial(SaveModelCallback, monitor='val_loss', mode='auto', name=mName),
                                  partial(EarlyStoppingCallback, monitor='val_loss', min_delta=0.001, patience=5)])
learn.fit_one_cycle(epoch, max_lr=maxLR, moms=(0.95, 0.85), div_factor=25.0)
and this is what I get in the output log file:
valid_loss is clearly decreasing by more than min_delta=0.001 each epoch, yet it still stopped my training.
I think when your tracked quantity (in your case val_loss) doesn’t improve by more than min_delta, or gets worse than it was the epoch before, patience is decreased by 1. When patience reaches 0, training stops.
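The patience mechanism described above can be sketched in a few lines. This is a toy re-implementation of the idea, not fastai’s actual EarlyStoppingCallback (fastai counts a `wait` variable up rather than counting patience down, which is equivalent):

```python
def run_early_stopping(val_losses, min_delta=0.001, patience=5):
    """Return the epoch index at which training stops, or None.

    An epoch only resets the counter if it improves on the best
    loss so far by MORE than min_delta; otherwise the counter grows,
    and training stops once it exceeds patience.
    """
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:   # genuine improvement
            best = loss
            wait = 0
        else:                         # no (sufficient) improvement
            wait += 1
            if wait > patience:
                return epoch
    return None
```

Note that under this rule a run of epochs that each improve by slightly less than min_delta still burns through patience, which is one way a steadily (but slowly) improving val_loss can trigger an unexpected stop.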
Thanks for your reply @etremblay,
It seems that if, in the fastai source code, we knock out this line:

if self.operator == np.less: self.min_delta *= -1

then training stops only after 5 (patience) consecutive epochs of worsening validation loss.
Sorry, I didn’t really look at the code; I just explained how it works from what I observed while using it :).