Hi, I’d like to share an example of how to resume an interrupted 1cycle policy training run. In my (limited) experience, one long 1cycle run works better than several shorter ones. However, when things go wrong, we don’t want to rerun the whole thing from the beginning; that’s a waste of time…
My solution is to use a callback to save your model (e.g., SaveModelCallback); if training gets interrupted, you can load the model from the last saved epoch and continue from there. You just need to adjust the learning rate schedule. I have some examples here.
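To make the idea concrete, here is a minimal, library-free sketch of the checkpoint-and-resume pattern that SaveModelCallback gives you. Everything here (the toy `train` loop, the checkpoint dict layout) is illustrative, not fastai's actual implementation; in fastai you would use SaveModelCallback plus `learn.load(...)` instead.

```python
import copy

def train(model_state, epochs, start_epoch=0, checkpoint=None):
    """Toy training loop: each 'epoch' appends its index to model_state,
    and a checkpoint is saved after every epoch (the same idea as
    SaveModelCallback saving the model each epoch)."""
    if checkpoint is not None:
        # Resume: restore the saved state and continue after the saved epoch.
        model_state = copy.deepcopy(checkpoint["state"])
        start_epoch = checkpoint["epoch"] + 1
    saved = None
    for epoch in range(start_epoch, epochs):
        model_state.append(epoch)  # stand-in for a real training step
        saved = {"epoch": epoch, "state": copy.deepcopy(model_state)}
    return model_state, saved

# A full 5-epoch run vs. a run "interrupted" after 3 epochs and resumed:
full, _ = train([], epochs=5)
partial, ckpt = train([], epochs=3)
resumed, _ = train([], epochs=5, checkpoint=ckpt)
```

Because the resumed run restarts from the saved state and epoch counter, it ends up with exactly the same result as the uninterrupted run.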
With this technique, you can also split the 1cycle policy into smaller parts and run each of them separately. You may want to do this if there is a limit on how many hours you can run at a time. People with powerful machines may not care about this at all, but the good thing about fastai is that everyone can use it to do interesting things.
If this would be useful to more users, I can submit a PR to add the feature to the OneCycleScheduler (my solution inherits from and modifies that class). It only needs two optional parameters (start epoch and total epochs); the API stays the same, so people who don’t use this feature won’t be affected.
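Here is a sketch of why those two parameters are enough. The key is to compute the learning rate against the *full* cycle length (total epochs) while only iterating over the steps from the start epoch onward; the resumed schedule then reproduces the tail of the original one exactly. The cosine-annealed shape below follows the general 1cycle form fastai uses, but the function names, constants, and defaults are my own illustration, not fastai's API.

```python
import math

def one_cycle_lr(step, total_steps, lr_max, pct_start=0.3, div=25.0, final_div=1e4):
    """Learning rate at a given global step of a cosine-annealed 1cycle
    schedule: warm up from lr_max/div to lr_max over the first pct_start
    of steps, then anneal down to lr_max/final_div."""
    up = int(total_steps * pct_start)
    if step < up:
        pct, lo, hi = step / max(up, 1), lr_max / div, lr_max
    else:
        pct, lo, hi = (step - up) / max(total_steps - up, 1), lr_max, lr_max / final_div
    # Cosine interpolation from lo (pct=0) to hi (pct=1).
    return lo + (hi - lo) * (1 - math.cos(math.pi * pct)) / 2

def resumed_schedule(start_epoch, total_epochs, steps_per_epoch, lr_max):
    """LRs for the remaining epochs, indexed against the full cycle, so a
    resumed run picks up exactly where the interrupted schedule left off."""
    total = total_epochs * steps_per_epoch
    first = start_epoch * steps_per_epoch
    return [one_cycle_lr(s, total, lr_max) for s in range(first, total)]

# A full 10-epoch schedule (10 steps/epoch) vs. resuming at epoch 4:
full = [one_cycle_lr(s, 100, 0.01) for s in range(100)]
tail = resumed_schedule(4, 10, 10, 0.01)
```

`tail` is identical to `full[40:]`, which is exactly the property the start-epoch/total-epochs parameters give you: training the remaining epochs applies the same learning rates the uninterrupted run would have used.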
Hope it helps!