After a 1cycle policy training run finishes, what is the best practice if we want to do more training? I can think of three options (sketched in code below the list):
(1) Run the 1cycle policy again with some learning rate (or the same lr_max?). Repeated like this, it would be similar to SGDR with many cosine-annealing cycles (at the same lr_max) from the lecture.
(2) Continue training at a very small, constant learning rate.
(3) Rerun the 1cycle policy from the beginning with more epochs (so it really is "1"cycle). Of course, this probably means ditching the previous 1cycle results.
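To make the options concrete, here is a minimal sketch of what I mean, assuming fastai v2's API; the dataset, epoch counts, and learning rates are just placeholders:

```python
from fastai.vision.all import *

# Toy setup just so the snippet runs end to end (MNIST subset, resnet18);
# substitute your own DataLoaders and model.
path = untar_data(URLs.MNIST_SAMPLE)
dls = ImageDataLoaders.from_folder(path)
learn = vision_learner(dls, resnet18, metrics=accuracy)

# Initial 1cycle run.
learn.fit_one_cycle(5, lr_max=1e-3)

# Option 1: stack another 1cycle on top (repeating this with the same
# lr_max would be roughly SGDR with cosine-annealing restarts).
learn.fit_one_cycle(5, lr_max=1e-3)

# Option 2: keep training at a small, constant learning rate
# (learn.fit applies no schedule by default).
learn.fit(5, lr=1e-5)

# Option 3: throw away the weights and rerun a single, longer 1cycle
# from scratch.
learn = vision_learner(dls, resnet18, metrics=accuracy)
learn.fit_one_cycle(10, lr_max=1e-3)
```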
I have done some experiments but haven't gotten consistent results, so I'd like to get some suggestions. Any help would be much appreciated!