What to do when you are near converging but have a smaller cycle length?

So I was tracking my training loss and validation loss. I kept a learning rate of 1e-2 for all my layers (the whole network is unfrozen) and a cycle length of 10. The network reaches its best validation loss on the 9th epoch. If I do a restart at this point, the network is not able to get back to this minimum for some reason. What is the best way to resume training from this point? I have the model saved in that state.

My current approach: since the best result is on the 9th epoch, the learning rate at that point is presumably 0.05 * 1e-2, since 1 + cos(0.9 * pi) ≈ 1 - 0.95 = 0.05. Is this correct, or is there a more structured approach? The reason I don't want to retrain my network from scratch is that it takes a long time to train.
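For reference, the standard SGDR cosine-annealing formula (Loshchilov & Hutter) has a factor of 1/2 in front of the (1 + cos) term, so the value above may be off by a factor of two depending on which schedule the library actually uses. A minimal sketch, assuming the LR is annealed once per epoch from lr_max down to 0 over the cycle:

```python
import math

def cosine_annealed_lr(lr_max, t, cycle_len, lr_min=0.0):
    """SGDR cosine annealing:
    lr(t) = lr_min + 0.5 * (lr_max - lr_min) * (1 + cos(pi * t / cycle_len))
    """
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / cycle_len))

# LR at epoch 9 of a 10-epoch cycle with lr_max = 1e-2
print(cosine_annealed_lr(1e-2, 9, 10))  # ≈ 2.45e-4, i.e. ~0.025 * lr_max
```

If the per-epoch LRs during the run are logged (or can be read back from the scheduler), comparing the logged value at epoch 9 against this formula would settle which variant applies before resuming from the saved checkpoint at that rate.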

Any input appreciated.