I could not understand a paragraph in the super-convergence paper by L. N. Smith

From the original super-convergence paper: https://arxiv.org/abs/1708.07120

> Here we suggest a slight modification of cyclical learning rate policy for super-convergence; always use one cycle that is smaller than the total number of iterations/epochs and allow the learning rate to decrease several orders of magnitude less than the initial learning rate for the remaining iterations.

I could not fully understand the above paragraph. Does he mean that after the learning rate completes a cycle (initial lr to max lr, then back to initial lr), we should still continue training with a learning rate "several orders of magnitude less"?

Does it mean something like this:

  1. train lr = 0.001 to 0.01 for 10 epochs
  2. train lr = 0.01 to 0.001 for 10 epochs
  3. train lr = 0.00001 to 0.0001 for 0.5 epochs
  4. train lr = 0.0001 to 0.00001 for 0.5 epochs
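My current reading of the paper, as a minimal sketch (the linear ramps, the specific lr values, the 90/10 split between cycle and final phase, and the `one_cycle_lr` function itself are all my assumptions, not anything from the paper or a real library):

```python
# Sketch of the 1cycle idea as I understand it: one up-down cycle over
# most of training, then a final phase where the lr drops several
# orders of magnitude BELOW the initial lr. All shapes/values assumed.

def one_cycle_lr(step, total_steps, lr_min=0.001, lr_max=0.01,
                 final_div=1000, cycle_frac=0.9):
    """Return the learning rate for training step `step`."""
    cycle_steps = int(total_steps * cycle_frac)
    half = cycle_steps // 2
    if step < half:                       # ramp up: lr_min -> lr_max
        return lr_min + (lr_max - lr_min) * step / half
    if step < cycle_steps:                # ramp down: lr_max -> lr_min
        return lr_max - (lr_max - lr_min) * (step - half) / half
    # remaining steps: anneal from lr_min toward lr_min / final_div
    rest = total_steps - cycle_steps
    frac = (step - cycle_steps) / rest
    return lr_min - (lr_min - lr_min / final_div) * frac
```

So with `total_steps=1000`, the lr rises to 0.01 by step 450, falls back to 0.001 by step 900, and then the last 100 steps anneal it down toward 0.000001, i.e. "several orders of magnitude less than the initial learning rate". Is that the right picture?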

See this discussion:

Leslie Smith:

Thanks a lot for the link; otherwise it would have been very difficult to find. And it's a reply from L. N. Smith himself!
