Are these equivalent?
learn.fit_one_cycle(cyc_len=3)
vs
for _ in range(3):
    learn.fit_one_cycle(cyc_len=1)
No, even though both will iterate over the whole training set 3 times, the learning rate schedule will be totally different:
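To see the difference concretely, here is a small sketch of a one-cycle-style schedule in pure Python (the 30% cosine warmup and the `div=25`, `final_div=1e5` defaults are assumptions about fastai's internals, not its exact code). At the same global iteration, one long cycle can be near its peak learning rate while three short cycles are still warming up again:

```python
import math

def one_cycle_lr(i, n_iter, lr_max=1e-3, pct=0.3, div=25.0, final_div=1e5):
    """LR at iteration i of an n_iter-long one-cycle cosine schedule (sketch)."""
    lr_min, lr_fin = lr_max / div, lr_max / final_div
    warm = pct * n_iter
    if i < warm:  # cosine warmup: lr_min -> lr_max
        return lr_min + (lr_max - lr_min) * (1 - math.cos(math.pi * i / warm)) / 2
    u = (i - warm) / (n_iter - warm)  # cosine anneal: lr_max -> lr_fin
    return lr_fin + (lr_max - lr_fin) * (1 + math.cos(math.pi * u)) / 2

n = 100  # iterations per epoch (hypothetical)
# Compare at global iteration 110: the long 3-epoch cycle has just passed its
# peak, while the second of three 1-epoch cycles is still early in its warmup.
long_cycle = one_cycle_lr(110, 3 * n)
short_cycles = one_cycle_lr(110 % n, n)
print(long_cycle, short_cycles)  # the two schedules clearly disagree here
```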
@florobax Thanks. Which one do you expect to have the larger 'cumulative' learning rate, i.e. area under a time-vs-learning-rate curve? I've gotten better results with the for loop (multiple one-cycles) thus far.
Your question made me curious, so I did the calculation. Unless I made a mistake (which is possible), the area under the time-vs-lr curve (with fastai's schedule) for one cycle is N_{iter}\left(0.3\,lr_{min}+0.7\,lr_{max}+\frac{0.3(lr_{max}-lr_{min})+0.7(lr_{fin}-lr_{max})}{2}\right), i.e. each segment contributes its length times the average of its endpoints, which simplifies to N_{iter}(0.15\,lr_{min}+0.5\,lr_{max}+0.35\,lr_{fin}). This is a complicated way to say that the area is proportional to the number of iterations, so in terms of area under the curve it is exactly the same to run multiple cycles or to increase the number of iterations of a single cycle. I'll run a small experiment to corroborate that.
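As a sanity check on that formula, one can integrate a one-cycle-style schedule numerically and compare against the closed form. This is a sketch, not fastai's actual code: it assumes a cosine ramp from lr_min up to lr_max over the first 30% of iterations, then a cosine anneal down to lr_fin. The exact shape barely matters here, because the average of a symmetric cosine segment equals the average of its endpoints, just like a straight line:

```python
import math

def lr_at(t, lr_max, pct=0.3, div=25.0, final_div=1e5):
    """LR at normalized time t in [0, 1] of a one-cycle cosine schedule (sketch)."""
    lr_min, lr_fin = lr_max / div, lr_max / final_div
    if t < pct:  # cosine warmup: lr_min -> lr_max
        return lr_min + (lr_max - lr_min) * (1 - math.cos(math.pi * t / pct)) / 2
    u = (t - pct) / (1 - pct)  # cosine anneal: lr_max -> lr_fin
    return lr_fin + (lr_max - lr_fin) * (1 + math.cos(math.pi * u)) / 2

def area(n_iter, lr_max, steps=10_000):
    """Midpoint Riemann sum of the area under the time-vs-lr curve."""
    s = sum(lr_at((k + 0.5) / steps, lr_max) for k in range(steps))
    return n_iter * s / steps

def area_closed_form(n_iter, lr_max, pct=0.3, div=25.0, final_div=1e5):
    lr_min, lr_fin = lr_max / div, lr_max / final_div
    return n_iter * (pct * lr_min + (1 - pct) * lr_max
                     + (pct * (lr_max - lr_min) + (1 - pct) * (lr_fin - lr_max)) / 2)

num = area(300, 1e-3)
closed = area_closed_form(300, 1e-3)
# num and closed agree, and area(3N) == 3 * area(N): the area scales with n_iter,
# so one 3-epoch cycle and three 1-epoch cycles cover the same total area.
```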
However, keep in mind that one-cycle was designed to be run once over multiple epochs (so without the for loop). If it works better for you with the for loop, just use it then. Some learning rate schedules do use multiple cycles, as you can see in this article.
After a small experiment, it seems that I'm right: the difference between the two is around 8e-5, and seems to come from numerical instability more than anything else.