Are these equivalent?
learn.fit_one_cycle(cyc_len=3)
vs
for _ in range(3):
    learn.fit_one_cycle(cyc_len=1)
No, even though both will iterate over the whole training set 3 times, the learning rate schedule will be totally different:
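To see the difference concretely, here is a small sketch of a one-cycle-style schedule in pure Python (the 30% cosine warmup and the `div=25`, `final_div=1e5` defaults are assumptions about fastai's internals, not its exact code). At the same global iteration, one long cycle can be near its peak learning rate while three short cycles are still warming up again:

```python
import math

def one_cycle_lr(i, n_iter, lr_max=1e-3, pct=0.3, div=25.0, final_div=1e5):
    """LR at iteration i of an n_iter-long one-cycle cosine schedule (sketch)."""
    lr_min, lr_fin = lr_max / div, lr_max / final_div
    warm = pct * n_iter
    if i < warm:  # cosine warmup: lr_min -> lr_max
        return lr_min + (lr_max - lr_min) * (1 - math.cos(math.pi * i / warm)) / 2
    u = (i - warm) / (n_iter - warm)  # cosine anneal: lr_max -> lr_fin
    return lr_fin + (lr_max - lr_fin) * (1 + math.cos(math.pi * u)) / 2

n = 100  # iterations per epoch (hypothetical)
# Compare at global iteration 110: the long 3-epoch cycle has just passed its
# peak, while the second of three 1-epoch cycles is still early in its warmup.
long_cycle = one_cycle_lr(110, 3 * n)
short_cycles = one_cycle_lr(110 % n, n)
print(long_cycle, short_cycles)  # the two schedules clearly disagree here
```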
@florobax Thanks. Which one do you expect to have the larger 'cumulative' learning rate, i.e. area under a time-vs-learning-rate curve? I've gotten better results with the for loop (multiple one-cycles) thus far.
Your question made me curious, so I did the calculation. Unless I made a mistake (which is possible), the area under the time-vs-lr curve (with fastai's schedule) for one cycle is N_{iter}\left(0.3\,lr_{min}+0.7\,lr_{max}+\frac{0.3(lr_{max}-lr_{min})+0.7(lr_{fin}-lr_{max})}{2}\right), i.e. each segment contributes its length times the average of its endpoints, which simplifies to N_{iter}(0.15\,lr_{min}+0.5\,lr_{max}+0.35\,lr_{fin}). This is a complicated way to say that the area is proportional to the number of iterations, so in terms of area under the curve it is exactly the same to run multiple cycles or to increase the number of iterations of a single cycle. I'll run a small experiment to corroborate that.
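As a sanity check on that formula, one can integrate a one-cycle-style schedule numerically and compare against the closed form. This is a sketch, not fastai's actual code: it assumes a cosine ramp from lr_min up to lr_max over the first 30% of iterations, then a cosine anneal down to lr_fin. The exact shape barely matters here, because the average of a symmetric cosine segment equals the average of its endpoints, just like a straight line:

```python
import math

def lr_at(t, lr_max, pct=0.3, div=25.0, final_div=1e5):
    """LR at normalized time t in [0, 1] of a one-cycle cosine schedule (sketch)."""
    lr_min, lr_fin = lr_max / div, lr_max / final_div
    if t < pct:  # cosine warmup: lr_min -> lr_max
        return lr_min + (lr_max - lr_min) * (1 - math.cos(math.pi * t / pct)) / 2
    u = (t - pct) / (1 - pct)  # cosine anneal: lr_max -> lr_fin
    return lr_fin + (lr_max - lr_fin) * (1 + math.cos(math.pi * u)) / 2

def area(n_iter, lr_max, steps=10_000):
    """Midpoint Riemann sum of the area under the time-vs-lr curve."""
    s = sum(lr_at((k + 0.5) / steps, lr_max) for k in range(steps))
    return n_iter * s / steps

def area_closed_form(n_iter, lr_max, pct=0.3, div=25.0, final_div=1e5):
    lr_min, lr_fin = lr_max / div, lr_max / final_div
    return n_iter * (pct * lr_min + (1 - pct) * lr_max
                     + (pct * (lr_max - lr_min) + (1 - pct) * (lr_fin - lr_max)) / 2)

num = area(300, 1e-3)
closed = area_closed_form(300, 1e-3)
# num and closed agree, and area(3N) == 3 * area(N): the area scales with n_iter,
# so one 3-epoch cycle and three 1-epoch cycles cover the same total area.
```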
However, keep in mind that one-cycle was designed to be run once over multiple epochs (so without the for loop). If it works better for you with the for loop, just use it then. Some learning rate schedules do use multiple cycles, as you can see in this article.
After a small experiment, it seems that I'm right: the difference between the two is around 8e-5, and seems to come from numerical instability more than anything else.