One workflow with a single big fit_one_cycle is:
1. learn.lr_find()
2. learn.recorder.plot()
3. Choose a proper lr from the plot in step 2.
4. learn.fit_one_cycle(N, max_lr=some_lr)
Since N is relatively large, this can take a long time (10 hours or more). If the whole process is interrupted, I have to retrain from scratch.
Another annoying thing: if I first choose a smaller N and run fit_one_cycle (say it takes 2 hours to finish), and then realize I need more epochs, I have to retrain from the beginning, so the first training run seems wasted.
Another approach is with multiple fit_one_cycle calls:
for i in range(num_of_fit_one_cycles):
    learn.lr_find()
    learn.recorder.plot()
    # choose some max_lr from the plot
    learn.fit_one_cycle(N // num_of_fit_one_cycles, max_lr=some_lr)
    learn.save(f'cycle_{i}')  # save intermediate weights
With this, I can save the intermediate training results after each cycle. If training is interrupted, I can resume from the last saved point and save time/money. But I am not sure whether, in theory, the second workflow has worse training performance than the first one.
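To make the resume logic concrete, here is a minimal framework-agnostic sketch of the checkpoint-per-cycle idea. The `train_cycle` function and the checkpoint file name are hypothetical stand-ins (a real version would call learn.fit_one_cycle and learn.save instead of bumping a counter); the point is only the resume-from-last-checkpoint control flow.

```python
import os
import pickle

CKPT = "checkpoint.pkl"  # hypothetical checkpoint file name

def train_cycle(state, epochs):
    """Stand-in for one fit_one_cycle call: just bumps an epoch counter."""
    state["epochs_done"] += epochs
    return state

def run(total_epochs, num_cycles):
    # Resume from the last checkpoint if one exists, else start fresh.
    if os.path.exists(CKPT):
        with open(CKPT, "rb") as f:
            state = pickle.load(f)
    else:
        state = {"epochs_done": 0}

    per_cycle = total_epochs // num_cycles
    while state["epochs_done"] < total_epochs:
        state = train_cycle(state, per_cycle)
        # Persist after every cycle, so an interruption
        # costs at most one cycle of work.
        with open(CKPT, "wb") as f:
            pickle.dump(state, f)
    return state["epochs_done"]

print(run(10, 5))  # → 10
```

Re-invoking `run` after a crash picks up from the last saved state, which is exactly what the multi-call workflow buys you over one long fit_one_cycle.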