One workflow, with one big fit_one_cycle, is:

```
1. learn.lr_find()
2. learn.recorder.plot()
3. choose a proper max_lr from the plot in step 2
4. learn.fit_one_cycle(N, max_lr=some_lr)
```

Since N is relatively large, this can take a long time (10 hours or more), and if the whole process is interrupted, I have to retrain from scratch.

Another annoying thing: if I first choose a smaller N and run fit_one_cycle (say it takes 2 hours to finish), then realize I need more epochs, I have to retrain from the beginning, and the first training effort seems wasted.

Another approach is with multiple fit_one_cycle calls:

```
for i in range(num_of_fit_one_cycles):
    learn.lr_find()
    learn.recorder.plot()
    # choose some max_lr from the plot
    learn.fit_one_cycle(N // num_of_fit_one_cycles, max_lr=some_lr)
```
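The save/resume logic for a loop like this can be sketched without fastai. Everything below is a hypothetical stand-in: fit_one_cycle here just counts epochs, and the pickle checkpoints play the role of fastai's learn.save(...) / learn.load(...):

```python
import os
import pickle
import tempfile

# Hypothetical checkpoint location; in real fastai code you would call
# learn.save(f"cycle_{i}") / learn.load(f"cycle_{i}") instead of pickling.
CKPT_DIR = tempfile.mkdtemp()

def ckpt_path(i):
    return os.path.join(CKPT_DIR, "cycle_{}.pkl".format(i))

def fit_one_cycle(state, epochs, max_lr):
    """Stand-in for learn.fit_one_cycle: 'trains' by counting epochs."""
    state["epochs_done"] += epochs
    state["last_max_lr"] = max_lr
    return state

def train(num_cycles=4, total_epochs=20, max_lr=1e-3):
    state = {"epochs_done": 0, "last_max_lr": None}
    start = 0
    # Resume: reload the newest checkpoint and skip completed cycles.
    for i in reversed(range(num_cycles)):
        if os.path.exists(ckpt_path(i)):
            with open(ckpt_path(i), "rb") as f:
                state = pickle.load(f)
            start = i + 1
            break
    for i in range(start, num_cycles):
        state = fit_one_cycle(state, total_epochs // num_cycles, max_lr)
        with open(ckpt_path(i), "wb") as f:  # checkpoint after every cycle
            pickle.dump(state, f)
    return state
```

If the process dies mid-way, rerunning train() only redoes the cycles whose checkpoints are missing, so earlier effort is not lost.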

With this, I can save the intermediate training results, and if training is interrupted I can resume from a checkpoint to save time/money. But I am not sure whether, in theory, the second workflow gives worse training performance than the first one.
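One concrete difference between the two workflows is the learning-rate schedule: one big cycle anneals the LR once, while k smaller cycles re-warm the LR back up to max_lr at the start of every cycle. A simplified sketch of a one-cycle-style schedule (not fastai's exact schedule; pct_warmup, start_div, and final_div are assumed values) makes this visible:

```python
import math

def one_cycle(max_lr, steps, pct_warmup=0.3, start_div=25, final_div=1e4):
    """Simplified one-cycle-style schedule: cosine warmup from
    max_lr/start_div up to max_lr, then cosine anneal down to
    max_lr/final_div. A sketch, not fastai's exact implementation."""
    warm = int(steps * pct_warmup)
    lrs = []
    for t in range(steps):
        if t < warm:  # warmup phase
            p, lo, hi = t / max(warm, 1), max_lr / start_div, max_lr
        else:         # annealing phase
            p = (t - warm) / max(steps - warm, 1)
            lo, hi = max_lr, max_lr / final_div
        # cosine interpolation from lo to hi as p goes 0 -> 1
        lrs.append(lo + (hi - lo) * (1 - math.cos(math.pi * p)) / 2)
    return lrs

steps = 100
big = one_cycle(0.01, steps)             # one cycle over all epochs
split = one_cycle(0.01, steps // 2) * 2  # two half-length cycles back to back

# Late in training the split schedule has jumped back up to a high LR,
# while the single big cycle has already annealed most of the way down.
print(big[75], split[75])
```

So the two workflows are not equivalent in theory: the multi-cycle version spends more time at high learning rates late in training, which can hurt final convergence (or occasionally help escape bad minima). Whether it matters in practice depends on the problem.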

Thanks