What is a better workflow with fit_one_cycle?

One workflow, with a single big fit_one_cycle, is:

1. learn.lr_find()
2. learn.recorder.plot()
3. choose a proper lr from the plot in step 2
4. learn.fit_one_cycle(N, max_lr=some_lr)

Since N is relatively large, this can take a long time (10 hours or more). If the process is interrupted, I have to retrain from scratch.
Another annoying thing: if I first choose a smaller N and run fit_one_cycle (say it takes 2 hours to finish), and then realize I need more epochs, I again have to retrain from the beginning, so the first training effort is wasted.

Another approach is with multiple fit_one_cycle calls:

for i in range(num_of_fit_one_cycles):
    learn.lr_find()
    learn.recorder.plot()
    some_lr = ...  # choose a max_lr from the plot
    learn.fit_one_cycle(N // num_of_fit_one_cycles, max_lr=some_lr)
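One detail the loop glosses over: N/num_of_fit_one_cycles is not an integer in general. A small helper (split_epochs is my own name, not a fastai function) can divide the total epoch budget into near-equal cycle lengths that always sum back to N:

```python
def split_epochs(total_epochs: int, num_cycles: int) -> list:
    """Divide total_epochs into num_cycles near-equal chunks.

    The first (total_epochs % num_cycles) cycles get one extra epoch,
    so the chunks always sum back to total_epochs.
    """
    base, extra = divmod(total_epochs, num_cycles)
    return [base + 1 if i < extra else base for i in range(num_cycles)]

# Example: a 10-epoch budget split across 3 fit_one_cycle calls.
print(split_epochs(10, 3))  # [4, 3, 3]
```

Each chunk would then be passed as the first argument of fit_one_cycle in the loop above.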

With this, I can save the intermediate training results. If training is interrupted, I can resume from a checkpoint and save time/money. But I am not sure whether, in theory, the second approach has worse training performance than the first workflow.
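The save/resume idea can be sketched independently of fastai. This is a minimal toy (the checkpoint file layout and function names are my own assumptions, not fastai's format): after each cycle, record how many cycles are done; on restart, skip the finished ones.

```python
import json
from pathlib import Path

def load_done_cycles(ckpt: Path) -> int:
    """Return how many cycles were already completed; 0 if starting fresh."""
    if ckpt.exists():
        return json.loads(ckpt.read_text())["done"]
    return 0

def run_cycles(num_cycles, run_one_cycle, ckpt=Path("progress.json")):
    """Run only the remaining cycles, checkpointing progress after each one."""
    ran = []
    for i in range(load_done_cycles(ckpt), num_cycles):
        # stand-in for learn.fit_one_cycle(...) followed by learn.save(f"cycle_{i}")
        run_one_cycle(i)
        ran.append(i)
        ckpt.write_text(json.dumps({"done": i + 1}))
    return ran
```

If the process dies mid-run, calling run_cycles again with the same checkpoint path resumes where it left off instead of starting over.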

Thanks


I did some experiments with MNIST. It looks like the second approach is not as good and does not give consistent performance.

Always use the save-model and/or early-stopping callbacks: https://docs.fast.ai/callbacks.html#SaveModelCallback
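What a save-best callback like SaveModelCallback (with every='improvement') boils down to is "keep the checkpoint with the best monitored metric so far". A plain-Python sketch of that bookkeeping, not fastai's actual implementation:

```python
class BestTracker:
    """Track the best (lowest) validation loss seen, like a save-best callback."""

    def __init__(self):
        self.best = float("inf")
        self.best_epoch = None

    def update(self, epoch: int, valid_loss: float) -> bool:
        """Return True if this epoch improved on the best so far (i.e. we would save)."""
        if valid_loss < self.best:
            self.best = valid_loss
            self.best_epoch = epoch
            return True
        return False

tracker = BestTracker()
losses = [0.9, 0.7, 0.8, 0.6]  # made-up per-epoch validation losses
saved = [e for e, l in enumerate(losses) if tracker.update(e, l)]
print(saved)  # epochs where a checkpoint would be written: [0, 1, 3]
```

Early stopping is the same idea plus a patience counter: stop when update() has returned False for too many epochs in a row.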

In terms of “long training” like the 10 hours you mention…

My workflow is to use a short but effective number of cycles (10-30 mins on a big dataset) to tune hyperparameters, e.g. lr, wd, architecture, and the one-cycle start and div.

Then, because 1cycle protects so well against overfitting, I calculate a long cycle length to consume the time I have available. Overnight is a good yardstick, for either a full run or multiple runs (cross-validation). Then I try shorter and longer lengths to compare.
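The "consume the time available" arithmetic is just dividing the wall-clock budget by the per-epoch cost measured during the short tuning runs. The helper name and numbers below are made up for illustration:

```python
def epochs_for_budget(budget_hours: float, minutes_per_epoch: float) -> int:
    """How many whole epochs fit in the given wall-clock budget."""
    return int(budget_hours * 60 // minutes_per_epoch)

# E.g. a short tuning run showed ~12 minutes per epoch; an 8-hour overnight
# budget then allows a 40-epoch one-cycle run.
print(epochs_for_budget(8, 12))  # 40
```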


Thanks for your advice. :grinning:

I’ll try your workflow and see how it works for me. Thanks a lot