Continue Training an Already Trained Model

If I run a model through a few cycles of fit_one_cycle and then run fit_one_cycle again, it appears to restart training every time rather than using the previous training as a warm start. My first question is: is this actually what is happening? Second, if so, what is the right way to train a model and then continue training it later, building on the previous training?

If you do a second fit/fit_one_cycle, it uses the model and optimizer state from the end of the first training run.
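To see why this counts as a warm start, here is a minimal sketch in plain Python (not fastai code): the parameter lives outside the fit function, so a second call to fit continues from wherever the first call left it rather than from the initial value.

```python
# Minimal warm-start sketch: one-parameter gradient descent on
# loss = (w - target)^2. The parameter w persists between fit() calls,
# so the second call resumes from the updated value.

def fit(w, steps, lr=0.1, target=3.0):
    for _ in range(steps):
        grad = 2 * (w - target)  # d/dw of (w - target)^2
        w -= lr * grad
    return w

w = 0.0                  # freshly initialised parameter
w = fit(w, steps=5)      # first training run
loss_after_first = (w - 3.0) ** 2

w = fit(w, steps=5)      # second run continues from the updated w
loss_after_second = (w - 3.0) ** 2

# The second run starts from the improved w, so the loss keeps
# shrinking instead of restarting from scratch.
print(loss_after_first > loss_after_second)  # True
```

The same persistence applies to a Learner: its model weights and optimizer state are mutated in place by training, so a later fit call picks them up as they are.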


Hi @sgugger,

I ran fit_one_cycle back to back with the following outputs. Can you explain why the error rate goes up during the second run? Does it have something to do with how one-cycle learning works?

learn.fit_one_cycle(10, max_lr=1e-2)

epoch train_loss valid_loss error_rate
1 0.923413 0.966235 0.326389
2 1.148261 1.777409 0.515873
3 1.573888 2.564810 0.596230
4 1.480725 1.868240 0.497024
5 1.316883 1.399597 0.421627
6 1.095222 1.037655 0.326389
7 0.902378 0.846409 0.261905
8 0.701362 0.706226 0.236111
9 0.554521 0.612059 0.212302
10 0.504270 0.604804 0.195437

Now the second run:
learn.fit_one_cycle(10, max_lr=1e-2)

epoch train_loss valid_loss error_rate
1 0.499214 0.655312 0.218254
2 0.660726 1.192701 0.361111
3 1.046210 1.538221 0.438492
4 1.069426 1.354031 0.383929
5 1.018210 1.183228 0.360119
6 0.857060 0.957268 0.319444
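This behaviour is consistent with the one-cycle policy: each fit_one_cycle call runs its own full learning-rate schedule, warming up toward max_lr and then annealing back down. So the second call ramps the rate back up to 1e-2 on weights that had settled at a tiny rate, which can knock the loss upward before it recovers. A rough sketch of such a schedule (illustrative only, the parameter names and exact curve are not fastai's internals):

```python
import math

def one_cycle_lr(step, total_steps, max_lr=1e-2, pct_start=0.25, div=25.0):
    """Sketch of a one-cycle schedule: cosine warm-up from max_lr/div
    to max_lr, then cosine anneal back toward ~0."""
    warm = int(total_steps * pct_start)
    if step < warm:
        pct = step / max(warm, 1)
        lo = max_lr / div
        return lo + (max_lr - lo) * (1 - math.cos(math.pi * pct)) / 2
    pct = (step - warm) / max(total_steps - warm, 1)
    return max_lr * (1 + math.cos(math.pi * pct)) / 2

total = 100
first_run = [one_cycle_lr(s, total) for s in range(total)]
second_run = [one_cycle_lr(s, total) for s in range(total)]  # schedule restarts

# The LR at the end of run 1 is near zero, but run 2 climbs
# right back to max_lr on the already-trained weights.
print(f"end of run 1: {first_run[-1]:.2e}, peak of run 2: {max(second_run):.2e}")
```

That restart of the schedule, not any loss of the trained weights, is the likely cause of the early error-rate spike in the second run.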

My second question is: what if I unfreeze the layers now and train with differential learning rates, the way Jeremy showed in Lesson 1 (v3)? Will it still start learning where it left off in the previous two runs?

P.S. My apologies if this is already covered in future lessons. I have only completed lesson 1 so far.


The training loss going down and then coming back up makes me think that your max learning rate is too high for the later epochs.

Should I run lr_find once again? I will try that and post the output here.

I’d definitely run lr_find after the first 10 epochs. Adding ShowGraph as a callback to your learner can also be really useful for diagnosing these kinds of problems. If you haven’t made it that far yet, Jeremy definitely discusses how to interpret loss vs. learning rate; if my notes are correct, it was covered in Lesson 3.
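For intuition, the usual rule of thumb when reading an lr_find plot is to pick a rate in the region where the loss is still falling steeply, before the curve bottoms out and blows up. Here is a toy sketch of that heuristic in plain Python, using a made-up loss-vs-learning-rate curve (this is not fastai's implementation, just the shape of the idea):

```python
# Hypothetical loss-vs-learning-rate curve like the one lr_find plots:
# loss falls as lr grows, bottoms out, then blows up at high lr.
lrs = [10 ** (-6 + 5 * i / 99) for i in range(100)]   # 1e-6 .. 1e-1
losses = [1.0 / (1 + 50 * lr) + (lr * 30) ** 2 for lr in lrs]

# Rule-of-thumb pick: the lr where the loss is falling fastest
# (steepest negative slope), which lands safely before the blow-up.
slopes = [losses[i + 1] - losses[i] for i in range(len(losses) - 1)]
best = slopes.index(min(slopes))
suggested_lr = lrs[best]

print(f"suggested lr ≈ {suggested_lr:.2e}")
```

After more training the steep region typically shifts to smaller rates, which is why rerunning lr_find between training runs often suggests a lower max_lr than the first time.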