If I run a model through a few cycles of fit_one_cycle
and then run fit_one_cycle
again, it appears to restart training every time rather than using the previous training as a warm start. My first question is: is this actually what is happening? Second, if so, what is the right way to train a model and then continue training it later, building on the previous training?
If you do a second fit/fit_one_cycle, it uses the model and optimizer state you had at the end of the first training.
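A minimal sketch of why this works (plain Python, no fastai): the learner object holds the model's parameters, so a second fit call keeps updating the same weights rather than starting over. `ToyLearner` here is a hypothetical stand-in for a fastai Learner, not the real class.

```python
class ToyLearner:
    """Hypothetical stand-in for a Learner: minimizes (w - 3)^2 by gradient descent."""
    def __init__(self):
        self.w = 0.0  # model state lives on the learner object

    def fit(self, epochs, lr=0.1):
        for _ in range(epochs):
            grad = 2 * (self.w - 3)  # d/dw of (w - 3)^2
            self.w -= lr * grad

learn = ToyLearner()
learn.fit(10)
w_after_first = learn.w   # partially trained
learn.fit(10)             # second call continues from w_after_first, not from 0.0
assert abs(learn.w - 3) < abs(w_after_first - 3)  # got closer to the optimum
```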
Hi @sgugger,
I ran fit_one_cycle twice back to back, with the outputs below. Can you explain why the error rate goes up during the second run? Does it have something to do with how one-cycle learning works?
learn.fit_one_cycle(10, max_lr=1e-2)
epoch | train_loss | valid_loss | error_rate |
---|---|---|---|
1 | 0.923413 | 0.966235 | 0.326389 |
2 | 1.148261 | 1.777409 | 0.515873 |
3 | 1.573888 | 2.564810 | 0.596230 |
4 | 1.480725 | 1.868240 | 0.497024 |
5 | 1.316883 | 1.399597 | 0.421627 |
6 | 1.095222 | 1.037655 | 0.326389 |
7 | 0.902378 | 0.846409 | 0.261905 |
8 | 0.701362 | 0.706226 | 0.236111 |
9 | 0.554521 | 0.612059 | 0.212302 |
10 | 0.504270 | 0.604804 | 0.195437 |
Now the second run:
learn.fit_one_cycle(10, max_lr=1e-2)
epoch | train_loss | valid_loss | error_rate |
---|---|---|---|
1 | 0.499214 | 0.655312 | 0.218254 |
2 | 0.660726 | 1.192701 | 0.361111 |
3 | 1.046210 | 1.538221 | 0.438492 |
4 | 1.069426 | 1.354031 | 0.383929 |
5 | 1.018210 | 1.183228 | 0.360119 |
6 | 0.857060 | 0.957268 | 0.319444 |
My second question is: what if I unfreeze the layers now and train with differential learning rates the way Jeremy showed in Lesson 1 (v3)? Will it still start learning where it left off after the previous two runs?
P.S. My apologies if this is already covered in future lessons. I have only completed lesson 1 so far.
Thanks
The training loss going down and then coming back up makes me think that your max learning rate is too high for your later epochs.
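This is also exactly what the one-cycle schedule predicts: each call to fit_one_cycle ramps the learning rate from a low value up to max_lr and back down, so the start of the second run hits your now well-trained model with a large learning rate again. A simplified sketch of the schedule (the function and its parameters are illustrative, not fastai's actual implementation):

```python
import math

def one_cycle_lr(step, total, max_lr, pct_start=0.25, div=25.0):
    """Simplified cosine one-cycle schedule: warm up from max_lr/div to
    max_lr over the first pct_start of training, then anneal toward zero."""
    warm = int(total * pct_start)
    if step < warm:  # warmup phase: low -> max_lr
        t = step / warm
        return max_lr / div + (max_lr - max_lr / div) * (1 - math.cos(math.pi * t)) / 2
    t = (step - warm) / (total - warm)  # annealing phase: max_lr -> ~0
    return max_lr * (1 + math.cos(math.pi * t)) / 2

# Two back-to-back "runs": the first ends with a tiny learning rate,
# but the second climbs right back up toward max_lr early on, which can
# knock a well-trained model out of its minimum (hence the loss bump).
run1 = [one_cycle_lr(s, 100, 1e-2) for s in range(100)]
run2 = [one_cycle_lr(s, 100, 1e-2) for s in range(100)]
assert run1[-1] < 1e-3        # run 1 finishes near zero
assert max(run2[:30]) > 5e-3  # run 2 ramps back up near max_lr
```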
Should I run lr_find once again? I will try that and post the output here.
I'd definitely run lr_find after the first 10 epochs. Adding ShowGraph as a callback to your learner can also be really useful for diagnosing these kinds of problems. If you haven't made it that far yet, Jeremy definitely discusses how to interpret loss vs. learning rate; if my notes are correct, it was discussed in Lesson 3.
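For intuition, here is a toy illustration of what an LR range test like lr_find does: sweep the learning rate exponentially and watch where the loss stops improving and starts blowing up, then pick a rate somewhat below that point. Everything below is a simplified stand-in, not fastai's implementation.

```python
def loss_after_step(w, lr):
    """Take one gradient step on f(w) = w^2, then report the new loss.
    Update: w_new = w - lr * 2w = (1 - 2*lr) * w."""
    w_new = w - lr * 2 * w
    return w_new ** 2

# Exponential sweep of learning rates from 1e-4 up to ~8.
lrs = [1e-4 * (10 ** (i / 10)) for i in range(50)]
losses = [loss_after_step(1.0, lr) for lr in lrs]

# The loss improves for small lr, is best somewhere in the middle,
# and diverges once lr passes 1.0 (|1 - 2*lr| > 1). On an lr_find plot
# you would choose a rate below that divergence point.
assert losses[0] < 1.0          # small lr: some improvement
assert min(losses) < losses[0]  # a mid-range lr does much better
assert losses[-1] > 1.0         # too-large lr: loss blows up
```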