If I run a model through a few cycles of fit_one_cycle
and then run fit_one_cycle
again, it appears to restart training every time rather than using the previous training as a warm start. My first question is: is this actually what is happening? Second, if so, what is the right way to train a model and then continue training it later, building on the previous training?
If you do a second fit/fit_one_cycle, it uses the model and optimizer state you had at the end of the first training.
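A minimal sketch of why this works (plain Python, no fastai): the learner object holds the model's parameters, so a second fit call keeps updating the same weights rather than starting over. `ToyLearner` here is a hypothetical stand-in for a fastai Learner, not the real class.

```python
class ToyLearner:
    """Hypothetical stand-in for a Learner: minimizes (w - 3)^2 by gradient descent."""
    def __init__(self):
        self.w = 0.0  # model state lives on the learner object

    def fit(self, epochs, lr=0.1):
        for _ in range(epochs):
            grad = 2 * (self.w - 3)  # d/dw of (w - 3)^2
            self.w -= lr * grad

learn = ToyLearner()
learn.fit(10)
w_after_first = learn.w   # partially trained
learn.fit(10)             # second call continues from w_after_first, not from 0.0
assert abs(learn.w - 3) < abs(w_after_first - 3)  # got closer to the optimum
```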
Hi @sgugger,
I ran fit_one_cycle twice back to back, with the outputs below. Can you explain why the error rate goes up during the second run? Does it have something to do with how one-cycle learning works?
learn.fit_one_cycle(10, max_lr=1e-2)
epoch | train_loss | valid_loss | error_rate |
---|---|---|---|
1 | 0.923413 | 0.966235 | 0.326389 |
2 | 1.148261 | 1.777409 | 0.515873 |
3 | 1.573888 | 2.564810 | 0.596230 |
4 | 1.480725 | 1.868240 | 0.497024 |
5 | 1.316883 | 1.399597 | 0.421627 |
6 | 1.095222 | 1.037655 | 0.326389 |
7 | 0.902378 | 0.846409 | 0.261905 |
8 | 0.701362 | 0.706226 | 0.236111 |
9 | 0.554521 | 0.612059 | 0.212302 |
10 | 0.504270 | 0.604804 | 0.195437 |
Now the second run:
learn.fit_one_cycle(10, max_lr=1e-2)
epoch | train_loss | valid_loss | error_rate |
---|---|---|---|
1 | 0.499214 | 0.655312 | 0.218254 |
2 | 0.660726 | 1.192701 | 0.361111 |
3 | 1.046210 | 1.538221 | 0.438492 |
4 | 1.069426 | 1.354031 | 0.383929 |
5 | 1.018210 | 1.183228 | 0.360119 |
6 | 0.857060 | 0.957268 | 0.319444 |
My second question is: what if I unfreeze the layers now and train with differential learning rates the way Jeremy showed in Lesson 1 (v3)? Will it still start learning where it left off after the previous two runs?
P.S. My apologies if this is already covered in future lessons. I have only completed lesson 1 so far.
Thanks
The training loss going down and then coming back up makes me think that your max learning rate is too high for your later epochs.
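This is also exactly what the one-cycle schedule predicts: each call to fit_one_cycle ramps the learning rate from a low value up to max_lr and back down, so the start of the second run hits your now well-trained model with a large learning rate again. A simplified sketch of the schedule (the function and its parameters are illustrative, not fastai's actual implementation):

```python
import math

def one_cycle_lr(step, total, max_lr, pct_start=0.25, div=25.0):
    """Simplified cosine one-cycle schedule: warm up from max_lr/div to
    max_lr over the first pct_start of training, then anneal toward zero."""
    warm = int(total * pct_start)
    if step < warm:  # warmup phase: low -> max_lr
        t = step / warm
        return max_lr / div + (max_lr - max_lr / div) * (1 - math.cos(math.pi * t)) / 2
    t = (step - warm) / (total - warm)  # annealing phase: max_lr -> ~0
    return max_lr * (1 + math.cos(math.pi * t)) / 2

# Two back-to-back "runs": the first ends with a tiny learning rate,
# but the second climbs right back up toward max_lr early on, which can
# knock a well-trained model out of its minimum (hence the loss bump).
run1 = [one_cycle_lr(s, 100, 1e-2) for s in range(100)]
run2 = [one_cycle_lr(s, 100, 1e-2) for s in range(100)]
assert run1[-1] < 1e-3        # run 1 finishes near zero
assert max(run2[:30]) > 5e-3  # run 2 ramps back up near max_lr
```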
Should I run lr_find once again? I will try that and post the output here.
I'd definitely run lr_find after the first 10 epochs. Adding ShowGraph as a callback to your learner can also be really useful for diagnosing these kinds of problems. If you haven't made it that far yet, Jeremy definitely discusses how to interpret loss vs. learning rate; if my notes are correct, it was discussed in Lesson 3.
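For intuition, here is a toy illustration of what an LR range test like lr_find does: sweep the learning rate exponentially and watch where the loss stops improving and starts blowing up, then pick a rate somewhat below that point. Everything below is a simplified stand-in, not fastai's implementation.

```python
def loss_after_step(w, lr):
    """Take one gradient step on f(w) = w^2, then report the new loss.
    Update: w_new = w - lr * 2w = (1 - 2*lr) * w."""
    w_new = w - lr * 2 * w
    return w_new ** 2

# Exponential sweep of learning rates from 1e-4 up to ~8.
lrs = [1e-4 * (10 ** (i / 10)) for i in range(50)]
losses = [loss_after_step(1.0, lr) for lr in lrs]

# The loss improves for small lr, is best somewhere in the middle,
# and diverges once lr passes 1.0 (|1 - 2*lr| > 1). On an lr_find plot
# you would choose a rate below that divergence point.
assert losses[0] < 1.0          # small lr: some improvement
assert min(losses) < losses[0]  # a mid-range lr does much better
assert losses[-1] > 1.0         # too-large lr: loss blows up
```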