Lesson 1: Why split into two stages?

Hi everybody. First post here.

During the first stage, we run 4 epochs and save the weights under ‘stage-1’.
We check that the model seems to be working properly, then unfreeze and run another cycle. Uh-oh! The error rate spikes on the 5th epoch: the learning rate was too high. So we reload ‘stage-1’, find an acceptable learning rate range, and carry on with more finely tuned training.
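For concreteness, the workflow above looks roughly like this (a minimal sketch assuming the fastai v1 Learner API from Lesson 1; `learn` and the learning-rate slice are placeholders, not prescribed values):

```python
# `learn` is the Learner built in the Lesson 1 notebook, e.g. via
# cnn_learner(data, models.resnet34, metrics=error_rate).

# Stage 1: train the frozen model for 4 epochs and checkpoint the weights.
learn.fit_one_cycle(4)
learn.save('stage-1')

# Stage 2: unfreeze and run another cycle.
learn.unfreeze()
learn.fit_one_cycle(1)   # error rate spikes: the LR was too high

# Recover: reload the checkpoint, look for a better learning rate range,
# then train again with a hand-picked (illustrative) range.
learn.load('stage-1')
learn.unfreeze()
learn.lr_find()
learn.fit_one_cycle(1, max_lr=slice(1e-5, 1e-4))
```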

However, if I run 5 epochs in stage 1, instead of 4, I don’t see a spike in the error rate at the 5th epoch as before. It keeps decreasing steadily. What accounts for this discrepancy?

Thanks for your help.

Hi @intalentive! Welcome to the forums!
In short:
When you train your model with the OneCycle LR schedule, training for one cycle of 5 epochs is not the same as, and in fact is very different from, training for one cycle of 4 epochs followed by another cycle of one epoch (5 epochs in total).

This is what the training schedule looks like when using the One Cycle Policy (from https://docs.fast.ai/callbacks.one_cycle.html):
[Image: plot of the one-cycle learning rate schedule]

Now, as you can probably imagine, running this schedule once over 5 straight epochs is very different from running it over 4 epochs and then starting a fresh one-epoch cycle. In the second case, the learning rate suddenly jumps back up at the start of the new cycle, right after the period of very low learning rates at the end of the first one.
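To see the difference concretely, here's a small standalone simulation of the schedule (a simplified cosine one-cycle loosely modeled on `fit_one_cycle`; `max_lr`, `pct_start`, and the divisors are illustrative values, not fastai's exact defaults):

```python
import math

def one_cycle_lr(step, total_steps, max_lr=1e-3, pct_start=0.3,
                 div=25.0, final_div=1e4):
    """LR at `step` of one cycle: cosine warmup from max_lr/div up to
    max_lr, then cosine annealing down to max_lr/final_div."""
    warm = int(total_steps * pct_start)
    if step < warm:
        pos, lo, hi = step / warm, max_lr / div, max_lr
    else:
        pos = (step - warm) / (total_steps - warm)
        lo, hi = max_lr, max_lr / final_div
    return lo + (hi - lo) * (1 - math.cos(math.pi * pos)) / 2

steps = 100  # batches per epoch (arbitrary)

# Case A: one single 5-epoch cycle.
case_a = [one_cycle_lr(s, 5 * steps) for s in range(5 * steps)]
# Case B: a 4-epoch cycle followed by a fresh 1-epoch cycle.
case_b = ([one_cycle_lr(s, 4 * steps) for s in range(4 * steps)] +
          [one_cycle_lr(s, 1 * steps) for s in range(1 * steps)])

# During the 5th epoch, case A just keeps annealing smoothly, while
# case B ramps the LR all the way back up to max_lr.
print(f"epoch 5 LR range, single 5-epoch cycle: "
      f"{min(case_a[4*steps:]):.1e} .. {max(case_a[4*steps:]):.1e}")
print(f"epoch 5 LR range, 4 + 1 cycles:         "
      f"{min(case_b[4*steps:]):.1e} .. {max(case_b[4*steps:]):.1e}")
```

In the 4 + 1 case the learning rate climbs back to the full `max_lr` during the 5th epoch, right after it had been annealed down to almost nothing, and that jump is what shows up as the spike in your error rate.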

Awesome, thanks a bunch @akashpalrecha. I figured some kind of nonlinearity had to be buried in there somewhere, but didn’t know where to look.
