I would like to know whether these two approaches do the same thing, because in my experiments another epoch can reduce the model's accuracy. With the one-at-a-time approach I could save each stage and return to the most successful one if I wanted, right?
Does anyone have a better way of doing this, or know whether I'm completely wrong about it?
They’re not equivalent. fit_one_cycle varies the learning rate across epochs: it starts low, ramps up to the max_lr you pass (or the default of 0.003 if you pass nothing), and then anneals back down.
You’re right that the learning rate gets adjusted after each batch, but the cycle spans the total number of epochs (it’s one cycle per fit_one_cycle call, not one cycle per epoch).
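Here’s roughly what that schedule looks like if you plot it. This is a standalone sketch, not fastai’s actual implementation, and the epoch/batch counts and pct_start value are just made-up examples to show the shape:

```python
# Rough sketch of a one-cycle LR schedule spanning an entire
# fit_one_cycle call (10 epochs here), not one cycle per epoch.
import numpy as np
import matplotlib.pyplot as plt

n_epochs, batches_per_epoch = 10, 100   # hypothetical training run
max_lr, pct_start = 3e-3, 0.3           # fastai-style defaults
start_lr = max_lr / 25                  # assumed starting LR

total = n_epochs * batches_per_epoch
warmup = int(total * pct_start)

# Cosine ramp up for the first pct_start of all batches...
up = start_lr + (max_lr - start_lr) * (1 - np.cos(np.linspace(0, np.pi, warmup))) / 2
# ...then cosine anneal back down over the remaining batches.
down = max_lr * (1 + np.cos(np.linspace(0, np.pi, total - warmup))) / 2

plt.plot(np.concatenate([up, down]))
plt.xlabel('batch'); plt.ylabel('learning rate')
plt.title('One cycle across all 10 epochs')
plt.show()
```

Calling fit_one_cycle(1) ten times instead would give you ten of these small cycles back to back, which is a different schedule.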
OK, so if I run 10 epochs but see that the model could reach better accuracy with some more cycles, should I restore the model to its state from before those 10 epochs (restart it) and run 12 instead? Is there no way to fit 2 more using the same values? Or should I just go on to refining it with the unfreeze function?
Jeremy got asked this question in the Lesson 2 video and said he didn’t know whether one would be better than the other. (Look around 71 minutes into the video.)
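For what it’s worth, one practical pattern is to checkpoint after each cycle so you can always return to the most successful stage, along the lines of what the original question suggested. A sketch, assuming `learn` is an existing fastai Learner and the stage names are just examples:

```python
# Run a full one-cycle fit, then snapshot the weights.
learn.fit_one_cycle(10, 3e-3)
learn.save('stage-1')

# Try a shorter follow-up cycle, typically with a lower max LR.
learn.fit_one_cycle(2, 3e-4)
learn.save('stage-2')

# If the extra cycle hurt accuracy, reload the earlier snapshot.
learn.load('stage-1')
```

This doesn’t resume the original schedule mid-cycle, but it does let you experiment with extra cycles without losing your best weights.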