Feel free to delete this if it’s not allowed (as it references the previous year’s course), but:
In the intro, why does Jeremy use learn.fine_tune instead of learn.fit_one_cycle?
I see both are used in places, but in the previous course the starting lesson(s) didn't make use of fine_tune, whereas this time we do. I'm just wondering why / why not.
I've read the documentation (for fine_tune and fit_one_cycle), and my understanding is that fine_tune trains just the head (final layer(s)) initially, and then all layers as a second step. And that fit_one_cycle uses the 1cycle policy (min and max learning rates).
I get this, but I don’t understand why sometimes we do this (fine_tune) and other times use fit_one_cycle along with freeze() and unfreeze() etc. Do they achieve the same ends? If so, why use one over the other?
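To make the comparison concrete (assuming a learn object already exists, e.g. from cnn_learner; the epoch counts are just placeholders), these are the two patterns I mean:

```python
# Convenience route used in this year's intro
learn.fine_tune(1)

# Route from last year's early lessons: 1cycle training on whatever layers are currently unfrozen
learn.fit_one_cycle(4)
```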
They do. fine_tune is a convenience function that, through testing, they've found works pretty well in almost any transfer learning application. If you look at fine_tune's source code, it (more or less) follows the same steps as what we did back at the beginning of last year's course, so in the end it does much the same thing. To paraphrase Sylvain (I don't remember the exact wording; this was discussed yesterday on Zoom):
“We’ve found the defaults work very well in most applications”
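For reference, fine_tune's body is roughly along these lines (paraphrased from memory, so check the library source for the exact signature and defaults):

```python
# Rough paraphrase of Learner.fine_tune, not a verbatim copy of the fastai source
def fine_tune(self, epochs, base_lr=2e-3, freeze_epochs=1, lr_mult=100, **kwargs):
    self.freeze()
    self.fit_one_cycle(freeze_epochs, slice(base_lr), pct_start=0.99, **kwargs)  # train the head only
    base_lr /= 2
    self.unfreeze()
    self.fit_one_cycle(epochs, slice(base_lr/lr_mult, base_lr), **kwargs)        # whole model, discriminative LRs
```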
I'd say yes, but with a very strong caveat, only because it's easy to fall into a trap that way. fine_tune is geared towards transfer learning specifically, but you can also just use fit_one_cycle as well! (Or flat_cos.)
For beginners (and advanced users too) it's a great starting fit function, but also don't forget that you can then build on what that function is doing. For instance, I wonder what would need to change to adapt that function for Ranger/flat_cos!
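As a purely hypothetical sketch (fine_tune_flat_cos is a made-up name and the defaults are guesses), such an adaptation might look like:

```python
from fastai.vision.all import *

def fine_tune_flat_cos(learn, epochs, base_lr=2e-3, freeze_epochs=1, lr_mult=100):
    "Hypothetical fine_tune variant using fit_flat_cos (flat LR then cosine anneal) instead of fit_one_cycle."
    learn.freeze()
    learn.fit_flat_cos(freeze_epochs, slice(base_lr))
    base_lr /= 2
    learn.unfreeze()
    learn.fit_flat_cos(epochs, slice(base_lr/lr_mult, base_lr))

# Pairs naturally with the Ranger optimizer, e.g.:
# learn = cnn_learner(dls, resnet34, opt_func=ranger)
```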
Trying to appreciate the lack of standard reference docs, but teasing out answers to a fairly basic question on a broad foundational topic is not for the faint of heart.
Hey, great topic!
Can anyone clarify what freeze() and unfreeze() are crucial for?
As far as I understand, it’s about pretrained weights.
If I use a pretrained model, then should I use fine_tune() rather than fit_one_cycle?
How many epochs should I train with the model frozen before I unfreeze it?
Thanks
Hey Danrohn, freeze() means "do not update these weights"; unfreeze() undoes that so they will be updated again.
It’s normally applied to the backbone/body (same thing) of a model so that you don’t remove what it’s learnt from the dataset it was originally trained on.
in fine_tune:
1. freeze the body
2. just run training for the last few layers (transfer learning)
3. unfreeze the body
4. make a small change to all of the model (fine-tuning)
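A minimal sketch of those steps done by hand (assuming a learn object already exists; the epoch counts and learning rates are only illustrative):

```python
learn.freeze()                                      # 1. body weights fixed, only the new head trains
learn.fit_one_cycle(1)                              # 2. train the last few layers (transfer learning)
learn.unfreeze()                                    # 3. allow updates to every layer
learn.fit_one_cycle(2, lr_max=slice(1e-6, 1e-4))    # 4. small, discriminative LRs across the whole model
```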