Feel free to delete this if it’s not allowed (as it references the previous year’s course), but:
In the intro, why does Jeremy use learn.fine_tune instead of learn.fit_one_cycle? I see both are used in places, but in the previous course the starting lesson(s) didn’t make use of fine_tune, whereas this time we do. I’m just wondering why / why not.
I’ve read the documentation (for fine_tune and fit_one_cycle), and my understanding is that fine_tune trains just the head (final layer(s)) first, then all layers as a second step, and that fit_one_cycle uses the 1cycle policy (min and max learning rates).
I get this, but I don’t understand why sometimes we do this (fine_tune) and other times use fit_one_cycle along with unfreeze() etc. Do they achieve the same ends? If so, why use one over the other?
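To make the "min and max learning rates" part of the 1cycle policy concrete, here is a minimal pure-Python sketch of a 1cycle-style schedule: a cosine warm-up from `lr_max/div` to `lr_max`, then cosine annealing down to `lr_max/div_final`. The helper names are hypothetical, and the defaults (`div=25`, `div_final=1e5`, `pct_start=0.25`) are my assumption of fastai v2’s `fit_one_cycle` defaults, so check the source for your version. fastai also cycles momentum, which is left out here.

```python
import math

def cos_anneal(start: float, end: float, pos: float) -> float:
    """Cosine interpolation: returns `start` at pos=0 and `end` at pos=1."""
    return start + (1 + math.cos(math.pi * (1 - pos))) * (end - start) / 2

def one_cycle_lr(pos: float, lr_max: float, div: float = 25.0,
                 div_final: float = 1e5, pct_start: float = 0.25) -> float:
    """Learning rate at training progress `pos` (0.0 to 1.0), 1cycle-style.

    Phase 1 (first `pct_start` of training): warm up from lr_max/div to lr_max.
    Phase 2 (the rest): anneal from lr_max down to lr_max/div_final.
    """
    if pos < pct_start:
        return cos_anneal(lr_max / div, lr_max, pos / pct_start)
    return cos_anneal(lr_max, lr_max / div_final, (pos - pct_start) / (1 - pct_start))

# Sample the schedule at a few points for lr_max = 1e-3
for pos in (0.0, 0.1, 0.25, 0.5, 0.75, 1.0):
    print(f"pos={pos:.2f}  lr={one_cycle_lr(pos, 1e-3):.2e}")
```

With `pct_start=0.99` (the value fine_tune uses for its frozen phase), almost the whole run is warm-up and the LR only drops at the very end.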
They do. It’s a convenience function that, through testing, they’ve found works pretty well in almost any transfer learning application. If you look at fine_tune’s source code, you’ll notice it follows (more or less) the same steps we did at the beginning of last year’s course, so in the end it does more or less the same thing. To paraphrase Sylvain (I don’t remember the exact wording; this was discussed yesterday on Zoom):
“We’ve found the defaults work very well in most applications”
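To see the "more or less the same steps" point concretely, here is a pure-Python sketch that swaps the real Learner for a hypothetical `RecordingLearner` stub that trains nothing and only logs calls. `fine_tune` below follows the fastai source quoted later in this thread (including the freeze/unfreeze steps its docstring describes); `manual_workflow` and its learning rates are illustrative assumptions, not values from either course.

```python
class RecordingLearner:
    """Hypothetical stand-in for a fastai Learner: it trains nothing and only
    records which calls are made, so two workflows can be compared."""

    def __init__(self):
        self.calls = []

    def freeze(self):
        self.calls.append(("freeze",))

    def unfreeze(self):
        self.calls.append(("unfreeze",))

    def fit_one_cycle(self, n_epoch, lr_max, pct_start=0.25, div=25.0):
        self.calls.append(("fit_one_cycle", n_epoch, lr_max, pct_start, div))


def fine_tune(learn, epochs, base_lr=2e-3, freeze_epochs=1, lr_mult=100,
              pct_start=0.3, div=5.0):
    """Same sequence of steps as fastai's fine_tune (as quoted in this thread)."""
    learn.freeze()
    learn.fit_one_cycle(freeze_epochs, slice(base_lr), pct_start=0.99)
    base_lr /= 2
    learn.unfreeze()
    learn.fit_one_cycle(epochs, slice(base_lr / lr_mult, base_lr),
                        pct_start=pct_start, div=div)


def manual_workflow(learn, epochs):
    """Hypothetical 'by hand' version: train the head, unfreeze, train everything.
    The learning rates here are illustrative, not taken from the course."""
    learn.freeze()
    learn.fit_one_cycle(1, 2e-3)
    learn.unfreeze()
    learn.fit_one_cycle(epochs, slice(1e-5, 1e-3))


auto, manual = RecordingLearner(), RecordingLearner()
fine_tune(auto, epochs=4)
manual_workflow(manual, epochs=4)
print([c[0] for c in auto.calls])    # ['freeze', 'fit_one_cycle', 'unfreeze', 'fit_one_cycle']
print([c[0] for c in manual.calls])  # same sequence of steps
print(auto.calls == manual.calls)    # False: the hyperparameters differ
```

The sequence of steps is identical; what fine_tune adds is a set of tuned defaults (`pct_start=0.99` for the frozen phase, halved `base_lr`, `div=5.0`, `lr_mult=100`).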
I see! Thanks for clarifying this
No need to delete! I just moved it to the “non-beginner” category. Feel free to discuss anything here!
Is it correct to say fit_one_cycle = new model, and fine_tune = transfer learning?
I’d say yes, but with a very strong "but", because it’s easy to fall into a trap that way. fine_tune is geared specifically towards transfer learning, but you can also just use fit_one_cycle there as well (or flat_cos)!
For beginners (and advanced users too) it’s a great default fit function, but don’t forget that you can then build on what it is doing. For instance, I wonder how the function would need to change to adapt it for Ranger/flat_cos!
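The main thing flat_cos changes is the shape of the LR schedule: no warm-up, a flat stretch at the full learning rate, then cosine annealing. A minimal pure-Python sketch, assuming defaults `pct_start=0.75` and `div_final=1e5` (my recollection of fastai v2’s `fit_flat_cos`; the helper names are hypothetical):

```python
import math

def cos_anneal(start, end, pos):
    """Cosine interpolation: `start` at pos=0, `end` at pos=1."""
    return start + (1 + math.cos(math.pi * (1 - pos))) * (end - start) / 2

def flat_cos_lr(pos, lr, div_final=1e5, pct_start=0.75):
    """LR at progress `pos` (0.0 to 1.0): flat at `lr` for the first `pct_start`
    of training, then cosine-annealed down to lr/div_final. No warm-up phase."""
    if pos < pct_start:
        return lr
    return cos_anneal(lr, lr / div_final, (pos - pct_start) / (1 - pct_start))

for pos in (0.0, 0.5, 0.75, 0.9, 1.0):
    print(f"pos={pos:.2f}  lr={flat_cos_lr(pos, 1e-3):.2e}")
```

An adapted fine_tune would presumably swap each fit_one_cycle call for fit_flat_cos (and use `opt_func=ranger` when creating the Learner), but that is untested speculation: `div` has no counterpart without a warm-up, and `pct_start` would mean the length of the flat part rather than the warm-up, so fine_tune’s `pct_start=0.99`/`div=5.0` choices would need rethinking.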
In addition to what’s already been said:
I was figuring out the exact same thing tonight. Looking at the source code is the easiest way for me to wrap my head around it (see below). It’s a particular combination of fit_one_cycle(s) + (un)freeze(s) that works well in a lot (if not most) situations...
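from fastai.vision.all import *  # brings in Learner, fit_one_cycle and fastcore's @patch

@patch
def fine_tune(self:Learner, epochs, base_lr=2e-3, freeze_epochs=1, lr_mult=100,
              pct_start=0.3, div=5.0, **kwargs):
    "Fine tune with `freeze` for `freeze_epochs` then with `unfreeze` from `epochs` using discriminative LR"
    self.freeze()  # phase 1: only the head is trained
    self.fit_one_cycle(freeze_epochs, slice(base_lr), pct_start=0.99, **kwargs)
    base_lr /= 2
    self.unfreeze()  # phase 2: all layers, with discriminative LRs
    self.fit_one_cycle(epochs, slice(base_lr/lr_mult, base_lr), pct_start=pct_start, div=div, **kwargs)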
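The `slice(...)` arguments are what give fine_tune its discriminative learning rates. Here is a pure-Python sketch of how I understand a slice gets spread over the model’s parameter groups (based on my reading of fastai’s `Learner.lr_range`/`even_mults`; the helper `lr_per_group` is hypothetical), evaluated with fine_tune’s defaults and an assumed three parameter groups:

```python
def lr_per_group(lr, n_groups):
    """Hypothetical helper: spread `lr` over `n_groups` parameter groups
    (earliest layers first, head last), following my reading of how fastai
    interprets slice learning rates."""
    if not isinstance(lr, slice):
        return [lr] * n_groups  # plain float: same LR for every group
    if lr.start:
        # slice(start, stop): geometric progression from start (early layers) to stop (head)
        step = (lr.stop / lr.start) ** (1 / (n_groups - 1))
        return [lr.start * step ** i for i in range(n_groups)]
    # slice(stop): the head gets stop, every earlier group gets stop / 10
    return [lr.stop / 10] * (n_groups - 1) + [lr.stop]


base_lr, lr_mult = 2e-3, 100                      # fine_tune's defaults
phase1 = lr_per_group(slice(base_lr), 3)           # frozen phase: mainly the head's LR matters
base_lr /= 2
phase2 = lr_per_group(slice(base_lr / lr_mult, base_lr), 3)  # unfrozen phase
print(phase1)  # roughly [2e-4, 2e-4, 2e-3]
print(phase2)  # roughly [1e-5, 1e-4, 1e-3]
```

So in the unfrozen phase the earliest (most generic) pretrained layers move about 100x more slowly than the head.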
Thanks for clipping the code. Helped me understand the difference easily.
Yes, that code clip helped me see a lot of context, as I’d asked myself the same question. Here’s the link to the more current fastai code base:
I’m trying to appreciate the lack of standard reference docs, but teasing out answers to a fairly basic question on a broad, foundational topic is not for the faint of heart.
Also, I continue to appreciate GitHub search within a repo for surfacing specific uses to review:
Hey, great topic!
Can anyone clarify what unfreeze() is crucial for?
As far as I understand, it’s about pretrained weights.
If I use a pretrained model, should I use fine_tune() rather than fit_one_cycle()?
How many epochs should I train while frozen before I unfreeze?
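On what freezing is for: the newly added head starts out random while the body already has useful pretrained weights, so the head is trained first with the body kept fixed, and unfreeze() then lets the pretrained layers be adjusted too. A toy sketch of the bookkeeping, modeled loosely on fastai’s `freeze_to`/`freeze`/`unfreeze` (`ToyModel` is hypothetical; in fastai this toggles which real parameter groups get updated, and by default batchnorm layers stay trainable even when frozen):

```python
class ToyModel:
    """Hypothetical model split into parameter groups, earliest layers first;
    the last group is the newly added head."""

    def __init__(self, group_names):
        self.groups = [{"name": n, "trainable": True} for n in group_names]

    def freeze_to(self, n):
        """Groups before index `n` are frozen; groups from `n` on are trainable.
        Negative `n` counts from the end, like a Python index."""
        cut = n if n >= 0 else len(self.groups) + n
        for i, g in enumerate(self.groups):
            g["trainable"] = i >= cut

    def freeze(self):
        self.freeze_to(-1)  # only the last group (the head) stays trainable

    def unfreeze(self):
        self.freeze_to(0)   # every group is trainable

    def trainable(self):
        return [g["name"] for g in self.groups if g["trainable"]]


m = ToyModel(["pretrained_early", "pretrained_late", "new_head"])
m.freeze()
print(m.trainable())  # ['new_head']
m.unfreeze()
print(m.trainable())  # ['pretrained_early', 'pretrained_late', 'new_head']
```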