"fine_tune" vs. "fit_one_cycle"

joneslloyd · March 20, 2020, 12:59pm

Hi guys,

Feel free to delete this if it’s not allowed (as it references the previous year’s course), but:

In the intro, why does Jeremy use learn.fine_tune instead of learn.fit_one_cycle?

I see both are used in places, but in the previous course the starting lesson(s) didn’t make use of fine_tune, whereas this time they do. I’m just wondering why.

I’ve read the documentation (for fine_tune and fit_one_cycle), and my understanding is that fine_tune trains just the head (final layer(s)) initially, then all layers as a second step, and that fit_one_cycle uses the 1cycle policy (min and max learning rates).

I get this, but I don’t understand why sometimes we do this (fine_tune) and other times use fit_one_cycle along with freeze() and unfreeze() etc. Do they achieve the same ends? If so, why use one over the other?

Thanks!

muellerzr · March 20, 2020, 1:03pm

They do. fine_tune is a convenience function that, through testing, they’ve found works pretty well in almost any transfer learning application. If you look at fine_tune’s source code, it (more or less) follows the same steps we did back at the beginning of last year’s course, so in the end it does more or less the same thing. To paraphrase Sylvain (I don’t remember the exact wording; this was discussed yesterday on Zoom):

“We’ve found the defaults work very well in most applications”

joneslloyd · March 20, 2020, 1:06pm

I see! Thanks for clarifying this

jeremy · March 20, 2020, 1:19pm

No need to delete! I just moved it to the “non-beginner” category. Feel free to discuss anything here!

Albertotono · March 20, 2020, 4:56pm

Is it correct to say
fit_one_cycle = New Model
fine_tuning = with Transfer Learning?

muellerzr · March 20, 2020, 5:05pm

I’d say yes, but with a very strong “but”, only because it’s easy to fall into a trap that way. fine_tune is geared towards transfer learning specifically, but you can also just use fit_one_cycle for that as well (or flat_cos)!

For beginners it’s a great starting fit function (and for advanced users too), but don’t forget that you can then build on what that function is doing. For instance, I wonder how that function would need to change to work with Ranger/flat_cos!

zerotosingularity · March 20, 2020, 8:41pm

In addition to what’s already been said:

I was figuring out the exact same thing tonight. Looking at the source code is the easiest way for me to wrap my head around it (see below).

fine_tune is a particular combination of fit_one_cycle(s) + (un)freeze(s) that works well in a lot of (if not most) situations...

from https://github.com/fastai/fastai2/blob/master/fastai2/callback/schedule.py#L151


def fine_tune(self:Learner, epochs, base_lr=2e-3, freeze_epochs=1, lr_mult=100,
              pct_start=0.3, div=5.0, **kwargs):
    "Fine tune with `freeze` for `freeze_epochs` then with `unfreeze` from `epochs` using discriminative LR"
    self.freeze()
    self.fit_one_cycle(freeze_epochs, slice(base_lr), pct_start=0.99, **kwargs)
    base_lr /= 2
    self.unfreeze()
    self.fit_one_cycle(epochs, slice(base_lr/lr_mult, base_lr), pct_start=pct_start, div=div, **kwargs)
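To see concretely which calls the function above ends up making, here is a minimal sketch that runs the same sequence of steps against a stand-in object. `RecordingLearner` and `fine_tune_steps` are hypothetical names made up for this illustration (they are not part of fastai); the stand-in only records calls and does no training, and the defaults are copied from the source quoted above.

```python
class RecordingLearner:
    """Hypothetical stand-in for a fastai Learner: records calls instead of training."""

    def __init__(self):
        self.calls = []

    def freeze(self):
        self.calls.append(("freeze",))

    def unfreeze(self):
        self.calls.append(("unfreeze",))

    def fit_one_cycle(self, n_epoch, lr_max, **kwargs):
        self.calls.append(("fit_one_cycle", n_epoch, lr_max, kwargs))


def fine_tune_steps(learn, epochs, base_lr=2e-3, freeze_epochs=1, lr_mult=100,
                    pct_start=0.3, div=5.0):
    # Same order of operations as the fine_tune source quoted above.
    learn.freeze()                                   # phase 1: frozen body
    learn.fit_one_cycle(freeze_epochs, slice(base_lr), pct_start=0.99)
    base_lr /= 2                                     # halve the LR for phase 2
    learn.unfreeze()                                 # phase 2: every layer trains
    learn.fit_one_cycle(epochs, slice(base_lr / lr_mult, base_lr),
                        pct_start=pct_start, div=div)


learn = RecordingLearner()
fine_tune_steps(learn, epochs=3)
for call in learn.calls:
    print(call)
```

With the defaults, the frozen phase runs for 1 epoch at a learning rate of up to 2e-3, and the unfrozen phase runs for `epochs` epochs with a learning rate range of roughly 1e-5 to 1e-3. Calling freeze(), fit_one_cycle(), unfreeze(), fit_one_cycle() by hand with those arguments should therefore be equivalent to calling fine_tune.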

DanielLam · March 20, 2020, 11:27pm

Thanks for clipping the code. Helped me understand the difference easily.

davecampbell · December 26, 2020, 12:29pm

yes, that code clip helped me see a lot of context as i asked myself the same question. here’s the link to the more current fastai code base:
https://github.com/fastai/fastai/blob/f2ab8ba78b63b2f4ebd64ea440b9886a2b9e7b6f/fastai/callback/schedule.py#L153

trying to appreciate the lack of standard reference docs, but teasing out answers to a fairly basic question on a broad foundational topic is not for the faint of heart.

also continuing to appreciate github search within a repo to surface specific uses for review:
https://github.com/fastai/fastbook/search?q=fine_tune
https://github.com/fastai/fastbook/search?q=fit_one_cycle

Danrohn · May 4, 2022, 11:27pm

Hey, great topic!
Can anyone clarify what freeze() and unfreeze() are crucial for?
As far as I understand, it’s about pretrained weights.
If I use a pretrained model, should I use fine_tune() rather than fit_one_cycle()?
How many epochs should I train while frozen before I unfreeze?
Thanks

GeorgePearse · February 13, 2023, 11:05pm

Hey Danrohn, freeze() means “do not update these weights”; unfreeze() undoes that, so they get updated again.

It’s normally applied to the backbone/body (same thing) of a model so that you don’t remove what it’s learnt from the dataset it was originally trained on.

In fine_tune:

1. freeze the body
2. run training for just the last few layers (transfer learning)
3. unfreeze the body
4. make a small change to all of the model (fine tuning)
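A toy sketch of the idea, in plain Python rather than fastai: an update step that skips any parameter group flagged as frozen. Everything here (`sgd_step`, the `frozen` flag, the two groups and their numbers) is made up for illustration. As far as I know, the real library gets the same effect by switching off gradient tracking for the frozen parameters, but check the source for the details.

```python
def sgd_step(groups, lr):
    """Apply one plain SGD update, skipping any group marked as frozen."""
    for group in groups:
        if group["frozen"]:
            continue  # frozen weights are left untouched
        group["weights"] = [w - lr * g
                            for w, g in zip(group["weights"], group["grads"])]


# Two hypothetical parameter groups: a pretrained body and a freshly added head.
body = {"name": "body", "weights": [0.5, -0.2], "grads": [0.1, 0.1], "frozen": True}
head = {"name": "head", "weights": [0.0, 0.0], "grads": [0.1, 0.1], "frozen": False}

sgd_step([body, head], lr=0.1)    # frozen phase: only the head moves
body_after_frozen = list(body["weights"])

body["frozen"] = False            # the toy equivalent of unfreeze()
sgd_step([body, head], lr=0.01)   # smaller step now that every group updates

print(body_after_frozen, body["weights"], head["weights"])
```

The body keeps its pretrained values while the head catches up, and only afterwards does the whole model get nudged with a smaller learning rate, which mirrors the four steps listed above.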

rgh · November 3, 2024, 10:32pm

thanks a lot for your response. the source code really helps me.
my confusion about these two different ways of training is solved✌️