learn.fine_tune ... why the defaults?

So this utterly escaped me during the lecture … I could have sworn I saw learn.fit_one_cycle in the 01 notebook (but then again, I am 48, so it could be old age). Anyhow, I'm curious about the defaults of learn.fine_tune and a few other things (a rough sketch of the relevant code follows the list):

  1. Why is pct_start hardcoded to 0.99 for the frozen phase but defaulted to 0.3 for the unfrozen phase?

  2. Why is div left at its default of 25.0 for the frozen phase, but defaulted to 0.5 for the unfrozen phase?

  3. Why the formula of base_lr /= 2 and then slice(base_lr/lr_mult, base_lr) for setting the discriminative LRs for the unfrozen model?

  4. Why the slice(base_lr) for the frozen model? Isn’t this the same thing as just setting it to base_lr?
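
For reference, here is a rough sketch of the structure I'm asking about, pieced together from the questions above; it is not the actual fastai source, and the defaults shown for base_lr, freeze_epochs, lr_mult, and div are my assumptions for illustration:

```python
# Rough sketch of what learn.fine_tune appears to do (not the actual fastai source).
# Defaults for base_lr, freeze_epochs, lr_mult, and div below are placeholders.
def fine_tune_sketch(learn, epochs, base_lr=2e-3, freeze_epochs=1, lr_mult=100,
                     pct_start=0.3, div=5.0, **kwargs):
    # Phase 1: train only the head with the body frozen.
    # pct_start is hardcoded to 0.99 (question 1) and div is left at
    # fit_one_cycle's own default of 25.0 (question 2).
    learn.freeze()
    learn.fit_one_cycle(freeze_epochs, slice(base_lr), pct_start=0.99, **kwargs)

    # Phase 2: unfreeze and train the whole model with discriminative LRs.
    # base_lr is halved, then spread from base_lr/lr_mult (earliest layer group)
    # up to base_lr (the head) via slice (questions 3 and 4).
    base_lr /= 2
    learn.unfreeze()
    learn.fit_one_cycle(epochs, slice(base_lr / lr_mult, base_lr),
                        pct_start=pct_start, div=div, **kwargs)
```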


We haven’t covered any of that yet. So I’ve moved this to #part1-v4:non-beginner

It’s been discussed in the Lesson 1 non-beginner thread, where Sylvain chimed in. Basically, they’ve found through experimentation on many different datasets and domains that the defaults alone work very well on most transfer learning problems :slight_smile:


I figured as much :slight_smile:

It sounds like this will be covered in the future. For each application, I’d be interested to know:

  1. What datasets were used (including splits)?
  2. What architectures were used?
  3. What ablation studies were conducted (hyperparameters tried, ranges, etc…)?

And hey … congrats on moving up the moderator food chain, and I hope you’re feeling better!
