So this utterly escaped me during the lecture … I could have sworn I saw learn.fit_one_cycle in the 01 notebook (but then again, I am 48, so it could be old age). Anyhow, I’m curious about the defaults of learn.fine_tune and a few other things:
Why is pct_start hardcoded to 0.99 for the frozen phase but defaulted to 0.3 for the unfrozen phase?
Why is div left at its fit_one_cycle default of 25.0 for the frozen phase, but defaulted to 5.0 for the unfrozen phase?
Why the formula of base_lr /= 2 followed by slice(base_lr/lr_mult, base_lr) for setting the discriminative LRs for the unfrozen model?
Why the slice(base_lr) for the frozen model? Isn’t this the same thing as just setting it to base_lr?
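For context, here’s roughly what fine_tune does under the hood — my paraphrase of the fastai source (decorators and docstring omitted), so exact defaults may differ between versions:

```python
# Paraphrase of fastai's Learner.fine_tune; exact defaults may vary by version.
def fine_tune(self, epochs, base_lr=2e-3, freeze_epochs=1, lr_mult=100,
              pct_start=0.3, div=5.0, **kwargs):
    self.freeze()
    # Frozen phase: only the new head trains. pct_start=0.99 means the LR
    # spends almost the whole phase warming up and barely anneals down.
    self.fit_one_cycle(freeze_epochs, slice(base_lr), pct_start=0.99, **kwargs)
    base_lr /= 2  # halve the peak LR before unfreezing the pretrained body
    self.unfreeze()
    # Unfrozen phase: discriminative LRs from base_lr/lr_mult for the earliest
    # layer groups up to base_lr for the head, with a normal one-cycle schedule.
    self.fit_one_cycle(epochs, slice(base_lr / lr_mult, base_lr),
                       pct_start=pct_start, div=div, **kwargs)
```

As far as I can tell, a slice with a single value applies that one LR to every parameter group, so for the frozen phase slice(base_lr) should indeed behave the same as passing base_lr directly.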
This was discussed on the Lesson 1 non-beginner thread, and Sylvain chimed in: basically, after experimenting on many different datasets and domains, they found that the defaults alone work very well on most transfer learning problems.