I’ve been watching the videos and reading through the docs, and I feel like I have a good understanding of what `fit_one_cycle` is doing. It varies the learning rate from a small number up to `max_lr` and then back down to a small number. When you pass a `slice` of two values, it spreads the `max_lr` across the layer groups so that the earlier layers train more slowly than the deeper ones. It looks like the default value is `slice(None, 0.003, None)`.
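For my own understanding, here is a minimal sketch of how a two-value slice might map to per-layer-group peak rates. The geometric spacing and the `even_mults` helper are my re-implementation based on reading the fastai v1 source, so treat the details as assumptions rather than the library's exact behavior:

```python
import numpy as np

def even_mults(start: float, stop: float, n: int) -> np.ndarray:
    """Return n geometrically even values from start to stop
    (my re-implementation of a fastai helper of the same name)."""
    mult = stop / start
    step = mult ** (1 / (n - 1))
    return np.array([start * (step ** i) for i in range(n)])

# With max_lr=slice(1e-5, 1e-3) and 3 layer groups, each group
# would get its own peak learning rate for the one-cycle schedule:
lrs = even_mults(1e-5, 1e-3, 3)
print(lrs)  # first group trains slowest, last group fastest
```

If only the `stop` value is given (as in the default), my understanding is that all but the last group get a reduced rate rather than a full spread, but I haven’t verified that.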
I have a couple of questions:
- If my model is frozen, does this learning rate scale over the whole model or just the unfrozen layers?
- If I am training from scratch instead of using transfer learning, should I still be using a slice? Or should I train all layers at the same rate? My understanding was that we do this because the earlier layers are more likely to be “pretty good” already – but that doesn’t seem to be the case if the weights are randomly initialized.
- In most of the notebooks we run `fit_one_cycle` with the default `max_lr` initially, when the model is frozen – why do we not run `lr_find` on that step? Is `None -> 0.003` just always a good choice for a frozen model?
- What does the third `None` parameter to `slice` in the default signify? The docs say it’s a “step” – should it always be `None`?
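On the last question, it may help to note that this is just a plain Python `slice` object, which is simply a container with `start`, `stop`, and `step` attributes. A quick check of the equivalence (the claim that fastai ignores `step` for learning rates is my assumption):

```python
# slice(0.003) is shorthand for slice(None, 0.003, None):
lr = slice(0.003)
print(lr.start, lr.stop, lr.step)        # None 0.003 None
print(lr == slice(None, 0.003, None))    # True

# So the third None is the ordinary slice "step" field; as far as
# I can tell fastai only reads start/stop when building learning rates.
```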