In the course notebook lesson1.ipynb (covered in video lecture 2), there is a cell that looks like the following:
Review: easy steps to train a world-class image classifier

- precompute=True
- Use `lr_find()` to find highest learning rate where loss is still clearly improving
- Train last layer from precomputed activations for 1-2 epochs
- Train last layer with data augmentation (i.e. precompute=False) for 2-3 epochs with cycle_len=1
- Unfreeze all layers
- Set earlier layers to 3x-10x lower learning rate than next higher layer
- Use `lr_find()` again
- Train full network with cycle_mult=2 until over-fitting
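For reference, here is roughly how I read those steps in code (a sketch against the fastai 0.7 API used in the course; `PATH`, `sz`, and the placeholder learning-rate values are my assumptions, not from the quoted cell):

```python
from fastai.conv_learner import *  # fastai 0.7, as used in the course

arch = resnet34
# PATH and sz assumed to be set up as in the earlier cells of the notebook
tfms = tfms_from_model(arch, sz, aug_tfms=transforms_side_on, max_zoom=1.1)
data = ImageClassifierData.from_paths(PATH, tfms=tfms)

learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.lr_find()                     # pick highest lr where loss still clearly improves
learn.sched.plot()
learn.fit(1e-2, 2)                  # train last layer from precomputed activations

learn.precompute = False            # turn data augmentation back on
learn.fit(1e-2, 3, cycle_len=1)     # train last layer with augmentation

learn.unfreeze()
lrs = np.array([1e-4, 1e-3, 1e-2])  # 3x-10x lower rates for earlier layer groups
learn.lr_find()                     # "use lr_find() again" -- the step in question
learn.fit(lrs, 3, cycle_len=1, cycle_mult=2)
```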
But I find it a bit weird that you would set the differential learning rates before running `lr_find()` again. Wouldn't you want to find the learning rate first, and then set the differential learning rates so that earlier layers get lower rates relative to the newly found learning rate?
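To make the question concrete, the ordering I would have expected looks more like this (hypothetical values throughout):

```python
learn.unfreeze()
learn.lr_find()                          # find a good rate for the unfrozen network first
learn.sched.plot()
lr = 1e-3                                # whatever the plot suggests (hypothetical value)
lrs = np.array([lr / 100, lr / 10, lr])  # then derive lower rates for earlier layers
learn.fit(lrs, 3, cycle_len=1, cycle_mult=2)
```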