My intuition is that discriminative learning rates only really work if the weights in your early layers are already sensible. When you start from a randomly initialized network, this isn't the case.
We can plot the activations from a network and see this for ourselves. If you want more information about the plots, see: https://github.com/JoshVarty/VisualizingActivations/blob/master/VisualizingActivations.ipynb
Below we train a ResNet-18 network from scratch with discriminative learning rates:
```python
learn = cnn_learner(data, models.resnet18, pretrained=False, metrics=[f_score])
```
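To see why discriminative learning rates change so little when starting from random weights, here is a toy NumPy sketch of the idea (my own illustration, not code from the notebook): two "parameter groups" trained with plain SGD, where the early layer gets a much smaller learning rate than the later layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer linear network trained with plain SGD. Each layer
# ("parameter group") gets its own learning rate: with discriminative
# learning rates the early layer uses a much smaller rate, so its
# randomly initialized weights barely move during training.
W1 = rng.standard_normal((4, 4))   # early layer (small LR)
W2 = rng.standard_normal((4, 1))   # later layer (large LR)
lrs = {"W1": 1e-5, "W2": 1e-2}     # discriminative learning rates

x = rng.standard_normal((32, 4))
y = rng.standard_normal((32, 1))

W1_start, W2_start = W1.copy(), W2.copy()
for _ in range(100):
    h = x @ W1                         # forward pass
    pred = h @ W2
    err = pred - y                     # dLoss/dpred for MSE (up to a constant)
    gW2 = h.T @ err / len(x)           # gradient for the later layer
    gW1 = x.T @ (err @ W2.T) / len(x)  # gradient for the early layer
    W2 -= lrs["W2"] * gW2
    W1 -= lrs["W1"] * gW1

# The early layer stays close to its random initialization while the
# later layer moves substantially, which is why discriminative rates
# help little when the early weights aren't sensible to begin with.
print(np.abs(W1 - W1_start).mean(), np.abs(W2 - W2_start).mean())
```

The small print-out for `W1` is exactly the effect visible in the activation plots below: with a tiny learning rate, the early layer is effectively frozen at its random starting point.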
We can plot the activations from each layer:
- The x-axis represents time, as we train the network.
- The y-axis represents the magnitude of activations.
- More yellow means more activations at a given magnitude.
- More blue means fewer activations at a given magnitude.
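As a rough sketch of how such a plot can be built (my own reconstruction; see the linked notebook for the actual code): record a layer's activations at each training step, histogram their magnitudes, and stack the histograms into an image with time on the x-axis.

```python
import numpy as np

# Toy reconstruction of the activation plot: at each "training step"
# we record a layer's activations, histogram their magnitudes, and
# stack the histograms column by column. Rendering the result as an
# image gives time on the x-axis, magnitude on the y-axis, and color
# indicating how many activations fall in each bin.
rng = np.random.default_rng(0)
n_steps, n_bins = 50, 40
bins = np.linspace(0.0, 4.0, n_bins + 1)

columns = []
for step in range(n_steps):
    # Stand-in for activations recorded from a real layer; here the
    # magnitudes simply drift upward as "training" progresses.
    acts = np.abs(rng.standard_normal(1024) * (1.0 + step / n_steps))
    hist, _ = np.histogram(acts, bins=bins)
    columns.append(hist)

# Shape (n_bins, n_steps): rows are magnitude bins, columns are steps.
image = np.stack(columns, axis=1)
```

With matplotlib, something like `plt.imshow(image, origin='lower', aspect='auto')` renders this in the yellow/blue style described above (assuming a colormap such as viridis).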
Visualizing the activations:
The first layer doesn’t appear to change much as training progresses. This is likely because we’re using such a small learning rate for the early layers.
If we instead use:
We can plot our activations again:
This looks much better! It also seems to improve results: my F1 score improved from 0.468753, with a corresponding improvement in loss.