When we first unfreeze the model and train for one cycle with learning rate = 0.003, we see that the result gets worse. We then load the saved model and run the Learning Rate Finder, from which we can estimate that 1e-6 is a good learning rate. Note: the Learning Rate Finder was run on the frozen model, so I feel that the optimal learning rate it suggests should really be the optimal learning rate for the LAST layers only.
However, in the following cells, we see that the unfrozen model is trained with a slice of learning rates varying from 1e-6 to 1e-4, where 1e-6 is the learning rate for the first layers and 1e-4 is the learning rate for the last layers. In order to train the unfrozen model, shouldn't we have run the Finder on the unfrozen model itself, and then used the new optimal learning rate?
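For context, my understanding is that passing a slice of learning rates gives each layer group its own rate, spaced multiplicatively between the two endpoints (so earlier layers, which hold more general features, train more slowly). Here is a minimal sketch of that spacing, assuming log-spaced rates across three layer groups; the function name `discriminative_lrs` is my own for illustration, not fastai's API:

```python
import numpy as np

def discriminative_lrs(lr_min, lr_max, n_groups):
    """Log-spaced per-group learning rates, roughly what a
    slice(lr_min, lr_max) does: the earliest layer group gets
    lr_min, the last group (the head) gets lr_max."""
    return np.geomspace(lr_min, lr_max, num=n_groups)

lrs = discriminative_lrs(1e-6, 1e-4, 3)
print(lrs)  # → [1.e-06 1.e-05 1.e-04]
```

So with slice(1e-6, 1e-4) the first layers move very gently while the newly-added head keeps learning faster, which may be why re-running the Finder on the unfrozen model is treated as less critical than it first appears.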