@jeremy chose satellite imgs just to point out that kind of qualitative difference.
When you trained the earliest layers of Resnet on your dogs/cats dataset, you set a learning rate TWO orders of magnitude lower than for the last two layers (the ones added by us). That was because you did NOT want to spoil those layers’ weights: dogs and cats are very similar to the Imagenet images on which those weights were laboriously tuned.
The same holds, to a lesser extent, for the middle layers (one order of magnitude lower): the ones that recognize slightly more complex patterns.
Now, you have to unfreeze and train the earlier and middle layers on images that are qualitatively more different from Imagenet’s, so you have to perturb them a lot more.
Indeed, if lr = 10^-2 is the learning rate of the late layers, then lr/9 (≈ 1.1 * 10^-3) is a lot MORE than lr * 10^-2 (= 10^-4), and the same holds for lr/3 (≈ 3.3 * 10^-3) vs lr * 10^-1 (= 10^-3).
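A quick numeric sketch of those ratios (the lr values are just the illustrative ones from the example, and the group names are made up for clarity):

```python
# Base learning rate used for the late (newly added) layer groups.
lr = 1e-2

# Dataset similar to Imagenet (dogs/cats): early layers get lr two
# orders of magnitude lower, middle layers one order lower.
similar = {"early": lr * 1e-2, "middle": lr * 1e-1, "late": lr}

# Dataset qualitatively different (satellite images): a flatter
# spread like lr/9 and lr/3 perturbs the pretrained layers more.
different = {"early": lr / 9, "middle": lr / 3, "late": lr}

for group in ("early", "middle"):
    ratio = different[group] / similar[group]
    print(f"{group}: {ratio:.1f}x larger lr than in the similar case")
# early comes out ~11x larger, middle ~3.3x larger
```

So the “different images” schedule hits the early layers with a learning rate roughly eleven times higher than the dogs/cats schedule would.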
The gist is that the more your images differ qualitatively from those used for pretraining, the bolder you have to be in trying higher learning rates for the earlier layers.
Let us know whether this helps