Differential learning rate

arjanhada · December 16, 2018, 4:29pm

The lecture says: “we will use different learning rates for different layers: the first few layers will be at 1e-4, the middle layers at 1e-3, and our Fully Connected (FC) layers we’ll leave at 1e-2 as before.”

If I am implementing a ResNet-152, which layers exactly count as the first few layers and which as the middle layers? Also, how would we implement this in PyTorch?