The lecture says: “we will use different learning rates for different layers: the first few layers will be at 1e-4, the middle layers at 1e-3, and our Fully Connected (FC) layers we’ll leave at 1e-2 as before.”
If I am implementing a ResNet-152, which layers exactly count as the “first few” and the “middle” layers? Also, how would we implement this in PyTorch?
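
To make the question concrete, here is a rough sketch of what I think this could look like in PyTorch, using per-parameter-group options in `torch.optim`. The split is my own guess, not something the lecture states: I treat `conv1`, `bn1`, `layer1`, `layer2` as the "first few" layers, `layer3`, `layer4` as the "middle", and `fc` as the FC head. `TinyResNetLike` and `make_param_groups` are hypothetical names; the tiny model is only a stand-in that reuses the top-level attribute names torchvision's ResNet models expose, so the same function should work on `torchvision.models.resnet152`.

```python
import torch
from torch import nn


class TinyResNetLike(nn.Module):
    """Hypothetical stand-in that mirrors the top-level attribute names of
    torchvision's ResNet (conv1, bn1, layer1..layer4, fc), kept tiny on purpose."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(8)
        self.layer1 = nn.Conv2d(8, 8, kernel_size=3, padding=1)
        self.layer2 = nn.Conv2d(8, 16, kernel_size=3, padding=1)
        self.layer3 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        self.layer4 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(64, num_classes)

    def forward(self, x):
        x = torch.relu(self.bn1(self.conv1(x)))
        x = self.layer4(self.layer3(self.layer2(self.layer1(x))))
        return self.fc(torch.flatten(self.avgpool(x), 1))


def make_param_groups(model, lrs=(1e-4, 1e-3, 1e-2)):
    """Build optimizer parameter groups with one learning rate per group.

    Assumption: which modules are 'early' vs 'middle' is a guess."""
    early = [model.conv1, model.bn1, model.layer1, model.layer2]
    middle = [model.layer3, model.layer4]
    head = [model.fc]
    groups = []
    for modules, lr in zip((early, middle, head), lrs):
        params = [p for m in modules for p in m.parameters()]
        groups.append({"params": params, "lr": lr})
    return groups


model = TinyResNetLike()
# Each group carries its own lr; the top-level lr is only a default
# for groups that do not set one.
optimizer = torch.optim.SGD(make_param_groups(model), lr=1e-2, momentum=0.9)
```

With a real ResNet-152 I would expect to swap `TinyResNetLike()` for the torchvision model and keep `make_param_groups` unchanged, but I am not sure this grouping matches what the lecture means.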