The lecture says: “we will use different learning rates for different layers: the first few layers will be at 1e-4, the middle layers at 1e-3, and our Fully Connected (FC) layers we’ll leave at 1e-2 as before.”
If I am implementing a ResNet152, which layers exactly count as the first few layers and which as the middle layers? Also, how would we implement this in PyTorch?
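
For context, here is my rough understanding of how per-layer learning rates could be set up in PyTorch, using optimizer parameter groups. This is only a sketch under assumptions: the split below (stem, `layer1` and `layer2` as "first few"; `layer3` and `layer4` as "middle"; `fc` as the head) is my own guess and not something the lecture states, `build_param_groups` is a made-up helper name, and `TinyResNetLike` is a hypothetical stand-in that only reuses the attribute names torchvision's ResNet models expose (`conv1`, `bn1`, `layer1` to `layer4`, `fc`).

```python
import torch
from torch import nn


def build_param_groups(model, lrs=(1e-4, 1e-3, 1e-2)):
    """Split a torchvision-style ResNet into three learning-rate groups.

    Assumed split (not taken from the lecture): the stem plus layer1/layer2
    are "early", layer3/layer4 are "middle", and fc is the head.
    """
    early = [model.conv1, model.bn1, model.layer1, model.layer2]
    middle = [model.layer3, model.layer4]
    head = [model.fc]

    groups = []
    for modules, lr in zip((early, middle, head), lrs):
        # Collect every parameter of every module in this group.
        params = [p for m in modules for p in m.parameters()]
        groups.append({"params": params, "lr": lr})
    return groups


class TinyResNetLike(nn.Module):
    """Hypothetical stand-in with the same attribute names as a torchvision ResNet.

    It only exists to keep the example small; no forward pass is defined.
    """

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(8)
        self.layer1 = nn.Conv2d(8, 8, kernel_size=3, padding=1)
        self.layer2 = nn.Conv2d(8, 8, kernel_size=3, padding=1)
        self.layer3 = nn.Conv2d(8, 8, kernel_size=3, padding=1)
        self.layer4 = nn.Conv2d(8, 8, kernel_size=3, padding=1)
        self.fc = nn.Linear(8, 2)


model = TinyResNetLike()

# Each dict is one parameter group with its own learning rate; the top-level
# lr is only a default for groups that do not set one.
optimizer = torch.optim.SGD(build_param_groups(model), lr=1e-2, momentum=0.9)

for group in optimizer.param_groups:
    print(len(group["params"]), "tensors at lr", group["lr"])
```

If that is right, I would expect the same helper to work on `torchvision.models.resnet152()` since it has the same attribute names, but where to draw the line between "first few" and "middle" is exactly the part I am unsure about.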