Parameter splitter: Why are ResNets and other models split into 3 groups (instead of just head and body)?

I had a look at the Siamese tutorial because I need to create a custom model for multi-task learning.

The splitting function there is self-explanatory:

def siamese_splitter(model):
    return [params(model.encoder), params(model.head)]
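
For reference, that splitter is simply handed to the Learner, which uses it to build the optimizer's parameter groups. A minimal sketch (dls, model and loss_func are assumed to be the ones defined in the Siamese tutorial):

from fastai.vision.all import *

# dls, model and loss_func as defined in the Siamese tutorial (assumed here)
learn = Learner(dls, model, loss_func=loss_func, splitter=siamese_splitter)
learn.create_opt()
print(len(learn.opt.param_lists))  # 2 groups: [encoder params, head params]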

However, I also checked what happens to a ResNet by default. In the 21_vision.learner.ipynb notebook I found:

def _resnet_split(m): return L(m[0][:6], m[0][6:], m[1:]).map(params)

Here the body is additionally split into two parts, but I could not find an explanation of why. Is it because the first part of the encoder needs adaptation due to different input sizes? Are there use cases where it makes sense to split the parameters into more than three parts? In 12_optimizer.ipynb, freeze() (from BaseOptimizer) freezes everything except the last list of parameters:

def freeze(self):
    assert(len(self.param_lists)>1)
    self.freeze_to(-1)
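
The effect of freeze_to(-1) on three groups can be illustrated with a small stand-alone sketch (plain PyTorch, not fastai's actual implementation; the three modules below just stand in for the three ResNet groups):

import torch.nn as nn

body_early = nn.Conv2d(3, 16, 3)    # stand-in for m[0][:6]
body_late  = nn.Conv2d(16, 32, 3)   # stand-in for m[0][6:]
head       = nn.Linear(32, 10)      # stand-in for m[1:]

param_lists = [list(body_early.parameters()),
               list(body_late.parameters()),
               list(head.parameters())]

def freeze_to(param_lists, n):
    # Freeze every group before index n; leave groups from n on trainable.
    n = n if n >= 0 else len(param_lists) + n
    for i, group in enumerate(param_lists):
        for p in group: p.requires_grad_(i >= n)

freeze_to(param_lists, -1)  # what freeze() does: only the last group (the head) stays trainable
print([all(p.requires_grad for p in g) for g in param_lists])  # [False, False, True]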

Just a guess: could this be because of discriminative learning rates? E.g. when you pass slice(1e-4, 1e-2) into your fit method?
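
If I remember right, the two slice endpoints go to the first and last parameter groups, with anything in between spaced geometrically (the exact spreading rule is my assumption); roughly:

import numpy as np

def spread_lrs(lr_slice, n_groups):
    # sketch: geometric spread of slice(start, stop) over the parameter groups
    return np.geomspace(lr_slice.start, lr_slice.stop, n_groups)

print(spread_lrs(slice(1e-4, 1e-2), 3))  # [1e-4, 1e-3, 1e-2]: early body, late body, head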


Thanks for your quick answer! It looks like this is the case. So splitters exist for two reasons (sketched below):

  1. Tell the learner which layers to freeze during transfer learning.
  2. Tell the learner where to apply which learning rate (if discriminative rates are used).
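Putting both together for the default ResNet split (a sketch; dls is assumed to be an ImageDataLoaders built elsewhere, and the per-group lrs follow the geometric spread guessed above):

from fastai.vision.all import *

learn = cnn_learner(dls, resnet34, metrics=accuracy)  # uses _resnet_split by default

learn.freeze()                              # reason 1: only the last group (the head) trains
learn.fit_one_cycle(1)

learn.unfreeze()                            # reason 2: one lr per group when a slice is passed
learn.fit_one_cycle(1, slice(1e-5, 1e-3))   # early body ~1e-5, late body ~1e-4, head ~1e-3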