Creating Two Models With A Common Feature Extractor

I’m trying to create two models that share exactly the same feature extractor.

First I train one model on a dataset, then copy the “body” of this model into a second model after initialising it; the second model is then trained on a different dataset.

The “body” of the second model is a clone of the first, and it must stay frozen, with train_bn turned off, so that it remains identical throughout training. However, I can’t get this to work.

The code looks something like:

from fastai.vision import *  # fastai v1-style star import (provides cnn_learner, torch, etc.)

learn1 = cnn_learner(data1, arch)
learn1.fit(3)
learn1.unfreeze()
learn1.fit(3)

import copy
feature_extractor = copy.deepcopy(learn1.model[0])  # the convolutional "body"

learn2 = cnn_learner(data2, arch, train_bn=False)

# neither of these two approaches works
learn2.model[0] = feature_extractor
learn2.model[0] = copy.deepcopy(feature_extractor)

learn2.fit(1)

x = torch.rand(1, 3, 224, 224)
learn2.model[0].eval()
feature_extractor.eval()

torch.equal(feature_extractor(x), learn2.model[0](x))
# = False

I’ve created a fully reproducible notebook that demonstrates this in more depth, showing exactly which layers mismatch (link).


I found the main issue: BatchNorm. Even with the body frozen, the running mean and variance of the BatchNorm layers were still being updated during training. I fixed it with the built-in BnFreeze callback (link).
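Roughly like this, as a minimal sketch (assuming fastai v1, where BnFreeze is exported from fastai.train and can be passed via callback_fns):

from fastai.train import BnFreeze

# BnFreeze keeps the running mean/variance of every non-trainable
# BatchNorm layer fixed during training
learn2 = cnn_learner(data2, arch, train_bn=False, callback_fns=[BnFreeze])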

Another key change was to copy the model weights with load_state_dict instead of replacing the module object:

# old method -- wrong: assigns a new module object, and the learner's
# layer groups / optimizer can still reference the old parameters
learn2.model[0] = copy.deepcopy(feature_extractor)
# replace with THIS: copies the weights into the existing module in place
learn2.model[0].load_state_dict(feature_extractor.state_dict())
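To confirm the copy worked, a quick sanity check (my addition, not from the original notebook) is to compare the two state dicts tensor by tensor:

import torch

# every tensor, including the BatchNorm running statistics, should
# now be bit-identical between the two bodies
for name, tensor in feature_extractor.state_dict().items():
    assert torch.equal(tensor, learn2.model[0].state_dict()[name]), name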

Interestingly, training learn2 in fp16 and converting back to fp32 still led to a mismatch.
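My guess (an assumption, not something I verified in the notebook) is that this is simple precision loss: casting fp32 weights to fp16 discards mantissa bits, so the round trip is not bit-exact even without any training. A toy demonstration:

import torch

w = torch.rand(10)
w_roundtrip = w.half().float()  # fp32 -> fp16 -> fp32

torch.equal(w, w_roundtrip)                # False: bits lost in the cast
torch.allclose(w, w_roundtrip, atol=1e-3)  # True: still close, within fp16 precision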