Modifying pretrained resnet - does it "just work"?

Hello PyTorch experts. I understand that PyTorch is able to track and update weights and gradients automatically. However, because I do not understand exactly how it does this magic, I am not confident that I won’t break it.

I would like to modify an existing ResNet model, and have tried to do so by imitating the code in fastai. Would someone please check my code?

The task:

  • create a pretrained model with create_cnn

  • the custom head created by fastai looks like this…
    (1): Sequential(
      (0): AdaptiveConcatPool2d(
        (ap): AdaptiveAvgPool2d(output_size=1)
        (mp): AdaptiveMaxPool2d(output_size=1)
      )
      (1): Flatten()
      (2): BatchNorm1d(4096, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (3): Dropout(p=0.25)
      (4): Linear(in_features=4096, out_features=512, bias=True)
      (5): ReLU(inplace)
      (6): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (7): Dropout(p=0.5)
      (8): Linear(in_features=512, out_features=4, bias=True)
    )

  • insert my own layer (module End) before layer 3, and replace layer 4 with a Linear layer that receives a different number of features. Place another module, Start, in front of everything else.

  • assign this new model to the existing learn object.

  • have everything just work. Clear?

The code:

learn = create_cnn(data, arch, metrics=[accCancer])
RNmodel = learn.model
head = RNmodel[1]                       # the fastai-generated head shown above

# the new Linear takes 4104 features instead of 4096, to match what End passes along
myLinear = nn.Linear(in_features=4104, out_features=512, bias=True)

# keep head layers 0-2, insert End, then a fresh Dropout and the wider Linear
# in place of the old layers 3-4, and keep the rest unchanged
head = nn.Sequential(*list(head.children())[:3], End(), nn.Dropout(.25), myLinear,
                     *list(head.children())[5:])

# put Start in front of everything: Start -> resnet body -> new head
model = nn.Sequential(Start(), RNmodel[0], head).cuda()
learn.model = model

The resulting model head looks right and the model appears to train.

(2): Sequential(
    (0): AdaptiveConcatPool2d(
      (ap): AdaptiveAvgPool2d(output_size=1)
      (mp): AdaptiveMaxPool2d(output_size=1)
    )
    (1): Flatten()
    (2): BatchNorm1d(4096, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (3): End()
    (4): Dropout(p=0.25)
    (5): Linear(in_features=4104, out_features=512, bias=True)
    (6): ReLU(inplace)
    (7): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (8): Dropout(p=0.5)
    (9): Linear(in_features=512, out_features=4, bias=True)
  )

But does it all automagically work right without my doing anything more? I am concerned about the mentions of “registering/initializing parameters” in the PyTorch docs.

Thanks for reading my long question!


Coming back to answer this a month later, after tracing through many modified models.

Yes, it all just works…

  • grabbing a module (usually pre-trained) by its index.
  • listifying a module’s children and slicing
  • throwing any of these into Sequential
  • assigning a different layer to a model with indexing on the left, as in RNmodel[1][3] = myLinear
  • assigning to a Module’s named instance variables, as in myModule.lin2 = myLinear. Here you have to make sure that myModule.forward() still works.

PyTorch properly keeps track of parameters and applies optimizers correctly.
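
To make that concrete, here is a small plain-PyTorch sketch of those moves, using a torchvision resnet18 rather than the fastai model from my project (illustrative only):

import torch
import torch.nn as nn
from torchvision import models

rn = models.resnet18(pretrained=True)

# listify a module's children, slice, and throw the pieces into Sequential
body = nn.Sequential(*list(rn.children())[:-2])      # drop the original pool and fc
head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                     nn.Linear(512, 256), nn.ReLU(), nn.Dropout(0.25), nn.Linear(256, 4))
model = nn.Sequential(body, head)

# assign a different layer with indexing on the left
model[1][2] = nn.Linear(512, 128)
model[1][5] = nn.Linear(128, 4)

# assign to a Module's named instance variable (forward() must still make sense)
rn.fc = nn.Linear(rn.fc.in_features, 4)

# every swap above is registered automatically, so the optimizer sees all of it
opt = torch.optim.SGD(model.parameters(), lr=0.01)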

You can also change a layer’s non-learned settings on the fly during training, such as requires_grad and the dropout probability.
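
For instance, continuing the sketch above:

# freeze the body mid-training by flipping requires_grad (flip it back to unfreeze)
for p in model[0].parameters():
    p.requires_grad = False

# Dropout reads its probability on every forward pass, so this takes effect immediately
for m in model.modules():
    if isinstance(m, nn.Dropout):
        m.p = 0.5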

I had to test it to believe it.


I need to add a caution here. Yes, it all just works in PyTorch but maybe not in fastai v2.

The potential problem is that when fastai trains, it uses the layers specified in layer_groups to tell the optimizer which parameters to optimize. layer_groups is set when the Learner is created. If you later add new weights to the model, the optimizer will not know about them. So you may need to update layer_groups whenever you alter a model already inside a learner.
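
For comparison, plain PyTorch has the same gotcha if you build the optimizer before editing the model (a tiny illustration, not fastai code):

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)    # the optimizer captures the parameters that exist now

model.add_module('extra', nn.Linear(10, 2))          # new weights added after the optimizer was built

n_opt = sum(len(g['params']) for g in opt.param_groups)
n_model = sum(1 for _ in model.parameters())
print(n_opt, n_model)                                 # 2 vs 4: the new layer's weights are never optimized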

I am not sure this problem is real, and have not had the time to trace the relevant code.

For background, please see
https://forums.fast.ai/t/replicating-fastai-results-manually/61209

Excuse my naivete, but why not modify the model outside and then create the learner?

It’s a good question, and I may not exactly recall my motivations from a year ago. I think it seemed safer and simpler to start with a Learner that clearly works and to make incremental, verifiable changes to it, especially since I was not confident that I understood everything the fastai library was doing and why.

But yes, you could copy the code for create_cnn (now called cnn_learner) and insert custom code to adjust the model before it creates the Learner. Just be aware (in case it is not obvious) that cnn_learner does more than instantiate a Learner (it also does split, freeze, and init), and that if its code is updated in the library, those changes will not propagate into your custom version.
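
In outline, that route would look something like this. This is an untested sketch against fastai v1: create_body, create_head, num_features_model, Learner.split, and apply_init are the helpers cnn_learner itself uses, the split points are hypothetical, and arch, data, accCancer, Start, and End come from the earlier posts. Check the current library source before relying on any of the signatures.

from fastai.vision import *

body = create_body(arch, pretrained=True)
nf   = num_features_model(body) * 2                  # *2 because the head starts with AdaptiveConcatPool2d
head = create_head(nf, data.c)

# edit `head` (and/or wrap the body) however you like here, before any Learner exists
model = nn.Sequential(Start(), body, head)           # Start/End are your own modules

learn = Learner(data, model, metrics=[accCancer])
learn.split(lambda m: (m[1][6], m[2]))               # hypothetical split points for this three-part layout
learn.freeze()                                       # only makes sense if the body is pretrained
apply_init(model[2], nn.init.kaiming_normal_)        # initialize the new head, as cnn_learner does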

My stylistic preference, whether justified or not, is to rework existing “expert methods” as little as possible.

Cheers,
Malcolm

Note that learn.create_opt will update the optimizer with changes to the param groups.

You still have to correctly set layer_groups for the modified model, right?

Did you ever find out whether there is a need to specify the layers? I’m facing this issue where I want to modify a fastai resnet slightly.

You mean the layer_groups? I never got a definite response. My best guess is that you will need to update layer_groups to reflect any Modules that are added or deleted. If you only change the arguments to a layer, probably not. But you would need to trace the fastai code to be sure.

Those layer_groups must ultimately be sent to the optimizer with learn.create_opt, as Jeremy points out.
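
If I were trying it today, I would do roughly this after editing the model in place (untested, fastai v1 names; my_split is hypothetical and the learning rate is arbitrary, so pick values that suit your model):

def my_split(m):                 # hypothetical split: mid-body and at the head, like fastai's resnet default
    return (m[0][6], m[1])

learn.split(my_split)            # refresh learn.layer_groups for the edited model
learn.create_opt(1e-3)           # rebuild the optimizer from the new groups, per Jeremy's note above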

🙂

Thanks. I’ll try and see what happens.