How to change layers of a pre-trained model without using Learner

I'm new here and I've completed the whole course; I think it's amazing.

Right now I'm trying to load a pre-trained resnet, but I need to change the last layer to predict 2 classes without using the Learner class (DataBunch, model, metrics). I'm just trying to understand a bit more about how the code works. In Keras, for instance, I could do something like model = keras.VGG16() and change the layers in a loop. Could I do something similar in fastai, or maybe use other functions that do it automatically?

Thank you!

Take a look at this example: here I’m changing the first layer to accommodate a 4-channel input, but the kind of modification you have to make is the same.

# Tweak Resnet to support multiple channels
nChannels = 4
learn.model[0][0]=nn.Conv2d(nChannels,64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)


The two indices in learn.model[GROUP_ID][LAYER_IN_GROUP_ID] (learn.model[0][0] above) are:

  • GROUP_ID: id of the group. Layer groups are used to “manage” multiple layers at once (e.g. freezing or lr tweaking). The number of groups depends on the kind of model you’re using (AFAIR resnet34 has 3 groups). For your problem you have to work in the last group.
  • LAYER_IN_GROUP_ID: id of the layer inside the group.
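
One thing the snippet above does not cover: a freshly created Conv2d starts from random weights, so the pretrained first-layer filters are lost. A minimal sketch of one way to keep them, assuming you are fine initialising the extra channel from the mean of the RGB filters (that choice is my assumption, not something from this thread). Here old_conv is only a stand-in for the existing layer, e.g. learn.model[0][0]:

```python
import torch
import torch.nn as nn

# Stand-in for the pretrained 3-channel first layer (e.g. learn.model[0][0]).
old_conv = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)

n_channels = 4
new_conv = nn.Conv2d(n_channels, 64, kernel_size=7, stride=2, padding=3, bias=False)

with torch.no_grad():
    # Reuse the existing RGB filters for the first three input channels.
    new_conv.weight[:, :3] = old_conv.weight
    # Assumption: start the extra channel from the mean of the RGB filters.
    new_conv.weight[:, 3:] = old_conv.weight.mean(dim=1, keepdim=True)

x = torch.rand(1, n_channels, 224, 224)
out = new_conv(x)
print(out.shape)  # torch.Size([1, 64, 112, 112])
```

After that you would assign new_conv back into the model in place of the old layer.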

You can print out the model to see its structure and then reuse any part of it, for example by sticking the parts into a new Sequential module.

model = torchvision.models.vgg16(pretrained = True)
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): ReLU(inplace)
    (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): ReLU(inplace)
    (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (22): ReLU(inplace)
    (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (25): ReLU(inplace)
    (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (27): ReLU(inplace)
    (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (29): ReLU(inplace)
    (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace)
    (2): Dropout(p=0.5)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace)
    (5): Dropout(p=0.5)
    (6): Linear(in_features=4096, out_features=1000, bias=True)

Now I create a new model by taking only the first two conv layers and attaching a simple linear layer to them:

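my_new_model = nn.Sequential(
    *list(model.features.children())[:4],  # first two conv layers with their ReLUs (slice assumed; these lines were lost from the post)
    PoolFlatten(),  # fastai helper: pools each channel to one value and flattens
    nn.Linear(64, 4)
)
x = torch.rand((1,3,224,224))
my_new_model(x)
tensor([[-0.2058, -0.0773, -0.4127,  0.6589]], grad_fn=<AddmmBackward>)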

I forgot to mention that I actually don’t want to use DataBunch or anything automatic like that; I’d like to customize it myself :slight_smile:

All I pasted above is pure PyTorch (other than PoolFlatten, but that was just for a simple illustration).


Yes! Right now I’m trying to make something similar, let’s see =D

Exactly, model is a pytorch model :wink:

If you try to use your model without fastai (remember that under the hood it is a standard PyTorch model), you need to normalize your input manually (and probably resize it to "size").

Eventually you’ll probably need that…
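
A rough sketch of what that manual preprocessing could look like, assuming the model was pretrained on ImageNet and that your images are already float tensors scaled to [0, 1]. The preprocess helper and the 224 default are names and values I made up for the illustration:

```python
import torch
import torch.nn.functional as F

# Standard ImageNet channel statistics used by torchvision's pretrained models.
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)

def preprocess(batch, size=224):
    """Resize a float batch (N, 3, H, W) with values in [0, 1] and normalize it."""
    batch = F.interpolate(batch, size=(size, size), mode="bilinear", align_corners=False)
    return (batch - IMAGENET_MEAN) / IMAGENET_STD

images = torch.rand(2, 3, 300, 400)  # stand-in for real images scaled to [0, 1]
inputs = preprocess(images)
print(inputs.shape)  # torch.Size([2, 3, 224, 224])
```

If your model was trained with different statistics, swap in those instead of the ImageNet ones.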

I have a similar need, but your fix doesn’t seem to work for a toy example I created, since it affects the number of channels a later stage expects (I’m using it for a segmentation model).

“RuntimeError: Given groups=1, weight of size [99, 99, 3, 3], expected input[1, 100, 40, 40] to have 99 channels, but got 100 channels instead”.

It also doesn’t update the learner’s ‘layer_groups’ field which might cause some additional issues (but that’s not causing the failure)

Editing the source file of torchvision’s ResNet (under /root/anaconda3/envs/fastai/lib/python3.7/site-packages/torchvision/models) is my current terrible but quick workaround, would love to hear a better approach

Can you share some more details about your toy example?

Sure (and thanks :slight_smile:). Note that this is possibly due to me using a unet, or not the latest version (1.0.45), so it is not necessarily related to the proposed fix…
For background, I’m treating stacked np arrays as images (in this case, 4 of them).
I changed the load_image method to handle np arrays, and then created the DataBunch:

Then I created the model:

Afterwards, using your lines ended in the error I mentioned, a mismatch in this layer, which now has these parameters:

Using your lines produced:
“RuntimeError: Given groups=1, weight of size [99, 99, 3, 3], expected input[1, 100, 40, 40] to have 99 channels, but got 100 channels instead”.

You’re welcome!

Can you share them? Is that the only thing you changed to the model?

Yes, and yes:

Then I would get the error when using lr_find(learn).
The change I made to the torchvision model is exactly the same, BTW. Debugging the learner creation would show why the behavior differs, but I haven’t gotten around to it yet.
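
One possible explanation, not confirmed anywhere in this thread: 99 = 96 + 3 and 100 = 96 + 4, which is what you would see if a late layer consumes features concatenated with the raw input, as U-Net style models with skip connections can do. In that case replacing only the first conv is not enough. The toy model below is entirely hypothetical (it is not fastai's unet) and only reproduces that kind of mismatch:

```python
import torch
import torch.nn as nn

class TinySkipNet(nn.Module):
    """Toy model (not fastai's unet) whose last conv sees features + raw input."""
    def __init__(self, in_ch=3, feat_ch=8):
        super().__init__()
        self.stem = nn.Conv2d(in_ch, feat_ch, kernel_size=3, padding=1)
        # The input is concatenated back in, so this layer expects feat_ch + in_ch channels.
        self.head = nn.Conv2d(feat_ch + in_ch, 2, kernel_size=3, padding=1)

    def forward(self, x):
        feats = torch.relu(self.stem(x))
        return self.head(torch.cat([feats, x], dim=1))

net = TinySkipNet()
x4 = torch.rand(1, 4, 40, 40)

# Swapping only the first layer reproduces the same kind of channel mismatch.
net.stem = nn.Conv2d(4, 8, kernel_size=3, padding=1)
try:
    net(x4)
    mismatch = False
except RuntimeError as err:
    mismatch = True
    print(err)

# Rebuilding the layer that consumes the concatenated input resolves it.
net.head = nn.Conv2d(8 + 4, 2, kernel_size=3, padding=1)
out = net(x4)
print(out.shape)  # torch.Size([1, 2, 40, 40])
```

If that is what happens here, the layer named in the error would need the same treatment as the first conv, at the cost of losing its trained weights unless you copy them over.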

Hello @itaishch

I am trying to modify inputs and outputs of a pretrained resnet model as well. How did everything go?

Sadly I dropped it after that discussion; I had to prioritize some engineering over research…
If I pick it up again and make additional progress I’ll update, but for now I recommend starting from ste’s suggestions and going from there. I would also advise diving into part 2 of the course if you haven’t already. I’m still midway through it, but it allows more control over the customization process (i.e. a better understanding of both PyTorch’s and fastai’s behavior).

I have sort of the same question:

How can I go from a saved model on n classes to n+2 classes? I’ve trained a model using cnn_learner() and saved it.

If I create a new ImageDataBunch with 2 extra classes and a new cnn_learner instance and try to load the model, I get the following error:

Error(s) in loading state_dict for Sequential:
size mismatch for 1.8.weight: copying a param with shape torch.Size([12, 512]) from checkpoint, the shape in current model is torch.Size([14, 512]).
size mismatch for 1.8.bias: copying a param with shape torch.Size([12]) from checkpoint, the shape in current model is torch.Size([14]).

So what is going wrong is that it tries to load a model with 12 outputs into a layer having 14 outputs.

Essentially you copy the state dict for everything but the output classes (and the input, if that’s what you want). Here is one that I used (from “Loading pretrained weights that are not from ImageNet”):

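from pathlib import Path
from typing import Union
import torch

def load_diff_pretrained(learn, name:Union[Path,str], device:torch.device=None):
    "Load model `name` from `learn.model_dir` using `device`, defaulting to `learn.data.device`."
    # Assumption: the default was lost from the original post; fastai v1 learners keep their DataBunch in `learn.data`.
    if device is None: device = learn.data.device
    if (learn.model_dir/name).with_suffix('.pth').exists(): model_path = (learn.model_dir/name).with_suffix('.pth')
    else: model_path = name
    new_state_dict = torch.load(model_path, map_location=device)
    learn_state_dict = learn.model.state_dict()
    for name, param in learn_state_dict.items():
        if name in new_state_dict:
            input_param = new_state_dict[name]
            if input_param.shape == param.shape:
                param.copy_(input_param)  # shapes agree: take the saved weights
            else:
                print('Shape mismatch at:', name, 'skipping')
        else:
            print(f'{name} weight of the model not in pretrained weights')
    learn.model.load_state_dict(learn_state_dict)  # make the update explicit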
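
The same idea also works without a Learner. Here is a minimal plain PyTorch sketch going from 12 to 14 classes; make_model is a hypothetical toy architecture I made up for the illustration, and in practice saved_state would come from torch.load on your checkpoint:

```python
import torch
import torch.nn as nn

def make_model(n_classes):
    # Hypothetical tiny model: a shared body and a class-dependent head.
    return nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, n_classes))

old_model = make_model(12)   # stands in for the saved 12-class model
new_model = make_model(14)   # same architecture with two extra classes

saved_state = old_model.state_dict()   # in practice: torch.load(path, map_location=...)
target_state = new_model.state_dict()

# Keep only tensors whose name and shape both match the new model.
compatible = {k: v for k, v in saved_state.items()
              if k in target_state and v.shape == target_state[k].shape}
skipped = [k for k in saved_state if k not in compatible]

# strict=False lets the skipped head keep its fresh initialization.
new_model.load_state_dict(compatible, strict=False)
print("skipped:", skipped)  # skipped: ['2.weight', '2.bias']
```

The final layer keeps its random initialization, so it still needs training on the new set of classes.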
