How to change layers of a pre-trained model without using Learner

mjack3 · May 24, 2019, 7:27am

Hi all, I’m new here and I have completed the whole course. I think it’s amazing.

Right now I’m trying to load a pre-trained ResNet, but I need to change the last layer to predict 2 classes without using the Learner class (DataBunch, model, metrics). I’m just trying to understand a bit more about how the code works. In Keras, for instance, I could do something like model = keras.VGG16() and change the layers in a loop. Could I do something similar in fastai, or maybe use other functions that do it automatically?

Thank you!

ste · May 24, 2019, 7:58am

Take a look at this example: here I’m changing the first layer to accommodate 4-channel input, but the kind of modification you need to make is the same.

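# Tweak ResNet to support multiple channels (assumes `learn` is an existing Learner wrapping a ResNet)
from torch import nn
nChannels = 4
# Same settings as the original first conv; only the number of input channels changes
learn.model[0][0] = nn.Conv2d(nChannels, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
learn.model.cuda();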

NOTE: LAYER GROUPS
learn.model[GROUP_ID][LAYER_IN_GROUP_ID]

GROUP_ID: index of the group. Layer groups are used to “manage” multiple layers at once (e.g. freezing or learning-rate tweaking). The number of groups depends on the kind of model you’re using (AFAIR resnet34 has 3 groups). For your problem you need to work in the last group.
LAYER_IN_GROUP_ID: index of the layer inside the group.

slawekbiel · May 24, 2019, 8:14am

You can print out the model you have to see its structure, and then reuse any parts of it, for example by sticking them into a new Sequential module.

model = torchvision.models.vgg16(pretrained = True)
model
VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): ReLU(inplace)
    (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): ReLU(inplace)
    (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (22): ReLU(inplace)
    (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (25): ReLU(inplace)
    (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (27): ReLU(inplace)
    (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (29): ReLU(inplace)
    (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace)
    (2): Dropout(p=0.5)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace)
    (5): Dropout(p=0.5)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)

Now I create a new model by taking only the first two conv layers and attaching a simple linear layer to them:

my_new_model = nn.Sequential(
    model.features[:4],
    PoolFlatten(),
    nn.Linear(64, 4)
)
x = torch.rand((1,3,224,224))
my_new_model(x)
tensor([[-0.2058, -0.0773, -0.4127,  0.6589]], grad_fn=<AddmmBackward>)

mjack3 · May 24, 2019, 8:28am

I forgot to mention that I actually don’t want to use DataBunch or anything automatic like that; I would like to customize it myself.

slawekbiel · May 24, 2019, 8:44am

Everything I pasted above is pure PyTorch (other than PoolFlatten, but that was just for a simple illustration).

mjack3 · May 24, 2019, 8:47am

Yes! Right now I’m trying to make something similar, let’s see =D

ste · May 24, 2019, 8:48am

Exactly, a fast.ai model is a PyTorch model.

If you try to use your model without fast.ai (remember that under the hood it is a standard PyTorch model), you need to normalize your input manually (and probably resize it to “size”).

Eventually you’ll probably need that…
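
As a rough illustration of doing that step by hand, the sketch below resizes a batch and normalizes it with the usual ImageNet channel statistics. The function name `preprocess` and the default size of 224 are illustrative choices, not part of fastai. If the model was trained with different statistics, use those instead.

```python
import torch
import torch.nn.functional as F

# Standard ImageNet per-channel statistics, shaped to broadcast over (N, C, H, W).
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)

def preprocess(batch, size=224):
    """Resize a float batch in [0, 1] to size x size and normalize each channel."""
    batch = F.interpolate(batch, size=(size, size), mode="bilinear", align_corners=False)
    return (batch - IMAGENET_MEAN) / IMAGENET_STD

x = torch.rand(2, 3, 300, 400)  # stand-in for two RGB images scaled to [0, 1]
y = preprocess(x)
```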

itaishch · June 27, 2019, 1:47pm

I have a similar need, but your fix doesn’t seem to work for a toy example I created, since it affects the number of channels expected by a later stage (I’m using it for a segmentation model).

“RuntimeError: Given groups=1, weight of size [99, 99, 3, 3], expected input[1, 100, 40, 40] to have 99 channels, but got 100 channels instead”.

It also doesn’t update the learner’s ‘layer_groups’ field, which might cause some additional issues (but that’s not what is causing the failure).

Editing the source file of torchvision’s ResNet (under /root/anaconda3/envs/fastai/lib/python3.7/site-packages/torchvision/models) is my current terrible but quick workaround. I would love to hear a better approach.

ste · June 27, 2019, 9:27pm

Can you share some more details about your toy example?

itaishch · June 29, 2019, 1:53pm

Sure (and thanks). Note that it is possibly due to my using a unet, or not the latest version (1.0.45), so it is not necessarily related to the proposed fix…
For background, I’m treating stacked np arrays as images (in this case, 4 of them).
I allowed the load_image method to handle np arrays, and then created the DataBunch:

Proceeding to creating the model

Afterwards, using your lines ended in the mismatch error I mentioned, in this layer, which now has these parameters:

Using your lines produced:
“RuntimeError: Given groups=1, weight of size [99, 99, 3, 3], expected input[1, 100, 40, 40] to have 99 channels, but got 100 channels instead”.

ste · June 29, 2019, 5:12pm

You’re welcome!

Can you share them? Is that the only thing you changed to the model?

itaishch · June 30, 2019, 7:04am

Yes, and yes:

Then I would get the error when calling lr_find(learn).
The change I made to the torchvision model is exactly the same, BTW. Debugging the learner creation would show why the behavior differs, but I can’t get around to it yet.

aksg87 · August 23, 2019, 1:52pm

Hello @itaishch

I am trying to modify inputs and outputs of a pretrained resnet model as well. How did everything go?

itaishch · August 24, 2019, 10:35am

Sadly, I dropped it after that discussion; I had to prioritize some engineering over research…
If I pick it up again and make additional progress I’ll post an update, but for now I recommend starting from ste’s suggestions and going from there. I would also advise diving into part 2 of the course if you haven’t already. I’m still midway through it, but it gives more control over the customization process (i.e. a better understanding of both PyTorch’s and fast.ai’s behavior).

Wesley · August 26, 2019, 1:06pm

I have more or less the same question:

How can I go from a saved model with n classes to n+2 classes? I’ve trained a model using cnn_learner() and saved it.

If I create a new ImageDataBunch with 2 extra classes and a new cnn_learner instance, and then try to load the model, I get the following error:

Error(s) in loading state_dict for Sequential:
size mismatch for 1.8.weight: copying a param with shape torch.Size([12, 512]) from checkpoint, the shape in current model is torch.Size([14, 512]).
size mismatch for 1.8.bias: copying a param with shape torch.Size([12]) from checkpoint, the shape in current model is torch.Size([14]).

So what goes wrong is that it tries to load the weights of a 12-output layer into a layer with 14 outputs.

muellerzr · August 26, 2019, 1:21pm

Essentially, you copy the state dict for everything except the output layer (and the input layer, if that’s what you changed). Here is one that I used (see: Loading pretrained weights that are not from ImageNet):

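from pathlib import Path
from typing import Union
import torch

def load_diff_pretrained(learn, name:Union[Path,str], device:torch.device=None):
    "Load model `name` from `learn.model_dir` using `device`, defaulting to `learn.data.device`; tensors whose shape differs are skipped."
    if device is None: device = learn.data.device
    # Prefer a .pth file inside the learner's model dir; otherwise treat `name` as a path
    if (learn.model_dir/name).with_suffix('.pth').exists(): model_path = (learn.model_dir/name).with_suffix('.pth')
    else: model_path = name
    new_state_dict = torch.load(model_path, map_location=device)
    # Assumption: a checkpoint saved together with optimizer state nests the weights under 'model'
    if set(new_state_dict.keys()) == {'model', 'opt'}: new_state_dict = new_state_dict['model']
    learn_state_dict = learn.model.state_dict()
    for key, param in learn_state_dict.items():  # `key` avoids shadowing the `name` argument
        if key in new_state_dict:
            input_param = new_state_dict[key]
            if input_param.shape == param.shape:
                # Same shape: copy the saved weights over the freshly initialized ones
                param.copy_(input_param)
            else:
                print('Shape mismatch at:', key, 'skipping')
        else:
            print(f'{key} weight of the model not in pretrained weights')
    learn.model.load_state_dict(learn_state_dict)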
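
The same idea can be tried without a Learner. The sketch below uses a toy model (`make_model` is a hypothetical stand-in for body plus head) to mimic the 12-to-14-class case: it keeps only tensors whose name and shape match, then loads them with `strict=False`. The last step, reusing the 12 trained output rows, is an optional extra that assumes the old classes keep the same indices in the new model. That may not hold if adding classes reorders the class list.

```python
import torch
import torch.nn as nn

def make_model(n_classes):
    # Stand-in for body + head; only the last layer depends on n_classes.
    return nn.Sequential(nn.Linear(8, 512), nn.ReLU(), nn.Linear(512, n_classes))

old = make_model(12)  # plays the role of the saved 12-class model
new = make_model(14)  # fresh model with two extra classes

old_state = old.state_dict()
new_state = new.state_dict()

# Keep only tensors whose name and shape both match the new model.
compatible = {k: v for k, v in old_state.items()
              if k in new_state and v.shape == new_state[k].shape}
skipped = sorted(set(new_state) - set(compatible))
new.load_state_dict(compatible, strict=False)

# Optional: reuse the 12 trained output rows, assuming class indices are unchanged.
with torch.no_grad():
    new[2].weight[:12] = old[2].weight
    new[2].bias[:12] = old[2].bias
```

Here `skipped` lists the output layer's weight and bias, which are the only tensors left at their fresh initialization before the optional copy.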