Size Mismatch after Split()?

Hey all, sorry for yet another question. I’m debugging my implementation of NTS-Net so that it works (more or less) the way cnn_learner does. Here’s the gist: after going through the source code for creating the head and base of the model, I’ve done the following:

net = attention_net()
learn = Learner(data, net, loss_func=total_loss, metrics=metric)
new_state_dict = torch.load(Path('model.ckpt'))['net_state_dict'] ### I can load it into learner

The above loads their pre-trained checkpoint. I could not just grab the ResNet weights, because I need more than that, specifically a few layers in the head.

# Copy each checkpoint tensor into the model when both name and shape match.
learn_state_dict = learn.model.state_dict()
for name, param in learn_state_dict.items():
  if name in new_state_dict:
    input_param = new_state_dict[name]
    if input_param.shape == param.shape:
      param.copy_(input_param)  # in-place copy into the model's own tensor
    else:
      print(f'Shape mismatch: {name}')
learn.model.load_state_dict(learn_state_dict)
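
For anyone who wants to try the name-and-shape matching copy outside of my setup, here is a minimal sketch on two toy models. The helper name copy_matching_weights and both toy networks are made up for illustration; they are not part of NTS-Net or fastai.

```python
import torch
import torch.nn as nn


def copy_matching_weights(model, pretrained_state):
    """Copy tensors whose name and shape both match; report the rest.

    Hypothetical helper for illustration, not a fastai or NTS-Net function.
    """
    own_state = model.state_dict()
    copied, skipped = [], []
    with torch.no_grad():
        for name, param in own_state.items():
            src = pretrained_state.get(name)
            if src is not None and src.shape == param.shape:
                param.copy_(src)  # writes into the model's own storage
                copied.append(name)
            else:
                skipped.append(name)
    return copied, skipped


# Toy stand-ins: same first layer, different number of output classes.
torch.manual_seed(0)
source = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 3))
target = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 5))

copied, skipped = copy_matching_weights(target, source.state_dict())
print("copied:", copied)
print("skipped:", skipped)
```

Returning the skipped names makes it easier to see which layers kept their random initialisation than a bare "Shape mismatch" print.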

That copies every matching weight into my model. From here I can train, etc. using their weights. Since the layers are now instantiated with the original weights, I wanted to split the model from here.

def nts_body(m:nn.Module):
    return nn.Sequential(*list(m.pretrained_model.children())[:8])

body = nts_body(learn.model)
h1 = list(learn.model.pretrained_model.children())[8:]
h1[1] = nn.Linear(2048, data.c)
cn = nn.Linear(10240, data.c)
prt = nn.Linear(2048, data.c)
prp = list(learn.model.proposal_net.children())

Head generation is a bit disgusting at the moment:

head = nn.Sequential(*list(learn.model.pretrained_model.children())[8:], *list(learn.model.proposal_net.children()), cn, prt)

But you can essentially see what I do here. Then I wrap the body and head in a Sequential:
model = nn.Sequential(body, head)

Make a learner:

learn = Learner(data, model, loss_func=mytotal_loss, metrics=metric)

Then split at the point where we usually cut off the last two ResNet layers for transfer learning:

def _nts_cut(m:nn.Module)->List[nn.Module]:
    # Return the first module of each layer group: body, then head.
    return [m[0], m[1]]
learn.split(split_on=_nts_cut)
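
To make clear what I am after with the split: each layer group should end up with its own learning rate. A rough plain-PyTorch analogue on a toy body/head model is sketched below. This is not how fastai implements learn.split, and the toy layers and learning rates are placeholders I picked for illustration.

```python
import torch
import torch.nn as nn

# Toy stand-ins for the real body and head.
body = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU())
head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 2))
model = nn.Sequential(body, head)

# One optimizer parameter group per layer group, each with its own lr.
optimizer = torch.optim.SGD(
    [
        {"params": model[0].parameters(), "lr": 1e-4},  # body: small steps
        {"params": model[1].parameters(), "lr": 1e-2},  # head: larger steps
    ],
    lr=1e-3,
    momentum=0.9,
)

# Number of scalar parameters that landed in each group.
group_sizes = [
    sum(p.numel() for p in group["params"]) for group in optimizer.param_groups
]
print(group_sizes)
```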

Here is an excerpt from the end of the printed model:

      )
      (2): Bottleneck(
        (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace)
      )
    )
  )
  (1): Sequential(
    (0): AdaptiveAvgPool2d(output_size=1)
    (1): Linear(in_features=2048, out_features=200, bias=True)
    (2): Conv2d(2048, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (4): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (5): ReLU()
    (6): Conv2d(128, 6, kernel_size=(1, 1), stride=(1, 1))
    (7): Conv2d(128, 6, kernel_size=(1, 1), stride=(1, 1))
    (8): Conv2d(128, 9, kernel_size=(1, 1), stride=(1, 1))
    (9): Linear(in_features=10240, out_features=200, bias=True)
    (10): Linear(in_features=2048, out_features=200, bias=True)
  )
)

When I view the layer groups, everything looks in place, but when I run lr_find() it reports a size mismatch, m1: [8192 x 1], m2: [2048 x 200], on the linear layer after that AdaptiveAvgPool2d. Why is that? This only happens when I try to split those layer groups; the standard, unsplit model (without the fit_one_cycle setup) works fine. And I cannot just flatten this, as I am keeping track of 4 different computations the entire time.
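
For reference, the shapes in that message are what you would get if a pooled tensor of shape (4, 2048, 1, 1) reached the Linear layer without being flattened, since 4 * 2048 = 8192. The sketch below reproduces that on a toy head. The batch size of 4 and the 7x7 feature map are assumptions, and this only illustrates the error; it does not handle the four separate computations of the real model.

```python
import torch
import torch.nn as nn

# Assumed input: a batch of 4 feature maps with 2048 channels.
features = torch.randn(4, 2048, 7, 7)

# Pooling leaves shape (4, 2048, 1, 1); Linear then sees a last dim of 1.
unflattened = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Linear(2048, 200))
try:
    unflattened(features)
    error_message = None
except RuntimeError as err:
    error_message = str(err)
print("without flatten:", error_message)

# With an explicit flatten the Linear layer receives shape (4, 2048).
flattened = nn.Sequential(
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(2048, 200)
)
out = flattened(features)
print("with flatten:", tuple(out.shape))
```

A hand-written forward() can do this reshape without it ever showing up as a module, so rebuilding a model from children() inside nn.Sequential silently drops it.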

Thank you for any help,

Zach

Solved. I had forgotten a layer hidden in the model's declaration.