Custom Head?

Hi everyone, I’ve got a model working with Learner() and I want to start using cnn_learner so I can get the differential learning rates, etc. I have three networks built on top of resnet and the architecture is like so:






I want to just take everything after that avgpool layer and use it as a head. Do I need to modify the net so that prop, c, and p come immediately after and are not in the parent group?

I think my split function needs to look like this:

Can I ask what prop, c, and p mean here?

They are separate outer layers of the network. You can assume linear layers for simplicity.

I don’t really understand your notation here. But if you want to use split with Learner(), you can try the following (assuming you have properly cut the head off resnet50):

self.body = resnet50()
self.head = nn.Sequential( ... prop,c,p?)

then you can do
learn.split(lambda m: (m.body[6], m.head))

The model is now cut into 3 parts: resnet50 up to block 6, block 6 to the end of the body, and the head.
You can also check using learn.layer_groups
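For anyone following along, here is a rough, framework-free sketch of what that split does conceptually. It is not fastai's implementation; the layer names, the helper names (split_into_groups, spread_lrs), and the learning-rate values are made up for illustration. The idea is just that a flat list of layers gets cut into consecutive groups, and each group then gets its own learning rate, with earlier groups trained more gently.

```python
def split_into_groups(layers, split_points):
    """Cut a flat list of layers into consecutive groups at the given indices."""
    bounds = [0] + list(split_points) + [len(layers)]
    return [layers[start:end] for start, end in zip(bounds[:-1], bounds[1:])]


def spread_lrs(lo, hi, n_groups):
    """Geometrically spaced learning rates, lowest for the earliest group."""
    if n_groups == 1:
        return [hi]
    ratio = (hi / lo) ** (1 / (n_groups - 1))
    return [lo * ratio ** i for i in range(n_groups)]


# Hypothetical stand-ins for the children of a cut ResNet body plus a custom head.
layers = ["conv1", "bn1", "relu", "maxpool",
          "layer1", "layer2", "layer3", "layer4", "head"]

# Split at index 6 (start of the second body group) and at index 8 (the head).
groups = split_into_groups(layers, split_points=[6, 8])
lrs = spread_lrs(1e-5, 1e-3, len(groups))

for group, lr in zip(groups, lrs):
    print(f"lr={lr:.0e}  layers={group}")
```

Every layer ends up in exactly one group here because the groups are contiguous slices of the same list; problems only start when the groups are assembled by hand and something gets left out.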

Thanks! Yeah I did figure this out but when I split it between the pretrained resnet groups and my other output I got a size mismatch error, whereas when I don’t split it works fine. Do you know why this could be?

Do you mean the number of filters doesn’t match between your body and head?

Well, if you are using Resnet50 as the backbone, there is a handy function provided by fastai. Check create_body(); it will cut the network for you.

And when you attach your head, for the ni field, fastai has another handy function called num_features_model(). It grabs the number of output features of your body, so you don’t have to worry about it.

The flow will be

self.body = create_body(models.resnet50)
self.head = Your HEAD (ni = num_features_model(self.body), nf=whatever)

Hope this helps :slight_smile:

Yes, and I looked at the source code and recreated what it does, but I got that error. You can see what I did in the link. :confused:

Can you do Linear(num_features_model(AdaptiveAvgPool(),out=2048…) etc…

Would that work?

I know that with AdaptiveConcatPool2d it needs to be num_features_model() * 2, since the avg and max pools are concatenated and the channel count doubles, but I have never tried it with just average pooling. From what you described, you defined ni for the Linear layer as 2048, but the actual input is 80xx.

I don’t think the split is the problem. What you can check is to call learn.summary() before splitting and see if it works.
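To spell out the arithmetic, here is a tiny sketch. The function name head_in_features is made up for illustration, and it assumes the pooling layer reduces each channel to a single value; 2048 is the channel count coming out of ResNet-50's last block.

```python
def head_in_features(body_channels, concat_pool=True):
    """Input size the first Linear layer of the head needs after pooling.

    Concatenating average and max pooling doubles the channel count;
    plain average pooling leaves it unchanged.
    """
    return body_channels * (2 if concat_pool else 1)


resnet50_channels = 2048  # channels out of ResNet-50's final conv block

print(head_in_features(resnet50_channels, concat_pool=True))   # 4096
print(head_in_features(resnet50_channels, concat_pool=False))  # 2048
```

If the number the Linear layer actually receives is neither of these, something other than the pooling choice is changing the tensor shape on the way into the head.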

Yeah, it’s how their architecture works. It’s like R-CNN in a way (since we have bounding boxes generated by the model). I will try that momentarily.

Any reason why that original layer size would work without the split?

Not really. As you can see, the split just divides your model into separate nn.Sequential() groups, so you can apply different learning rates to the param_groups.

If your original model doesn’t have the problem, then your split model shouldn’t have it either (unless you cut the model into more than 2 pieces and left a gap, so that you are missing a layer).

If you are using a Jupyter notebook, can you restart the kernel and run the cells in order, to see whether your original model (before the cut) really doesn’t have the problem?
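One quick way to check for that kind of gap is to compare the full list of layers against what the groups actually cover. This is only a sketch with made-up layer names and a hypothetical helper (find_skipped_layers), not anything from fastai:

```python
def find_skipped_layers(all_layers, groups):
    """Return the layers that are not covered by any group, in model order."""
    covered = {name for group in groups for name in group}
    return [name for name in all_layers if name not in covered]


# Hypothetical model: "hidden_layer" sits between the body and the pooling layer.
all_layers = ["stem", "block1", "block2", "hidden_layer", "pool", "head"]

# Hand-built groups that accidentally leave "hidden_layer" out.
groups = [["stem", "block1"], ["block2"], ["pool", "head"]]

missing = find_skipped_layers(all_layers, groups)
print(missing)  # ['hidden_layer']
```

An empty result means every layer landed in some group; anything else is a layer that the split model would silently skip.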

Got it. I’ll look at that and make sure nothing hidden gets declared in the model definition too. Thank you very much!

We’re off to the races! That’s what it was! There was a hidden padding layer that was being skipped

It happens a lot when using Jupyter notebooks. Use a dev notebook and merge cells to the front in case you need to reboot the kernel… :slightly_smiling_face:

Yesterday I spent 4 hours debugging PyTorch complaining about some backprop issue with an in-place tensor operation…

It turned out I had tab-completed to the wrong thing… so developing in a Jupyter notebook does have lots of advantages, but out-of-order execution is really killing me :frowning:

Yeah seriously haha… It looks like one library’s version of the model had a specific padding layer inserted into the model’s summary, whereas another one didn’t. This was the issue. Two different repos, two ‘different’ models. The original was actually missing an entire layer in its definition! I’m hoping to do a PR here in a bit. I want to run a few tests to see how best to use it, but it works with a variety of bases (xresnet, etc)

(PR and a doc with recommendations)
