Hi everyone, I’ve got a model working with Learner() and I want to start using cnn_learner so I can get the differential learning rates, etc. I have three networks built on top of resnet, and the architecture looks like this:
resnet50():
    (avgpool)
    (fc)
prop():
    (l1)
    (l2)
    (l3)
c():
p():
I want to take everything after that avgpool layer and use it as a head. Do I need to modify the net so that prop, c, and p come immediately after it and are not in the parent group?
I don’t really understand your notation here. But if you want to use split with Learner(), you can try the following (assuming you have properly cut the head off resnet50):
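To make the idea concrete, here is a minimal plain-PyTorch sketch of what splitting buys you: the body and head become separate parameter groups, so the optimizer can apply a small learning rate to the pretrained backbone and a larger one to the fresh head. The tiny body/head modules here are hypothetical stand-ins, not the poster's actual model.

```python
import torch
from torch import nn

# Hypothetical stand-in for a pretrained backbone ("body") and a custom head.
body = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1),
                     nn.AdaptiveAvgPool2d(1), nn.Flatten())
head = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2))
model = nn.Sequential(body, head)

# The split gives each group its own entry in param_groups, so each
# can get its own (discriminative) learning rate.
opt = torch.optim.SGD(
    [{"params": body.parameters(), "lr": 1e-4},   # pretrained: small lr
     {"params": head.parameters(), "lr": 1e-2}],  # new head: larger lr
    lr=1e-3,  # default, overridden by the per-group values above
)
print([g["lr"] for g in opt.param_groups])  # [0.0001, 0.01]
```

fastai's split does essentially this bookkeeping for you, given the layer groups.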
Thanks! Yeah, I did figure this out, but when I split it between the pretrained resnet groups and my other output, I got a size mismatch error, whereas when I don’t split, it works fine. Do you know why this could be?
Well, if you are using ResNet50 as the backbone, there is a handy function provided by fastai. Check create_body(); it will cut the network for you.
And when you attach your head, for the ni argument fastai has another handy function, num_features_model(), which will grab the number of output features of your body, so you don’t have to work it out yourself.
The flow will be:

self.body = create_body(models.resnet50)
self.head = YourHead(ni=num_features_model(self.body), nf=whatever)
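The same flow can be sketched in plain PyTorch without fastai. `create_body_sketch` and `num_features_sketch` below are hypothetical rough equivalents of fastai's create_body() and num_features_model() (which, roughly, cut the classifier layers off and probe the body with a dummy batch); the small conv net stands in for resnet50 so the example stays self-contained.

```python
import torch
from torch import nn

# Stand-in for a torchvision-style classifier: conv stages, then (avgpool), (fc).
net = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),          # (avgpool)
    nn.Flatten(),
    nn.Linear(32, 1000),              # (fc)
)

def create_body_sketch(model, cut=-3):
    """Rough stand-in for fastai's create_body: drop pooling/flatten/fc."""
    return nn.Sequential(*list(model.children())[:cut])

def num_features_sketch(body):
    """Rough stand-in for num_features_model: probe with a dummy batch."""
    with torch.no_grad():
        out = body(torch.zeros(1, 3, 32, 32))
    return out.shape[1]

body = create_body_sketch(net)
ni = num_features_sketch(body)       # 32 for this toy backbone
head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(ni, 2))
model = nn.Sequential(body, head)
print(model(torch.zeros(2, 3, 32, 32)).shape)  # torch.Size([2, 2])
```

With the real library you would pass models.resnet50 to create_body and let num_features_model handle the probing.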
Can you do Linear(num_features_model(…)) after an AdaptiveAvgPool(), with out=2048… etc.?
Would that work?
I know that with AdaptiveConcatPool2d you need num_features_model() * 2, since it puts avg and max pool together, so the channel count doubles, but I never tried plain average pooling. From what you described, you defined ni for the Linear layer to be 2048, but it is 80xx.
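The channel-doubling is easy to demonstrate. `ConcatPool2d` below is a minimal hypothetical re-implementation of fastai's AdaptiveConcatPool2d, just to show the shapes; 2048 is the channel count of a resnet50 body's output.

```python
import torch
from torch import nn
import torch.nn.functional as F

class ConcatPool2d(nn.Module):
    """Minimal sketch of fastai's AdaptiveConcatPool2d: concat avg + max pool."""
    def forward(self, x):
        return torch.cat([F.adaptive_avg_pool2d(x, 1),
                          F.adaptive_max_pool2d(x, 1)], dim=1)

x = torch.randn(1, 2048, 7, 7)          # resnet50 body output: 2048 channels
avg_only = nn.AdaptiveAvgPool2d(1)(x)   # keeps 2048 channels
concat = ConcatPool2d()(x)              # doubles to 4096 channels
print(avg_only.shape[1], concat.shape[1])  # 2048 4096
```

So a Linear layer after concat pooling needs ni = 2048 * 2, while one after plain average pooling needs ni = 2048.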
I don’t think it is a split problem. What you can check is to call learn.summary() before the split and see if it works.
Not really. As you can see, the split just divides your model into different nn.Sequential() groups, so you can apply different lrs via the param_groups.
If your original model doesn’t have the problem, then your split model shouldn’t have any problem either (unless you cut the model into more than two pieces and left a gap, so you are missing a layer).
If you are using a Jupyter notebook, can you restart the kernel, run the cells in order, and check whether your original model (before the cut) has the problem?
Yeah, seriously haha… It turns out one library’s version of the model had a specific padding layer in its summary, whereas the other didn’t; that was the issue. Two different repos, two ‘different’ models. The original was actually missing an entire layer in its definition! I’m hoping to do a PR here in a bit. I want to run a few tests to see how best to use it, but it works with a variety of backbones (xresnet, etc.)