I recently trained a resnet34 model on a set of images of plant leaves, which were taken with a white sheet as background. The model performed pretty well; detailed info about my approach can be found in Share your work here.
Now I want to apply my “pretrained” model (weights saved with learn.save()) to another subset of the data, but I am running into a problem. The new subset has only 51 classes, not 124 like my previous one. This results in the following error:
As I understood @jeremy in lesson 5, fastai throws away the last weight matrix when loading a pretrained model with create_cnn(). How can I achieve the same behaviour with learn.load()?
Maybe someone more knowledgeable can offer a better way, but what I would try is forcing the new model to also have 124 outputs (the same architecture as the old one). You’ll then need to modify your labels a bit to reflect that.
For that you have to specify a custom head when calling create_cnn(). It’s more advanced than what we’ve covered in the course so far, but you can take a look at the documentation and the source code to understand how to do this.
I think that for your custom head you can simply call create_head with the same parameters create_cnn uses when it calls it, except for the number of classes (nc), where you replace data.c with what you want (124 if I understood correctly).
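To make the idea concrete, here is a minimal sketch in plain PyTorch (not fastai's actual create_cnn/create_head code): the body is a hypothetical stand-in for the pretrained convolutional trunk, and the head is forced to 124 outputs regardless of how many classes the new dataset has, so the saved weights line up layer for layer.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the convolutional body that create_cnn builds
# from the pretrained resnet34.
body = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)

# Force the head to the OLD model's class count (124), not the new
# subset's 51, so every layer matches the saved weights.
n_old_classes = 124
head = nn.Linear(8, n_old_classes)
model = nn.Sequential(body, head)

out = model(torch.randn(2, 3, 32, 32))
print(out.shape)  # → torch.Size([2, 124])
```

The labels of the 51-class subset then need to be mapped onto the corresponding 51 of those 124 output indices, as suggested above.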
What’s actually happening here? If the custom head has 124 outputs but the data only has 51 classes, is it just using 51 of the 124 outputs and setting the others to zero? Does that mean it wouldn’t work if the loaded model had fewer classes than the data?
Yes, basically that. He had a model pretrained on 124 classes, thus 124 outputs. So for him to load the pretrained weights, the last layer (a fully connected one, mapping whatever came before to 124 outputs) had to be exactly the same.
When training or doing inference on his subset of 51 classes, it’s not so much that the other 124 - 51 = 73 outputs are set to 0 as that they are simply never activated by the labels. But the model could still misclassify one of the 51 classes as one of the other 73 classes (that would be far more likely before he fine-tuned the model on the subset).
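This point can be checked numerically: under softmax, the 73 "unused" outputs still receive probability mass, so the argmax can land outside the 51-class subset. A tiny sketch with random logits (the class counts are the hypothetical numbers from this thread):

```python
import torch

# Logits from a model with 124 outputs, even though only the first 51
# class indices occur in the new subset.
logits = torch.randn(1, 124)
probs = logits.softmax(dim=1)

# The 73 extra outputs are not zeroed out: they still carry probability,
# so a prediction can fall on a class outside the 51-class subset.
print(probs[:, 51:].sum().item())  # strictly greater than 0
```

Fine-tuning on the subset drives that leftover probability mass down, which is why misclassification into the old classes becomes less likely afterwards.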
If the first model had fewer classes than what you want for the second one, it wouldn’t be possible indeed. What you could do instead is still load the weights, then delete the last fully connected layer along with its pretrained weights and put in another one that maps to the correct number of classes. It would probably need more additional training than what hkristen did, however.
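The "load everything except the last layer" approach can be sketched in plain PyTorch (again a toy stand-in, not fastai's actual learn.save/learn.load machinery) by filtering the old state dict and loading it non-strictly:

```python
import torch
import torch.nn as nn

# Toy stand-in: an "old" model with 124 outputs and a "new" one with 51.
def make_model(n_classes):
    return nn.Sequential(
        nn.Linear(16, 32), nn.ReLU(),  # pretend this is the pretrained body
        nn.Linear(32, n_classes),      # final fully connected layer
    )

old = make_model(124)
new = make_model(51)

# Keep everything except the final layer (index "2" in the Sequential),
# whose shape no longer matches, then load non-strictly so the missing
# head weights stay at their fresh random initialisation.
state = {k: v for k, v in old.state_dict().items() if not k.startswith("2.")}
result = new.load_state_dict(state, strict=False)
print(result.missing_keys)  # → ['2.weight', '2.bias']
```

Only the new head then needs to be trained from scratch, which is why this tends to require more training than reusing the full 124-output architecture.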
I’m not really sure what you are trying to achieve here. Adding a custom head? Or a full custom architecture? If it’s the latter, create_cnn will not be useful, as its purpose is to give you a full architecture, which you can then customize with a custom head.