I recently trained a resnet34 model on a set of images of plant leaves, which were taken with a white sheet as background. The model performed pretty well; detailed info about my approach can be found in Share your work here.
Now I want to apply my “pretrained” model (weights saved with learn.save()) to another subset of the data, but I am running into a problem. The new subset has only 51 classes, not 124 like my previous one. This results in the following error:
As I understood @jeremy in lesson 5, fastai throws away the last weight matrix when loading a pretrained model with create_cnn(). How can I achieve the same behaviour with learn.load()?
Maybe someone more knowledgeable can offer a better way, but what I would try is forcing the new model to also have 124 outputs (the same architecture as the old one). You’ll then need to modify your labels a bit to reflect that.
For that you have to specify a custom head when calling create_cnn(). It’s more advanced than what we’ve covered in the course so far, but you can take a look at the documentation and the source code to understand how to do this.
I think that for your custom head you can simply call create_head with the same parameters create_cnn uses when it calls it, except for the number of classes (nc), where you replace data.c with what you want (124 if I understood correctly).
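To make the idea concrete, here is a minimal sketch in plain PyTorch (not fastai's actual create_cnn/create_head code): the body is a hypothetical stand-in for the pretrained convolutional trunk, and the head is forced to 124 outputs regardless of how many classes the new dataset has, so the saved weights line up layer for layer.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the convolutional body that create_cnn builds
# from the pretrained resnet34.
body = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)

# Force the head to the OLD model's class count (124), not the new
# subset's 51, so every layer matches the saved weights.
n_old_classes = 124
head = nn.Linear(8, n_old_classes)
model = nn.Sequential(body, head)

out = model(torch.randn(2, 3, 32, 32))
print(out.shape)  # → torch.Size([2, 124])
```

The labels of the 51-class subset then need to be mapped onto the corresponding 51 of those 124 output indices, as suggested above.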
What’s actually happening here? If the custom head has 124 outputs but the data only has 51 classes, is it just using 51 of the 124 outputs and setting the others to zero? Does that mean it wouldn’t work if the loaded model had fewer classes than the data?
Yes, basically that. He had a model pretrained on 124 classes, thus 124 outputs. So for him to load the pretrained weights, the last layer (a fully connected one, mapping whatever came before to 124 outputs) had to be exactly the same.
When training or doing inference on his subset of 51 classes, it’s not so much that the other 124 - 51 = 73 outputs are set to 0 as that they are simply never activated by the labels. But the model could still misclassify one of the 51 classes as one of the other 73 classes (that would be far more likely before he fine-tuned the model on the subset).
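This point can be checked numerically: under softmax, the 73 "unused" outputs still receive probability mass, so the argmax can land outside the 51-class subset. A tiny sketch with random logits (the class counts are the hypothetical numbers from this thread):

```python
import torch

# Logits from a model with 124 outputs, even though only the first 51
# class indices occur in the new subset.
logits = torch.randn(1, 124)
probs = logits.softmax(dim=1)

# The 73 extra outputs are not zeroed out: they still carry probability,
# so a prediction can fall on a class outside the 51-class subset.
print(probs[:, 51:].sum().item())  # strictly greater than 0
```

Fine-tuning on the subset drives that leftover probability mass down, which is why misclassification into the old classes becomes less likely afterwards.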
If the first model had fewer classes than what you want for the second one, it wouldn’t be possible indeed. What you could do instead is still load the weights, then delete the last fully connected layer along with its pretrained weights and put in another one that maps to the correct number of classes. It would probably need more additional training than what hkristen did, however.
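The "load everything except the last layer" approach can be sketched in plain PyTorch (again a toy stand-in, not fastai's actual learn.save/learn.load machinery) by filtering the old state dict and loading it non-strictly:

```python
import torch
import torch.nn as nn

# Toy stand-in: an "old" model with 124 outputs and a "new" one with 51.
def make_model(n_classes):
    return nn.Sequential(
        nn.Linear(16, 32), nn.ReLU(),  # pretend this is the pretrained body
        nn.Linear(32, n_classes),      # final fully connected layer
    )

old = make_model(124)
new = make_model(51)

# Keep everything except the final layer (index "2" in the Sequential),
# whose shape no longer matches, then load non-strictly so the missing
# head weights stay at their fresh random initialisation.
state = {k: v for k, v in old.state_dict().items() if not k.startswith("2.")}
result = new.load_state_dict(state, strict=False)
print(result.missing_keys)  # → ['2.weight', '2.bias']
```

Only the new head then needs to be trained from scratch, which is why this tends to require more training than reusing the full 124-output architecture.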
I’m not really sure what you are trying to achieve here. Adding a custom head? Or a full custom architecture? If it’s the latter, create_cnn will not be useful, as its purpose is to give you a full architecture, which you can then customize with a custom head.