New custom head in trained model

I have trained a multilabel/multiclass model with pretrained resnet34 weights on data with 28 classes. I would now like to use the weights of this model, minus the head, to train a BINARY classifier using the same data but with only one of the labels. The new classifier uses the exact same input images, and the model needs only small changes, all in and near the head.

Lesson 9 gave me the ability to drop a new custom head into the resnet34 pretrained model. But it involves a call:

    ConvLearner.pretrained(f_model, md, custom_head=head_reg4)

which, I think, requires a standard model function (e.g. resnet34, which is in the meta table) as its first argument, and not, for example, simply a path to my transfer-trained model.

So I do not know how to change the head on the model I have trained (it was saved in a prior run and I will read it in using, I guess, learner.load()).

One possible way is to:

  1. create a new model with the new custom head and original resnet34 weights
  2. read in the model that I trained and copy its weights into the new model

Step 1 is easy. What is the best way to accomplish step 2?
Am I thinking about this the right way?
Finally: is there documentation that would answer this question?

I would also love some help with this.

You have to dig a bit into the code base for this. The model created by create_cnn is a sequential model with two things: the body and the head. If you want to keep the pretrained body, you’ll find it in model[0]; the head is model[1].
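
For instance, a minimal sketch of inspecting that structure (fastai v1 API; `data` here is a placeholder for whatever DataBunch was used):

    from fastai.vision import *

    learn = create_cnn(data, models.resnet34)
    print(learn.model[0])  # the pretrained convolutional body
    print(learn.model[1])  # the head: pooling, flatten, bn, dropout, linear layers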

If you want a new CNN with fewer classes, you can create your new model with learn1 = create_cnn(data1, ...), then load the weights from the previous model's body into the new one with

    learn1.model[0].load_state_dict(learn.model[0].state_dict())
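
Putting it together, a hedged sketch of the whole transfer (the DataBunch names `data28`/`data1` and the saved-weights name 'stage-1-28' are placeholders, not anything from this thread):

    from fastai.vision import *

    # original 28-class learner, restored from the weights saved earlier
    learn = create_cnn(data28, models.resnet34)
    learn.load('stage-1-28')          # placeholder name for the saved weights

    # new binary learner on the same images, with a freshly initialised head
    learn1 = create_cnn(data1, models.resnet34)

    # copy only the body (model[0]); the new head (model[1]) keeps its own init
    learn1.model[0].load_state_dict(learn.model[0].state_dict())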

Thanks for this response - it is exactly what I was looking for.

I have a question on a detail, though. If I print learn1.model[0], I get back the very first layer (attached to the input). model[1] gives the next layer (a sequential block). Layers 8 through 13 give the AdaptiveConcatPool plus what I would call the head: flatten, batchnorm, dropout, linear, FC, ReLU.

So I think I need to do:
    learn1.model[:9].load_state_dict(learn.model[:9].state_dict())

Would you agree with this?

Oh, I didn’t remember they were joined like this, but in this case, yes.
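
For anyone checking their own model before slicing, a hedged way to confirm where the head starts (this continues with the `learn`/`learn1` learners from above, assumes the model is one flat nn.Sequential, and the index 9 is just the value taken from the printout in the previous post):

    # print the top-level children with their indices to find where the head begins
    for i, layer in enumerate(learn1.model):
        print(i, layer.__class__.__name__)

    # copy everything before index 9 (the body) from the old learner
    learn1.model[:9].load_state_dict(learn.model[:9].state_dict())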

OK thanks for your help. I believe that this is working properly now.

Are you working on HPA comp?

I was, but I just made my last submission. I thought it was a good competition - I learned a lot - though the statistics were a little flawed. It will be very interesting to see the private LB!

How about you?

Yeah, me too. In the end I should have chosen my final submissions better…
Do you want to team up in another comp? Maybe the whale challenge or the seismic challenge?

I haven’t looked at these - I’m going on vacation for a while but afterwards I’ll take a look.

So I don’t need to call create_head myself when starting from a pretrained model, and this way is better?

If you look at the source, create_cnn calls create_head.
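
For completeness, a hedged sketch of supplying your own head through the v1 API instead of the default one create_head builds (the layer sizes are assumptions for a resnet34 body, whose 512 features are doubled by AdaptiveConcatPool2d; `data1` is a placeholder DataBunch):

    from fastai.vision import *
    from torch import nn

    # hand-built head for a binary classifier, passed in place of the default
    custom_head = nn.Sequential(
        AdaptiveConcatPool2d(), Flatten(),
        nn.BatchNorm1d(1024), nn.Dropout(0.5),
        nn.Linear(1024, 2),          # 2 outputs for the binary classifier
    )
    learn1 = create_cnn(data1, models.resnet34, custom_head=custom_head)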