Fastai v1 - Using a different pretrained model

Pomo · November 2, 2018, 10:47pm

Hello Coders,

I am learning how to apply fastaiv1 with a different pretrained model and dataset than the one shown in Lesson 1: senet50 from VGGFace2 and face emotion images from an old Kaggle competition.

Here’s a code fragment that appears to work, but I’m both not sure it is completely right and confused by one point…

N_IDENTITY = 8631  # the number of identities in VGGFace2 for which ResNet and SENet are trained
semodel = SENet.senet50(num_classes=N_IDENTITY, include_top=True)
utils.load_state_dict(semodel,weightspath)

def num_features_model(m:nn.Module)->int:
    "Return the number of output features for a `model`."
    for l in reversed(flatten_model(m)):
        if hasattr(l, 'num_features'): return l.num_features

body = create_body(semodel, -1)
nf = num_features_model(body) * 2
head = create_head(nf, data.c, None, ps=.5)
model = nn.Sequential(body, head)
learn = ClassificationLearner(data, model, metrics=error_rate)
learn.split((model[1],))
apply_init(model[1], nn.init.kaiming_normal_)
learn.freeze()

The last part is just the body of create_cnn written out, because I could not easily figure out how to adapt the various tables in learner.py to my needs. And did not want to get stuck on that point.

The Learner trains and recognizes well. However, I am confused where the last layer is removed and a new head is attached to the pretrained model.

The original semodel is of class nn.Module. It contains init and forward functions. These reference the last layer. So when that layer is removed, why do these functions still work? And if they are no longer used, what replaces them in the derived model?

I am just starting to delve into the fastai and PyTorch code, so the answer may be “obvious”.

Thanks!

marcmuc · November 3, 2018, 4:24pm

the second argument of create_body is where the model will be cut (layer at index -1, so the last layer in your example). The create head then sticks the new head in place of that (consisting of multiple layers itself)
Check out the docs for this part:

https://docs.fast.ai/vision.learner.html#Customize-your-model

jump to the source from there, you can see that the fastai library does this:

return (nn.Sequential(*list(model.children())[:cut]) if cut

so it essetially creates a new sequential model from the layers of the original model up to the cut layer. I think sequential creates its own forward func etc., therefore the original does not get used anymore.

marcmuc · November 4, 2018, 11:30am

https://pytorch.org/docs/stable/_modules/torch/nn/modules/container.html#Sequential

Here you can see that Sequential defines its own forward method based on all the Modules it is given.

Pomo · November 5, 2018, 3:02am

Yes, thanks. I see that the derived CNN model removes the forward method of the original pretrained model. Instead it simply passes the output of each layer as input to the next.

That should work fine for the senet50 I am playing with, even though the original senet forward() has a bit of complexity.

Pomo · November 15, 2018, 12:06am

Hi alumni,

I am looking for some clarification on the right way to use a pretrained model/custom head via fast.ai.

The context is building a classifier that maps a face to one of the seven “basic emotions”. The pretrained model was designed and trained as a face recognizer that categorizes an image into 8631 faces. It ends with a fully connected linear layer. I want to use the pretrained backbone that already extracts 2048 “features”, followed by a last layer that learns the seven categories.

In case you want specifics, the pretrained model is senet50_scratch from…

https://github.com/cydonia999/VGGFace2-pytorch#pretrained-models
Emotion labelled face images were sourced from an ancient Kaggle competition.

At first, I tried using create_body to cut the last layer of the original senet50, and append the head. The issue that came up is that senet.forward() is not a simple input->output->input between layers. Rather it has some structure, as do the convolutional subunits before it.

...
	    x = self.avgpool(x)
        
        if not self.include_top:
            return x  
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        return x

create_body apparently flattens the layers and PyTorch puts a default forward() between them that simply passes output to input. The result is that the new forward() with a new head fails on a dimensions mismatch at the final fc layer.

My first question then is - is there a best way to deal with this situation? Or should we be aware that create_body, learn.split, and create_cnn are going to replace any existing forward() method?

I did manage to get the model working as follows…

semodel = SENet.senet50(num_classes=N_IDENTITY, include_top=True) #instantiate the senet50
utils.load_state_dict(semodel,weightspath) #Weights will load only into unaltered structure
semodel.fc = nn.Linear(2048, 7) #Replace the last layer
apply_init(semodel.fc, nn.init.kaiming_normal_)
for l in semodel.modules():		#Freeze all but the last layer
    requires_grad(l, False)
requires_grad(model.fc,True)

Next question: is this right? I am suspicious of replacing layer fc inside an already created instance. Will PyTorch understand and learn the weights of the replacement layer? And what happens to the original fc layer - is it still on the GPU? Most important, what’s the best way to handle this situation? Sorry for the novice questions.

And last, an incidental question on using PDB. How do you use it simply to trace a cell of a Jupiter notebook? It seems to step into some messy point inside the iPython interpreter. My awkward workaround is to place the statements inside a function and trace that instead. For example, instead of adding set_trace() directly to a cell…

pdb.set_trace()
body = create_body(semodel, -1)

I have to write
def foo():
…pdb.set_trace()
…body = create_body(semodel, -1)
foo()

Is there a clever workaround for this defect?

Thanks for helping!

BTW, here’s another plug for diving into a task beyond one’s current understanding. Wrestling with fastai source code, GitHub, Linux, and configuration problems is incredibly frustrating and incredibly instructive.