Transfer learning from fastai learner to another learner

romain · April 16, 2019, 1:30pm

Hi,
I am working on a classification problem for which a taxonomy is available. To make it simple, let’s consider that there are 3 levels in this taxonomy:

level_1 with 3 classes
level_2 with 10 classes (child of level_1)
level_3 with 100 classes (child of level_2)

The ultimate goal is to predict the class at level_3. I wanted to do the following experiment.

Start by using a pretrained model (imagenet) to train on level_1 classification
Transfer model weights from previous step and train on level_2 classification
Transfer model weights from previous step and train on level_3 classification

I did not really find an elegant way to do it using the fastai lib, so I was wondering whether I missed something. Here is what I do so far:

# level 1 training
learn = cnn_learner(level_1_data, models.resnet50, metrics=accuracy)
learn.fit_one_cycle(5)
learn.save('level_1')

# level 2 training from level 1 model
learn = cnn_learner(level_2_data, models.resnet50, metrics=accuracy)
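# learn.save('level_1') wrote <learn.path>/<learn.model_dir>/level_1.pth (assumes both learners share the same path)
checkpoint_dict = torch.load(learn.path/learn.model_dir/'level_1.pth')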
pretrained_dict = checkpoint_dict['model']
model_dict = learn.model.state_dict()
# 1. filter out the final linear layer weights (keys '1.8.weight' and '1.8.bias' in the resnet50 head)
pretrained_dict = {k: v for k, v in pretrained_dict.items() if not k.startswith('1.8.')}
# 2. overwrite entries in the existing state dict 
model_dict.update(pretrained_dict)
# 3. load the new state dict
learn.model.load_state_dict(model_dict)

Any ideas would be welcome.
Thanks
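
The filtering step above hardcodes the key prefix of the last layer. A minimal sketch of one alternative, assuming plain PyTorch and a toy model in place of the fastai one: copy every tensor whose key and shape both match, so the resized output layer is skipped automatically. `make_model` and `transfer_matching` are hypothetical helpers written for this illustration, not fastai functions.

```python
import torch
from torch import nn

def make_model(n_classes):
    """Toy stand-in for a fastai CNN: model[0] is the body, model[1] the head."""
    body = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
    head = nn.Sequential(nn.Flatten(), nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, n_classes))
    return nn.Sequential(body, head)

def transfer_matching(src_state, dst_model):
    """Copy every tensor whose key and shape match; return the keys that were skipped."""
    dst_state = dst_model.state_dict()
    kept = {k: v for k, v in src_state.items()
            if k in dst_state and v.shape == dst_state[k].shape}
    dst_state.update(kept)
    dst_model.load_state_dict(dst_state)
    return sorted(set(src_state) - set(kept))

level_1 = make_model(3)    # stands in for the trained level_1 model
level_2 = make_model(10)   # fresh model with the level_2 class count
skipped = transfer_matching(level_1.state_dict(), level_2)
print(skipped)  # ['1.3.bias', '1.3.weight'] -> only the output layer was left untouched
```

One caveat with this approach: if two levels happened to have the same number of classes, the output layer would match in shape and be copied as well.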

pattyhendrix · April 16, 2019, 4:37pm

I think this is doing the same thing you’re talking about, but a little differently. You can index into the last linear layer, set it to what you want, then train on the new output classes.
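
A rough sketch of that idea in plain PyTorch, assuming the model is a body/head pair of `nn.Sequential` blocks so that `model[-1][-1]` is the output layer. `make_model` is a hypothetical toy model, not the real resnet50 learner; on a real learner it is worth printing `learn.model` first to confirm where the last linear layer sits.

```python
import torch
from torch import nn

def make_model(n_classes):
    """Toy stand-in for a fastai CNN: model[0] is the body, model[1] the head."""
    body = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
    head = nn.Sequential(nn.Flatten(), nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, n_classes))
    return nn.Sequential(body, head)

model = make_model(3)                 # pretend this was trained on level_1
old_linear = model[-1][-1]            # last layer of the head

# Replace only the output layer; every other weight is kept as is.
model[-1][-1] = nn.Linear(old_linear.in_features, 10)

out = model(torch.randn(2, 3, 32, 32))
print(out.shape)                      # torch.Size([2, 10])
```

The new layer starts from random weights, so it still has to be trained on the new classes.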

kushaj · April 16, 2019, 9:41pm

Save the weights of the conv base and load them into the other models. For the classification head, create a new one.

# Use this to access the conv layer weights
conv_weights = learn.model[0].state_dict()

# After creating new learn object load these weights to conv_base
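
The loading step is left open above. A minimal sketch of how it could look, assuming plain PyTorch and a toy model whose first child is the conv base (as with `learn.model[0]`). `make_model` and the file name `conv_base.pth` are made up for the example.

```python
import os
import tempfile

import torch
from torch import nn

def make_model(n_classes):
    """Toy stand-in for a fastai CNN: model[0] is the body, model[1] the head."""
    body = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
    head = nn.Sequential(nn.Flatten(), nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, n_classes))
    return nn.Sequential(body, head)

old_model = make_model(3)   # stands in for the trained level_1 model

# Save only the conv base weights to disk.
path = os.path.join(tempfile.mkdtemp(), "conv_base.pth")
torch.save(old_model[0].state_dict(), path)

# New model with a fresh head for the new class count, then restore the conv base.
new_model = make_model(10)
new_model[0].load_state_dict(torch.load(path))
```

With a real learner the same two calls would presumably go through `learn.model[0]`, so the new learner keeps its own databunch and head.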

romain · April 17, 2019, 6:28am

Your approach would definitely be cleaner, but in this case, the learner would still be tied to the wrong databunch.

pattyhendrix · April 17, 2019, 4:18pm

Yeah, so after you train the first learner, you can change the last linear layer to have the outputs you need for the new databunch and save that model. Then make a new learner with the new databunch and the same architecture, and load the saved weights from the first model.

avatar · September 16, 2019, 7:39am

Hi,
I have changed the last layer to reflect the number of classes and saved the weights as described in the previous posts:
learn.model[-1][-1] = nn.Linear(in_features=512, out_features=5, bias=True)
learn.save('stage-1_classes_5')
conv_weights = learn.model[0].state_dict()

How do I pass the saved conv_weights to the new learner and use stage-1_classes_5 as the architecture?

Thanks!

mottyl · September 24, 2019, 5:46pm

Can anyone shed some light on the same problem (i.e. transfer learning twice with a different number of classes) when using a SequentialRNN, i.e. when doing transfer learning with text?
The "learn.model[-1][-1]" indexing does not work for this class…
Any help will be appreciated!
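
When the head is not directly indexable, one option is to look the last `nn.Linear` up by name instead of by position. The sketch below assumes plain PyTorch; `ToyTextClassifier` and `replace_last_linear` are hypothetical names for this example, and "last" means last in registration order, which is usually but not always the output layer. In fastai v1 the text classifier head seems to keep its layers under a `.layers` attribute, so something like `learn.model[-1].layers[-1]` may also work there, but that is worth checking with `print(learn.model)`.

```python
import torch
from torch import nn

class ToyTextClassifier(nn.Module):
    """Toy stand-in for a text model whose head cannot be indexed with [-1][-1]."""
    def __init__(self, n_classes):
        super().__init__()
        self.encoder = nn.Embedding(50, 8)
        self.layers = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, n_classes))

    def forward(self, tokens):
        return self.layers(self.encoder(tokens).mean(dim=1))

def replace_last_linear(model, n_out):
    """Swap the last registered nn.Linear for a fresh one with n_out outputs."""
    last_name = None
    for name, module in model.named_modules():
        if name and isinstance(module, nn.Linear):
            last_name = name
    if last_name is None:
        raise ValueError("no nn.Linear submodule found")
    parent_name, _, child_name = last_name.rpartition(".")
    parent = model
    for part in (parent_name.split(".") if parent_name else []):
        parent = getattr(parent, part)
    old = getattr(parent, child_name)
    setattr(parent, child_name, nn.Linear(old.in_features, n_out, bias=old.bias is not None))
    return last_name

model = ToyTextClassifier(3)
replaced = replace_last_linear(model, 5)
out = model(torch.randint(0, 50, (4, 7)))
print(replaced, out.shape)  # layers.2 torch.Size([4, 5])
```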

gbob · April 8, 2021, 8:30am

Hey @romain,

Did you find a good solution to this? I am trying to do pretty much the same thing.

I had a look at the create custom head as suggested by other replies but I don’t think that is a solution.

I see that for NLP there is a built-in function that does exactly what you are asking for; it's called save_encoder(). But I cannot find anything similar for CNNs.

Any advice would be greatly appreciated.
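
For CNNs, a rough analogue of saving the encoder is to carry only the body's state dict from one level to the next. A sketch of the three-level chain from the first post, assuming plain PyTorch and a toy body/head model (`make_model` is a hypothetical helper; the training step is left as a comment):

```python
import copy

import torch
from torch import nn

def make_model(n_classes):
    """Toy stand-in for a fastai CNN: model[0] is the body, model[1] the head."""
    body = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
    head = nn.Sequential(nn.Flatten(), nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, n_classes))
    return nn.Sequential(body, head)

body_state = None
trained = {}
for name, n_classes in [("level_1", 3), ("level_2", 10), ("level_3", 100)]:
    model = make_model(n_classes)             # fresh head sized for this level
    if body_state is not None:
        model[0].load_state_dict(body_state)  # reuse the body from the previous level
    # ... train `model` on this level's data here ...
    body_state = copy.deepcopy(model[0].state_dict())
    trained[name] = model
```

Each level gets its own model (and, with fastai, its own learner and databunch), so nothing stays tied to the previous level's data.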