Transfer learning from one fastai learner to another

(Romain Thalineau) #1

I am working on a classification problem for which a taxonomy is available. To make it simple, let’s consider that there are 3 levels in this taxonomy:

  • level_1 with 3 classes
  • level_2 with 10 classes (child of level_1)
  • level_3 with 100 classes (child of level_2)

The ultimate goal is to predict the class associated with level_3. I wanted to do the following experiment.

  1. Start by using a pretrained model (imagenet) to train on level_1 classification
  2. Transfer model weights from previous step and train on level_2 classification
  3. Transfer model weights from previous step and train on level_3 classification

I did not find an elegant way to do this with the fastai lib, so I am wondering whether I missed something. Here is what I have so far:

# level 1 training
learn = cnn_learner(level_1_data, models.resnet50, metrics=accuracy)

# level 2 training from level 1 model
learn = cnn_learner(level_2_data, models.resnet50, metrics=accuracy)
checkpoint_dict = torch.load('level_1')
pretrained_dict = checkpoint_dict['model']
model_dict = learn.model.state_dict()
# 1. filter out the linear layer weights (the head no longer matches)
pretrained_dict = {k: v for k, v in pretrained_dict.items() if '1.8' not in k}
# 2. overwrite entries in the existing state dict
model_dict.update(pretrained_dict)
# 3. load the new state dict
learn.model.load_state_dict(model_dict)

Any ideas would be welcome.


More flexible transfer learning: Hacking pretrained models
(Jason Patnick) #2

I think this is doing the same thing you’re talking about, just a little differently: you can index into the last linear layer, set it to what you want, then train on the new output classes.
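A minimal sketch of that idea in plain PyTorch. The `Sequential` here is a toy stand-in for a fastai model (in a real `cnn_learner` the head lives at `learn.model[1]`, so you would index into that instead); the layer sizes are made up for illustration.

```python
import torch
import torch.nn as nn

# Toy stand-in for a trained model whose head ends in a Linear layer.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(32, 16),
    nn.ReLU(),
    nn.Linear(16, 3),   # head trained on the 3 level_1 classes
)

# Index into the last linear layer and replace it with one sized for the
# new task; every other weight in the model is kept as-is.
in_features = model[-1].in_features
model[-1] = nn.Linear(in_features, 10)  # 10 level_2 classes

out = model(torch.randn(4, 32))
print(out.shape)  # torch.Size([4, 10])
```

After the swap, training on the level_2 data fine-tunes the retained weights while the fresh layer learns the new classes from scratch.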


(Kushajveer Singh) #3

Save the weights of the conv base and load them again for the other models. For the classification head, create a new one.

# Use this to access the conv base weights
conv_weights = learn.model[0].state_dict()

# After creating the new learn object, load these weights into its conv base
learn.model[0].load_state_dict(conv_weights)

(Romain Thalineau) #4

Your approach would definitely be cleaner, but in this case the learner would still be tied to the wrong databunch.


(Jason Patnick) #5

Yeah, so after you train the first learner, you can change the last linear layer to have the outputs you need for the new databunch and save that model. Then make a new learner with the new databunch and the same architecture, and load the saved weights from the first model.
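A rough sketch of that workflow in plain PyTorch, assuming the same architecture is built for each level. `make_model` is a hypothetical stand-in for constructing the same `cnn_learner` twice, and the layer sizes are invented for illustration:

```python
import os
import tempfile

import torch
import torch.nn as nn

def make_model(n_classes):
    # Same architecture every time; only the final layer width changes.
    return nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, n_classes))

# "First learner": trained on the 3 level_1 classes.
model_l1 = make_model(3)

# Resize its last linear layer to the next level's class count, then save.
model_l1[-1] = nn.Linear(16, 10)
path = os.path.join(tempfile.gettempdir(), 'level_1_resized.pth')
torch.save(model_l1.state_dict(), path)

# "New learner": same architecture, built for the level_2 databunch.
# The saved state dict now matches it key-for-key, so loading just works.
model_l2 = make_model(10)
model_l2.load_state_dict(torch.load(path))
```

Because the resized model and the new model share identical state-dict keys and shapes, no filtering of the checkpoint is needed, unlike in the partial-loading approach above.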