More flexible transfer learning: Hacking pretrained models

I’d like to get better at hacking pretrained models.

Transfer learning is powerful, but one often wants to adjust the model architecture, either simply to match new inputs or outputs, or to experiment with bigger structural changes. Changing the architecture requires making corresponding changes to the pretrained weights; otherwise transfer learning breaks down. This process can be difficult.

What resources have you found useful for thinking about, and making, parallel changes to a model and its corresponding pretrained weights?


Hi Nick. I have done a fair amount of this type of hacking of the model architecture for a Kaggle competition.

The best resource has been the fastai library itself. It contains several working examples of how to start with a pretrained model, extract the backbone, and append a new head that does a different task: different number of classes, regression, segmentation, for instance. All the methods that return a pretrained Learner adapted to the given DataBunch are great resources.

Changing the pretrained weights directly… is this even possible? The standard procedure is to load them into the exact original architecture, modify the architecture to suit your task, and, at the right point, fine-tune the backbone weights.
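That procedure can be sketched with a toy network (all names here are illustrative, not fastai code): first restore the checkpoint into the unmodified architecture, then swap in the task-specific layer.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """Stand-in for a pretrained architecture."""
    def __init__(self, n_out: int = 10):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
        self.classifier = nn.Linear(8, n_out)

    def forward(self, x):
        x = self.features(x).mean(dim=(2, 3))  # global average pool
        return self.classifier(x)

# Pretend this checkpoint came from pretraining on the original task.
pretrained = TinyNet(n_out=10)
state = pretrained.state_dict()

model = TinyNet(n_out=10)
model.load_state_dict(state)         # 1. load into the exact original architecture
model.classifier = nn.Linear(8, 37)  # 2. only then modify for the new task
```

The backbone (`features`) keeps its pretrained values; only the replaced layer starts from random initialization.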

Most CNNs will automatically adapt to a different input size due to the nature of spatial convolutions. There is no need (or possibility AFAICT) to alter the input side of the pretrained model. Though you may need to adapt the data itself through normalization or scaling. I noticed in one case that interpolating input images up to the original pretrained size helped the final accuracy.
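A small illustration of both points (a toy model, not from any library): once an adaptive pool sits before the head, the same convolutional model accepts several input sizes, and `F.interpolate` can upscale inputs to the original pretraining resolution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Spatial convolutions don't fix the input size; only a hard-coded Linear
# layer does. Adaptive pooling before the head removes that constraint.
body = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 10),
)

small = torch.randn(1, 3, 96, 96)
# Upscale to the (assumed) original pretraining resolution.
big = F.interpolate(small, size=(224, 224), mode='bilinear', align_corners=False)

# Both resolutions pass through the same model unchanged.
out_small, out_big = body(small), body(big)
```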

PyTorch is very tolerant of model (architecture) hacking. See my various posts where I finally convinced myself that PyTorch would correctly train the hacked model. It does, so you can freely experiment.

In short, my own resources have been reusing well-designed working code, a decent debugger, and lots and lots of experiments.


I found the fastai Cadene models module (source below) to be a great resource for understanding how to make pretrained models work with fastai. You can adapt many models even if they aren't in that Cadene module yet (i.e., if they share the same base architecture, the cuts will likely be the same).

#These models are downloaded via the repo
#See licence here:
from torch import nn
from ..learner import model_meta
from ...core import *

pretrainedmodels = try_import('pretrainedmodels')
if not pretrainedmodels:
    raise Exception('Error: `pretrainedmodels` is needed. `pip install pretrainedmodels`')

__all__ = ['inceptionv4', 'inceptionresnetv2', 'nasnetamobile', 'dpn92', 'xception_cadene', 'se_resnet50',
           'se_resnet101', 'se_resnext50_32x4d', 'senet154', 'pnasnet5large']

def get_model(model_name:str, pretrained:bool, seq:bool=False, pname:str='imagenet', **kwargs):
    pretrained = pname if pretrained else None
    model = getattr(pretrainedmodels, model_name)(pretrained=pretrained, **kwargs)
    return nn.Sequential(*model.children()) if seq else model

def inceptionv4(pretrained:bool=False):
    model = get_model('inceptionv4', pretrained)
    all_layers = list(model.children())
    return nn.Sequential(*all_layers[0], *all_layers[1:])
model_meta[inceptionv4] = {'cut': -2, 'split': lambda m: (m[0][11], m[1])}

def nasnetamobile(pretrained:bool=False):
    model = get_model('nasnetamobile', pretrained, num_classes=1000)
    model.logits = noop
    return nn.Sequential(model)
model_meta[nasnetamobile] = {'cut': noop, 'split': lambda m: (list(m[0][0].children())[8], m[1])}

def pnasnet5large(pretrained:bool=False):
    model = get_model('pnasnet5large', pretrained, num_classes=1000)
    model.logits = noop
    return nn.Sequential(model)
model_meta[pnasnet5large] = {'cut': noop, 'split': lambda m: (list(m[0][0].children())[8], m[1])}

def inceptionresnetv2(pretrained:bool=False):  return get_model('inceptionresnetv2', pretrained, seq=True)
def dpn92(pretrained:bool=False):              return get_model('dpn92', pretrained, pname='imagenet+5k', seq=True)
def xception_cadene(pretrained=False):         return get_model('xception', pretrained, seq=True)
def se_resnet50(pretrained:bool=False):        return get_model('se_resnet50', pretrained)
def se_resnet101(pretrained:bool=False):       return get_model('se_resnet101', pretrained)
def se_resnext50_32x4d(pretrained:bool=False): return get_model('se_resnext50_32x4d', pretrained)
def senet154(pretrained:bool=False):           return get_model('senet154', pretrained)

model_meta[inceptionresnetv2] = {'cut': -2, 'split': lambda m: (m[0][9],     m[1])}
model_meta[dpn92]             = {'cut': -1, 'split': lambda m: (m[0][0][16], m[1])}
model_meta[xception_cadene]   = {'cut': -1, 'split': lambda m: (m[0][11],    m[1])}
model_meta[senet154]          = {'cut': -3, 'split': lambda m: (m[0][3],     m[1])}
_se_resnet_meta               = {'cut': -2, 'split': lambda m: (m[0][3],     m[1])}
model_meta[se_resnet50]        = _se_resnet_meta
model_meta[se_resnet101]       = _se_resnet_meta
model_meta[se_resnext50_32x4d] = _se_resnet_meta

# TODO: add "resnext101_32x4d" "resnext101_64x4d" after serialization issue is fixed:


Where can we see these posts? Link, please.

I have been looking at a rather large dataset where the number of classes goes up to 200,000. My plan was to start from 1000, and work my way up.

To make sure I am doing the transfer right, I went and redid Pets lesson 1, transferring the weights from a model trained on just the five terrier classes to all 37 classes.

I think this code works:

# Load the old weights
# Replace the final Linear layer so it outputs the new number of classes
# (here I know there are 37)
learn.model[-1][-1] = nn.Linear(in_features=512, out_features=newNumberOfClasses, bias=True)
# Save the new weights'NewModel')

You can see the whole notebook here:
Pets Transfer
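Going one step further (a sketch, not from the notebook above): when the old classes are a subset of the new ones, the learned rows of the old Linear layer can seed the corresponding rows of the new, larger one, so those classes don't restart from random initialization.

```python
import torch
import torch.nn as nn

old_head = nn.Linear(512, 5)   # e.g. the 5 terrier classes
new_head = nn.Linear(512, 37)  # all 37 Pets classes

# Hypothetical indices of where the 5 old classes land among the 37.
old_to_new = [3, 11, 19, 26, 34]

with torch.no_grad():
    # Copy each old class's weight row and bias into its new position;
    # the remaining 32 rows keep their fresh random initialization.
    new_head.weight[old_to_new] = old_head.weight
    new_head.bias[old_to_new] = old_head.bias
```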


Thanks, Malcom, this is helpful.

Yes, for example, I’d like to copy the pretrained weights associated with one network module to another network module. That way, if we wanted to try adding, say, another residual block, or increasing the number of filters in a convolutional layer, we could use pretrained weights copied from a similar block or layer as a better starting point for fine-tuning than random initialization.

This operation would be deeper surgery than what you’ve successfully done already. I figure it ought to be possible, but I haven’t looked into it very deeply yet.
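Such surgery might look like this (a sketch with illustrative modules, under the assumption that the copied weights are merely a warm start to be fine-tuned): seed a newly added block with a copy of a similar pretrained block, and widen a conv layer by reusing its old filters for the first output channels.

```python
import copy
import torch
import torch.nn as nn

# A pretrained block and a new block of identical shape, seeded from it.
block = nn.Sequential(nn.Conv2d(16, 16, 3, padding=1), nn.ReLU())
new_block = copy.deepcopy(block)
new_block.load_state_dict(block.state_dict())  # pretrained init, not random

# Widening: double the filter count, copying the old filters into the
# first 16 output channels; the extra 16 stay randomly initialized.
old_conv = nn.Conv2d(16, 16, 3, padding=1)
wide_conv = nn.Conv2d(16, 32, 3, padding=1)
with torch.no_grad():
    wide_conv.weight[:16] = old_conv.weight
    wide_conv.bias[:16] = old_conv.bias
```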
