Transfer learning... twice?

Hi Everyone,

I want to understand a bit more about how transfer learning works in practice and was hoping that someone here might be able to point me in the right direction.

I have my own set of data (images of a forest taken by a drone) with annotations. I have successfully adapted the lesson 3 notebook so I can transfer learn using the weights from the imagenet competition. So, imagenet + fine-tuning is done; the results are nice but not super great (the main problems are very likely the low number of images and poor annotations).

However, I would like to go one step further: I would like to fine-tune the imagenet model on a problem (with lots of data) that is closer to mine and then fine-tune the result with my images. So I want to do transfer learning “twice”.

So, I have trained the same imagenet model using the “Planet” dataset as is explained in the lesson. From now on I will call the model that I trained (again, transfer-learning from imagenet) the “planet” model.

Now what is left is to fine-tune the “planet” model with my images. However I have not yet been able to figure out how to do it.

CONCERNS:

  • I have to be careful not to discard the fine-tuned part of the planet model (if I cut enough I will end up with the “imagenet” model).

  • I cannot directly load the planet model onto my problem: the planet model outputs 17 categories and my data only has five, so at the very least I need to change that.

I have been looking at several things; create_body and create_head, and “Transfer learning with different model shapes (error: size mismatch)”, are the most promising so far, but I have not had much success. Any pointers will be greatly appreciated.


if you unfreeze the model and train it on the planet dataset, that will let all the layers train and not just the last ones. so then you could load that trained model as a starting point for your dataset. i’ve done this with some datasets and it seemed to be pretty helpful


Thanks for the answer. I will definitely try that. Did you ever use the re-trained planet model with a dataset that has a different number of classes?

If so, how did you change the head of the model so it could be loaded by the learner of the new dataset?

i did it messing around with some super resolution stuff, but i don’t think it matters if you have the same number of classes or not. i think by default fastai takes off the last layer or two, which is where the number of outputs is set.

imagenet has 1,000 classes i think, so whenever you’re using a pretrained model, you’re probably training it on a different number of classes, which is the same thing you want to do.

so i think you should be able to train a model on the planet dataset (let’s say with resnet34), save the weights, set up your dataset and a learner with resnet34, then load the weights from the training of the planet dataset.

That is what I thought too, and given your answers, I am just probably not doing it right. So far, this is my code:

  1. Train with the planet dataset (same as the lesson 3 notebook)

     from fastai.vision import *
     from fastai import *
    
     path = Config.data_path()/'MYPATH'
     path.mkdir(parents=True, exist_ok=True)
    
     df = pd.read_csv(path/'train_v2.csv')
     tfms = get_transforms(flip_vert=True, max_lighting=0.1, max_zoom=1.05, max_warp=0.)
     np.random.seed(42)
     src = (ImageList.from_csv(path, 'train_v2.csv', folder='train-jpg', suffix='.jpg')
            .split_by_rand_pct(0.2)
            .label_from_df(label_delim=' '))
     data = (src.transform(tfms, size=128)
             .databunch().normalize(imagenet_stats))
    
     arch = models.resnet50
     acc_02 = partial(accuracy_thresh, thresh=0.2)
     f_score = partial(fbeta, thresh=0.2)
     learn = cnn_learner(data, arch, metrics=[acc_02, f_score])
     learn.lr_find()
    
     lr = 0.01
     learn.fit_one_cycle(5, slice(lr))
    
     learn.save('Planetstage-1-rn50')
    

So basically I read the planet data just as is done in the course and do some fitting (of the last layers, no unfreezing). I will of course also try your suggestion of unfreezing and retraining the full model, I have not had time yet. At the end of the code, I save my model in a file.

Then, I try to set things up for my images (note that when I just modify notebook 3 and load resnet50, everything works fine; the problem is loading the stored model):

    path = Config.data_path()/self.path
    path.mkdir(parents=True, exist_ok=True)

    df = pd.read_csv(self.path+self.labelFileName)
    df.head()

    tfms = get_transforms(flip_vert=True, max_lighting=0.1, max_zoom=1.05, max_warp=0.)
    np.random.seed(42)
    src = (ImageList.from_csv(path, self.labelFileName, folder=self.imageDir, suffix=self.suffix)
           .split_by_rand_pct(0.2)
           .label_from_df(label_delim=' '))
    data = (src.transform(tfms, size=128).databunch().normalize(imagenet_stats))
    arch = models.resnet50
    acc_02 = partial(accuracy_thresh, thresh=0.2)
    f_score = partial(fbeta, thresh=0.2)
    learn = cnn_learner(data, arch, metrics=[acc_02, f_score])
    learn.load(self.modelFile)

When I try this, I get the following error:

RuntimeError: Error(s) in loading state_dict for Sequential:
	size mismatch for 1.8.weight: copying a param with shape torch.Size([17, 512]) from checkpoint, the shape in current model is torch.Size([5, 512]).
	size mismatch for 1.8.bias: copying a param with shape torch.Size([17]) from checkpoint, the shape in current model is torch.Size([5]).

If I understand correctly, the problem is that the model before used to predict 17 categories (planet) and it now predicts 5 (my data).
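
A quick way to confirm this kind of diagnosis is to compare parameter shapes between the saved checkpoint and the freshly built model. The sketch below is only an illustration: plain dictionaries mapping parameter names to shapes stand in for the two state dicts, the helper name `find_shape_mismatches` is hypothetical, the `1.8.*` shapes are the ones from the error message, and the `0.0.weight` entry is an illustrative matching parameter that does not appear in the error.

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return (name, checkpoint_shape, model_shape) for shared parameters whose shapes differ."""
    mismatches = []
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        if model_shape is not None and model_shape != ckpt_shape:
            mismatches.append((name, ckpt_shape, model_shape))
    return mismatches


# Stand-in for the saved planet checkpoint: the head has 17 outputs.
checkpoint_shapes = {
    "0.0.weight": (64, 3, 7, 7),  # illustrative entry that matches in both models
    "1.8.weight": (17, 512),
    "1.8.bias": (17,),
}
# Stand-in for the new learner built on the forest data: the head has 5 outputs.
model_shapes = {
    "0.0.weight": (64, 3, 7, 7),
    "1.8.weight": (5, 512),
    "1.8.bias": (5,),
}

for name, ckpt_shape, model_shape in find_shape_mismatches(checkpoint_shapes, model_shapes):
    print(f"{name}: checkpoint {ckpt_shape} vs model {model_shape}")
```

With real PyTorch state dicts, the shape dictionaries could be built with `{k: tuple(v.shape) for k, v in state_dict.items()}`. Only the final linear layer shows up as mismatched, which matches the reading that everything except the output layer is compatible.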

you’re right. i think this might fix it.
once you’re done training on the planet dataset:

learn.model[-1][-1]=nn.Linear(in_features=512,out_features=5, bias=True)

i think this should set the last linear layer (outputs) to the number of outputs for your data. i just plugged in the same number of in_features that learn.model[-1][-1] printed out and switched out_features with the number of classes your dataset has.

then you can do learn.save('double-pretrain'), and you should be able to make a new learner with your dataset and resnet50, learn.load('double-pretrain') and train :crossed_fingers:


Yes, this is exactly what I was looking for. It is now fitting as I write this. I will run a few numbers and try to post a minimum working example in case anyone else has the same problem in the future.

Thanks a lot for your help pattyhendrix!

So, in the end, this is what my code looks like:

First, transfer learning (follows the notebook from lesson 3):

  1. Load Resnet50 with the weights from imagenet:

    from fastai.vision import *
    from fastai import *

    path = Config.data_path()/'MYPATH'
    path.mkdir(parents=True, exist_ok=True)

    df = pd.read_csv(path/'train_v2.csv')
    tfms = get_transforms(flip_vert=True, max_lighting=0.1, max_zoom=1.05, max_warp=0.)
    np.random.seed(42)
    src = (ImageList.from_csv(path, 'train_v2.csv', folder='train-jpg', suffix='.jpg')
           .split_by_rand_pct(0.2)
           .label_from_df(label_delim=' '))
    data = (src.transform(tfms, size=128)
            .databunch().normalize(imagenet_stats))

    arch = models.resnet50
    acc_02 = partial(accuracy_thresh, thresh=0.2)
    f_score = partial(fbeta, thresh=0.2)
    learn = cnn_learner(data, arch, metrics=[acc_02, f_score])

  2. Then, fit the model to the planet database.

    learn.lr_find()

    lr = 0.01
    learn.fit_one_cycle(5, slice(lr))

    learn.model[-1][-1]=nn.Linear(in_features=512,out_features=5, bias=True)
    learn.save('Planetstage-1-rn50')

Notice that before saving we change the number of categories that the model outputs so we can then open it with the other data (we change from 17 to 5).

I also re-ran the whole thing, replacing the last bit with:

    learn.unfreeze()
    learn.lr_find()

    learn.fit_one_cycle(5, slice(1e-5, lr/5))

    learn.model[-1][-1]=nn.Linear(in_features=512,out_features=5, bias=True)
    learn.save('Planetstage-2-rn50')

Second, re-train the model fitted for “Planet” with my images.

Then I loaded my images with the tweaked models (do not worry about the “self” parts, this is inside of a python class, they mainly carry the information on where to find the images):

    path = Config.data_path()/self.path
    path.mkdir(parents=True, exist_ok=True)

    df = pd.read_csv(self.path+self.labelFileName)
    df.head()

    tfms = get_transforms(flip_vert=True, max_lighting=0.1, max_zoom=1.05, max_warp=0.)
    np.random.seed(42)
    src = (ImageList.from_csv(path, self.labelFileName, folder=self.imageDir, suffix=self.suffix)
           .split_by_rand_pct(0.2)
           .label_from_df(label_delim=' '))
    data = (src.transform(tfms, size=128).databunch().normalize(imagenet_stats))
    arch = models.resnet50
    acc_02 = partial(accuracy_thresh, thresh=0.2)
    f_score = partial(fbeta, thresh=0.2)
    learn = cnn_learner(data, arch, metrics=[acc_02, f_score])
    learn.load(self.modelFile)

And then I was able to fit the model again.

Hope this helps.

sweet! you’re welcome :slightly_smiling_face: did it help with your dataset?

Yes it did.

I have run a small experiment with my data, training resnet50 from the following starting points:

Imagenet alone, from now on imnet
Imagenet frozen, planet for the last layers only, planet
Imagenet unfrozen, planet for all layers, unfplanet

I then set up an experiment to take these three models and train them (again, frozen or unfrozen) with different values of lr.
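
The experiment described above is a full grid over starting model, frozen or unfrozen training, and learning rate. A minimal sketch of how such a grid might be enumerated is shown below; the model names come from the list above, while the learning-rate values are hypothetical, since the post does not say which ones were tried.

```python
from itertools import product

# Starting points named in the post.
start_models = ["imnet", "planet", "unfplanet"]
# Whether the pretrained body stays frozen while training on the forest images.
freeze_options = ["frozen", "unfrozen"]
# Hypothetical learning rates; the actual values are not given in the post.
learning_rates = [1e-2, 1e-3, 1e-4]

# One dictionary per training run, covering every combination.
runs = [
    {"start": start, "mode": mode, "lr": lr}
    for start, mode, lr in product(start_models, freeze_options, learning_rates)
]

for run in runs:
    print(run)
```

Each entry would then drive one training run, so that results for the same learning rate can be compared across starting points.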

If I run the experiment with 5 epochs, I get basically the same results with frozen imnet and frozen unfplanet. planet alone does worse. So, you were right that unfreezing imagenet before fine-tuning for planet helps.

Even more interesting, if I let it run for 10 epochs, everybody improves, but unfrozen unfplanet gets the best results overall. Thinking about it, this is the version (two transfer learning steps, both unfrozen) that allows for more fitting, so it makes sense that it benefits from running longer.

My concern now is that this last case is likely to be overfitting. I am supposed to get more data in a few days' time, which should help me check that (do the models work well with the new data?).

Anyway, thanks a lot for your help and I hope this thread helps others with similar problems.


Quick question. I currently have a production model with 8 classes. It trains to 98% accuracy. I’ve cleaned the dataset on numerous occasions. I have about 30,000 test images per class. Every time I use it in production, it is fairly inaccurate (65% or so). Should I try to transfer learn twice or institute a multi-model production system?

Thanks

is there a difference between the images the model was trained on and the images its getting in production?

Yes, the production images are of varying size and resolution. Some of these images are thumbnail sized and some are much larger and higher resolution. They are resized to 299, however, just like the training environment.

Maybe train a small image and large image model?

I will caveat that the problem I’m attempting to tackle is complex, so I guess I’m more or less wondering: do I continue to re-train the model with the data it mislabeled, transfer learn from my previous model, or use a multi-stage approach (i.e. instead of deciding between 8 classes with one inference, use 3 separate models in an if-then approach)?
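
For reference, the multi-stage "if-then" idea mentioned above could be sketched roughly as follows. Everything here is hypothetical: the stub functions stand in for trained classifiers, and the group and class names are invented for illustration. A first model picks a group, and a second, group-specific model picks the final class.

```python
def cascade_predict(image, coarse_model, fine_models):
    """Route an image through a coarse classifier, then a group-specific one."""
    group = coarse_model(image)      # stage 1: pick a group of classes
    fine_model = fine_models[group]  # if-then: choose the model for that group
    return group, fine_model(image)  # stage 2: pick the class within the group


# Stub "models" (plain functions) so the sketch runs; real ones would be trained classifiers.
def coarse_model(image):
    return "group_a" if image["feature"] < 0.5 else "group_b"


fine_models = {
    "group_a": lambda image: "class_1",
    "group_b": lambda image: "class_5",
}

print(cascade_predict({"feature": 0.2}, coarse_model, fine_models))
print(cascade_predict({"feature": 0.9}, coarse_model, fine_models))
```

One trade-off of this design is that a mistake in the first stage cannot be corrected by the second, so the coarse model needs to be reliable on production images.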

So this approach works, but seems unclean, since you have to reload the original setup, modify it, and save it with the new outputs before loading it in the new learner.
Is there a proper way to convert a model that’s already been trained into a format that the “base_arch” parameter of cnn_learner(…) accepts, much like how “models.resnet34” is used for new models currently?
That way it could be used generically with any number of classes, out of the gate.
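
One possible alternative, sketched here under assumptions rather than as a fastai feature, is to skip the modify-and-resave step and copy across only the checkpoint entries whose shapes match the new model, leaving the new head at its fresh initialisation. Plain dicts of nested lists stand in for state dicts, and `shape_of` and `merge_compatible` are hypothetical helpers, not fastai or PyTorch functions.

```python
def shape_of(value):
    """Shape of a nested list, e.g. [[0, 0], [0, 0]] has shape (2, 2)."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0] if value else None
    return tuple(shape)


def merge_compatible(checkpoint, model_state):
    """Copy checkpoint entries into the model state only where name and shape match.

    Returns the merged state and the list of checkpoint names that were skipped.
    """
    merged = dict(model_state)
    skipped = []
    for name, value in checkpoint.items():
        if name in model_state and shape_of(value) == shape_of(model_state[name]):
            merged[name] = value
        else:
            skipped.append(name)
    return merged, skipped


# Toy example: the body matches, but the head has 3 outputs in the checkpoint and 2 in the new model.
checkpoint = {"body.weight": [[1.0, 2.0]], "head.weight": [[1.0], [2.0], [3.0]]}
model_state = {"body.weight": [[0.0, 0.0]], "head.weight": [[0.0], [0.0]]}

merged, skipped = merge_compatible(checkpoint, model_state)
print(merged["body.weight"])  # taken from the checkpoint
print(merged["head.weight"])  # left at the new model's initial values
print(skipped)
```

With real PyTorch tensors the same filter would compare `tensor.shape`, and the filtered dictionary could then be passed to `load_state_dict` with `strict=False`. This has not been checked here against fastai's `learn.load`, so treat it as a starting point only.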


Apologies for raising this thread from the dead, but I could not find a more pertinent one and did not want to create a new thread for something that is probably very simple (sorry, I checked the docs but could not find the answer).

To complete my study of how transfer learning works with my data, I would like to try to train resnet with random weights with my data. I am aware that it will very likely not work well, but I would like to know exactly how bad it gets.

As doing this:

    path = Config.data_path()/self.path
    path.mkdir(parents=True, exist_ok=True)

    df = pd.read_csv(self.path+self.labelFileName)
    df.head()

    tfms = get_transforms(flip_vert=True, max_lighting=0.1, max_zoom=1.05, max_warp=0.)
    np.random.seed(42)
    src = (ImageList.from_csv(path, self.labelFileName, folder=self.imageDir, suffix=self.suffix)
           .split_by_rand_pct(0.2)
           .label_from_df(label_delim=' '))
    data = (src.transform(tfms, size=128).databunch().normalize(imagenet_stats))
    arch = models.resnet50
    acc_02 = partial(accuracy_thresh, thresh=0.2)
    f_score = partial(fbeta, thresh=0.2)
    learn = cnn_learner(data, arch, metrics=[acc_02, f_score])

Already loads resnet50 with the imagenet weights, I was wondering if there is a function to randomize the weights (or to load the model without the weights). Alternatively, if someone can point me towards methods for getting the matrix from / passing it to a model, I can also write my own weight randomization method.

Thanks for your help.

Check out the docs for cnn_learner. It already takes a ‘pretrained’ parameter.

Thanks Pomo for the quick and precise answer. I was indeed right not to create a new topic about this as this was a bit of a stupid question.

I had actually checked the doc you pointed me to but had not understood what the parameter meant. After reading it again it is actually pretty clear though (I am not sure what I was thinking):

The doc says:

cnn_learner

cnn_learner(data: DataBunch, base_arch: Callable, cut: Union[int, Callable]=None, pretrained: bool=True, lin_ftrs: Optional[Collection[int]]=None, ps: Floats=0.5, custom_head: Optional[Module]=None, split_on: Union[Callable, Collection[ModuleList], NoneType]=None, bn_final: bool=False, init='kaiming_normal_', concat_pool: bool=True, **kwargs: Any) → Learner

Build convnet style learner.

This method creates a Learner object from the data object and model inferred from it with the backbone given in arch. Specifically, it will cut the model defined by arch (randomly initialized if pretrained is False) at the last convolutional layer by default (or as defined in cut, see below) and add:

So,

    path = Config.data_path()/self.path
    path.mkdir(parents=True, exist_ok=True)

    df = pd.read_csv(self.path+self.labelFileName)
    df.head()

    tfms = get_transforms(flip_vert=True, max_lighting=0.1, max_zoom=1.05, max_warp=0.)
    np.random.seed(42)
    src = (ImageList.from_csv(path, self.labelFileName, folder=self.imageDir, suffix=self.suffix)
           .split_by_rand_pct(0.2)
           .label_from_df(label_delim=' '))
    data = (src.transform(tfms, size=128).databunch().normalize(imagenet_stats))
    arch = models.resnet50
    acc_02 = partial(accuracy_thresh, thresh=0.2)
    f_score = partial(fbeta, thresh=0.2)
    learn = cnn_learner(data, arch, metrics=[acc_02, f_score], pretrained=False)

Should do the trick. I leave it here in case someone else has a similar case to mine of not seeing what is clearly written in the docs they are reading. I still have not run the code but I plan on doing it soon and I will let you know how bad it gets with some numbers.

Just a quick update on this for completeness' sake.

The above code indeed does what it was intended to do. I have compared the three following models with my data:

-M1: Unfrozen random weights model (I do not think keeping random weights frozen makes any sense)
-M2: Unfrozen imagenet model (not really a good idea with few images, but fair for comparison purposes)
-M3: Frozen imagenet model

As expected, M3 (frozen imagenet) performs best and improves 11.93% over the random model M1. Unfrozen imagenet (M2) also improves on the random model, but by less: 4.7%.
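
When reading improvement figures like these, it helps to know whether they are absolute differences (percentage points) or relative to the baseline score; the post does not say which one was used. The sketch below shows both calculations with hypothetical scores that are not the values measured in this thread.

```python
def absolute_gain(baseline, score):
    """Difference between two scores (given as fractions), in percentage points."""
    return (score - baseline) * 100


def relative_gain(baseline, score):
    """Improvement expressed as a percentage of the baseline score."""
    return (score - baseline) / baseline * 100


# Hypothetical scores for a baseline model and an improved model.
baseline, score = 0.50, 0.60
print(f"absolute: {absolute_gain(baseline, score):.1f} points")
print(f"relative: {relative_gain(baseline, score):.1f} %")
```

The two numbers can differ a lot when the baseline is low, so stating which one is meant makes comparisons between models easier to interpret.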

Of course this is a rather specific example and it is not possible to draw general conclusions from it, but it does go in the direction that my understanding of how transfer learning works dictates and, in this sense, it is nice to get one’s intuitions quantified and “reinforced”.

Wow, thanks! This works perfectly!

You just forgot to add .cuda() at the end of the line, which moves the new layer to the GPU so that it sits on the same device as the rest of the model. So the correct line should be:

learner.model[-1][-1] = nn.Linear(in_features=512, out_features=num_classes_of_the_model_i_want_to_load, bias=True).cuda()

And there is no need to create another learner or whatever, one can just create the learner as usual (with the number of classes specified in the DataBunch), then execute the line above, then load the model with the “wrong” number of classes, then execute the line below:

learner.model[-1][-1] = nn.Linear(in_features=512, out_features=len(learner.data.classes), bias=True).cuda()
