Transfer learning... twice?

#1

Hi Everyone,

I want to understand a bit more about how transfer learning works in practice and was hoping that someone here might be able to point me in the right direction.

I have my own set of data (images of a forest taken by a drone) with annotations. I have successfully adapted the lesson 3 notebook so I can do transfer learning using the weights from the imagenet competition. So, imagenet + fine-tuning is done; the results are nice but not great (the main problems are very likely the low number of images and poor annotations).

However, I would like to go one step further: I would like to fine-tune the imagenet model on a problem (with lots of data) that is closer to mine, and then fine-tune the result with my images. So I want to do transfer learning “twice”.

So, I have trained the same imagenet model using the “Planet” dataset, as explained in the lesson. From now on I will call the model that I trained (again, via transfer learning from imagenet) the “planet” model.

Now what is left is to fine-tune the “planet” model with my images. However, I have not yet been able to figure out how to do it.

CONCERNS:

  • I have to be careful not to discard the fine-tuned part of the planet model (if I cut too much, I will end up back with the “imagenet” model).

  • I cannot directly load the planet model for my problem: the planet model outputs 17 categories and my data only has five, so at the very least I need to change that.

I have been looking at several things; so far, create_body and create_head, and the thread “Transfer learning with different model shapes (error: size mismatch)”, are the most promising leads, but I have not had much success. Any pointers will be greatly appreciated.
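
For reference, the kind of thing I have been trying with create_body and create_head looks roughly like this (just a sketch: nf=4096 because resnet50's body outputs 2048 channels and fastai's default head concat-pools them, and nc=5 is my number of classes; I have not managed to connect this to the planet weights yet):

    body = create_body(models.resnet50, pretrained=True)   # backbone without the head
    head = create_head(nf=4096, nc=5)                      # new head for my 5 classes
    model = nn.Sequential(body, head)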

1 Like

(Jason Patnick) #2

If you unfreeze the model and train it on the planet dataset, that will let all the layers train, not just the last ones. Then you could load that trained model as a starting point for your dataset. I've done this with some datasets and it seemed to be pretty helpful.
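
Something like this (rough sketch; the learning rates are just placeholders):

    learn = cnn_learner(planet_data, models.resnet50)   # planet_data: your planet DataBunch
    learn.fit_one_cycle(5, slice(1e-2))                 # train the new head first
    learn.unfreeze()                                    # make every layer group trainable
    learn.fit_one_cycle(5, slice(1e-5, 1e-3))           # smaller lrs for the early layers
    learn.save('planet-all-layers')                     # starting point for your forest data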

2 Likes

#3

Thanks for the answer. I will definitely try that. Did you ever use the re-trained planet model with a dataset that has a different number of classes?

If so, how did you change the head of the model so it could be loaded by the learner of the new dataset?

0 Likes

(Jason Patnick) #4

I did it messing around with some super-resolution stuff, but I don't think it matters whether you have the same number of classes or not. I think by default fastai takes off the last layer or two, which is where the number of outputs is set.

imagenet has 1,000 classes, I think, so whenever you're using a pretrained model you're probably already training it on a different number of classes, which is the same thing you want to do.

So I think you should be able to train a model on the planet dataset (let's say with resnet34), save the weights, set up your dataset and a learner with resnet34, then load the weights from the planet training.
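
Roughly like this sketch (the names are made up):

    learn = cnn_learner(planet_data, models.resnet34)
    learn.fit_one_cycle(5)
    learn.save('planet-rn34')    # saved under planet_data.path/'models'

    learn2 = cnn_learner(my_data, models.resnet34)
    # load() looks in my_data.path/'models', so copy the .pth file there if the paths differ
    learn2.load('planet-rn34')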

0 Likes

#5

That is what I thought too, and given your answers, I am probably just not doing it right. So far, this is my code:

  1. Train with the planet dataset (same as the lesson 3 notebook)

     from fastai.vision import *
     from fastai import *
    
     path = Config.data_path()/'MYPATH'
     path.mkdir(parents=True, exist_ok=True)
    
     df = pd.read_csv(path/'train_v2.csv')
     tfms = get_transforms(flip_vert=True, max_lighting=0.1, max_zoom=1.05, max_warp=0.)
     np.random.seed(42)
     src = (ImageList.from_csv(path, 'train_v2.csv', folder='train-jpg', suffix='.jpg')
            .split_by_rand_pct(0.2)
            .label_from_df(label_delim=' '))
     data = (src.transform(tfms, size=128)
             .databunch().normalize(imagenet_stats))
    
     arch = models.resnet50
     acc_02 = partial(accuracy_thresh, thresh=0.2)
     f_score = partial(fbeta, thresh=0.2)
     learn = cnn_learner(data, arch, metrics=[acc_02, f_score])
     learn.lr_find()   # inspect the plot to pick a learning rate
    
     lr = 0.01
     learn.fit_one_cycle(5, slice(lr))
    
     learn.save('Planetstage-1-rn50')
    

So basically I read the planet data just as is done in the course and fit only the last layers (no unfreezing). I will of course also try your suggestion of unfreezing and retraining the full model; I have not had time yet. At the end of the code, I save the model to a file.

Then, I try to set up things for my images (note that when I just modify the lesson 3 notebook and load plain resnet50, everything works fine; the problem is loading the stored model):

    path = Config.data_path()/self.path
    path.mkdir(parents=True, exist_ok=True)

    df = pd.read_csv(self.path+self.labelFileName)
    df.head()

    tfms = get_transforms(flip_vert=True, max_lighting=0.1, max_zoom=1.05, max_warp=0.)
    np.random.seed(42)
    src = (ImageList.from_csv(path, self.labelFileName, folder=self.imageDir, suffix=self.suffix)
           .split_by_rand_pct(0.2)
           .label_from_df(label_delim=' '))
    data = (src.transform(tfms, size=128).databunch().normalize(imagenet_stats))
    arch = models.resnet50
    acc_02 = partial(accuracy_thresh, thresh=0.2)
    f_score = partial(fbeta, thresh=0.2)
    learn = cnn_learner(data, arch, metrics=[acc_02, f_score])
    learn.load(self.modelFile)

When I try this, I get the following error:

RuntimeError: Error(s) in loading state_dict for Sequential:
	size mismatch for 1.8.weight: copying a param with shape torch.Size([17, 512]) from checkpoint, the shape in current model is torch.Size([5, 512]).
	size mismatch for 1.8.bias: copying a param with shape torch.Size([17]) from checkpoint, the shape in current model is torch.Size([5]).

If I understand correctly, the problem is that the saved model predicts 17 categories (planet), while my new model predicts 5 (my data).
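
One way to see the mismatch directly is to print the head:

    print(learn.model[-1])
    # the last layer prints as Linear(in_features=512, out_features=5, bias=True),
    # while the checkpoint was saved with out_features=17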

0 Likes

(Jason Patnick) #6

You're right. I think this might fix it.
Once you're done training on the planet dataset:

learn.model[-1][-1] = nn.Linear(in_features=512, out_features=5, bias=True)

I think this should set the last linear layer (the outputs) to the number of outputs for your data. I just plugged in the same number of in_features that learn.model[-1][-1] printed out and switched out_features to the number of classes your dataset has.

Then you can do learn.save('double-pretrain'), and you should be able to make a new learner with your dataset and resnet50, learn.load('double-pretrain'), and train :crossed_fingers:

1 Like

#7

Yes, this is exactly what I was looking for. It is fitting now as I write this. I will run a few numbers and try to post a minimum working example in case anyone else has the same problem in the future.

Thanks a lot for your help, pattyhendrix!

0 Likes

#8

So, in the end, this is what my code looks like:

First, transfer learning (following the lesson 3 notebook):

  1. Load Resnet50 with the weights from imagenet:

    from fastai.vision import *
    from fastai import *

    path = Config.data_path()/'MYPATH'
    path.mkdir(parents=True, exist_ok=True)

    df = pd.read_csv(path/'train_v2.csv')
    tfms = get_transforms(flip_vert=True, max_lighting=0.1, max_zoom=1.05, max_warp=0.)
    np.random.seed(42)
    src = (ImageList.from_csv(path, 'train_v2.csv', folder='train-jpg', suffix='.jpg')
           .split_by_rand_pct(0.2)
           .label_from_df(label_delim=' '))
    data = (src.transform(tfms, size=128)
            .databunch().normalize(imagenet_stats))

    arch = models.resnet50
    acc_02 = partial(accuracy_thresh, thresh=0.2)
    f_score = partial(fbeta, thresh=0.2)
    learn = cnn_learner(data, arch, metrics=[acc_02, f_score])

  2. Then, fit the model to the planet dataset:

    learn.lr_find()   # inspect the plot to pick a learning rate

    lr = 0.01
    learn.fit_one_cycle(5, slice(lr))

    # swap the 17-output planet head for a 5-output one before saving
    learn.model[-1][-1] = nn.Linear(in_features=512, out_features=5, bias=True)
    learn.save('Planetstage-1-rn50')

Notice that before saving, we change the number of categories the model outputs so we can then load it with the other data (we change from 17 to 5).

I also re-ran the whole thing, changing the last bit to:

    learn.unfreeze()
    learn.lr_find()

    learn.fit_one_cycle(5, slice(1e-5, lr/5))

    learn.model[-1][-1] = nn.Linear(in_features=512, out_features=5, bias=True)
    learn.save('Planetstage-2-rn50')

Second, re-train the model fitted for “Planet” with my images.

Then I loaded my images with the tweaked model (do not worry about the “self” parts; this is inside a Python class, and they mainly carry information about where to find the images):

    path = Config.data_path()/self.path
    path.mkdir(parents=True, exist_ok=True)

    df = pd.read_csv(self.path+self.labelFileName)
    df.head()

    tfms = get_transforms(flip_vert=True, max_lighting=0.1, max_zoom=1.05, max_warp=0.)
    np.random.seed(42)
    src = (ImageList.from_csv(path, self.labelFileName, folder=self.imageDir, suffix=self.suffix)
           .split_by_rand_pct(0.2)
           .label_from_df(label_delim=' '))
    data = (src.transform(tfms, size=128).databunch().normalize(imagenet_stats))
    arch = models.resnet50
    acc_02 = partial(accuracy_thresh, thresh=0.2)
    f_score = partial(fbeta, thresh=0.2)
    learn = cnn_learner(data, arch, metrics=[acc_02, f_score])
    learn.load(self.modelFile)

And then I was able to fit the model again.

Hope this helps.

0 Likes

(Jason Patnick) #9

Sweet! You're welcome :slightly_smiling_face: Did it help with your dataset?

0 Likes

#10

Yes it did.

I ran a small experiment, training resnet50 fitted in the following ways on my data:

  • imagenet alone, from now on “imnet”
  • imagenet frozen, fine-tuned on planet for the last layers only: “planet”
  • imagenet unfrozen, fine-tuned on planet with all layers: “unfplanet”

I then set up an experiment to take these three models and train them (again, frozen or unfrozen) with different values of lr.

If I run the experiment for 5 epochs, I get basically the same results with frozen imnet and frozen unfplanet; planet alone does worse. So you were right that unfreezing imagenet before fine-tuning on planet helps.

Even more interesting, if I let it run for 10 epochs, every variant improves, but unfrozen unfplanet gets the best results overall. Thinking about it, this is the version (two transfer-learning steps, both unfrozen) that allows for the most fitting, so it makes sense that it benefits from running longer.

My concern now is that this last case is likely to be overfitting. I am supposed to get more data in a few days' time, which should help me check that (do the models work well on the new data?).

Anyway, thanks a lot for your help and I hope this thread helps others with similar problems.

1 Like

(dan) #11

Quick question. I currently have a production model with 8 classes. It trains to 98% accuracy. I've cleaned the dataset on numerous occasions. I have about 30,000 test images per class. Every time I use it in production, it is fairly inaccurate (65% or so). Should I try to transfer learn twice or set up a multi-model production system?

Thanks

0 Likes

(Jason Patnick) #12

Is there a difference between the images the model was trained on and the images it's getting in production?

0 Likes

(dan) #13

Yes, the production images are of varying size and resolution. Some are thumbnail-sized and some are much larger and higher resolution. They are resized to 299, however, just like in the training environment.

Maybe train a small-image model and a large-image model?

I will caveat that the problem I'm attempting to tackle is complex, so I guess I'm more or less wondering: do I continue to re-train the model on the data it mislabeled, or transfer-learn from my previous model, or use a multi-stage approach (i.e. instead of deciding between 8 classes with one inference, use 3 separate models in an if-then approach)?

0 Likes

(Thavidu) #14

So this approach works, but it seems unclean, since you have to reload the original setup, modify it, and save it with the new outputs before loading it the new way.
Is there a proper way to convert a model that's already been trained into a format that the “base_arch” parameter of the cnn_learner(...) function accepts, much like how “models.resnet34” is used on new models currently?
That way it could be used generically with any number of classes, out of the gate?
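
The kind of thing I am imagining is a wrapper like this (untested sketch: as far as I can tell, cnn_learner calls base_arch(pretrained) to get a network and then cuts it, so a plain function can stand in for models.resnet34. It assumes the planet body was saved separately, e.g. with torch.save(learn.model[0].state_dict(), 'planet-body.pth'); the file name and function name are made up):

    def planet_resnet50(pretrained=True):
        model = models.resnet50(pretrained=False)
        if pretrained:
            # learn.model[0] was nn.Sequential over the resnet's children[:-2], so
            # rebuilding that Sequential from the same child modules makes the saved
            # keys line up, and loading into it updates `model` in place
            body = nn.Sequential(*list(model.children())[:-2])
            body.load_state_dict(torch.load('planet-body.pth'))
        return model

    learn = cnn_learner(data, planet_resnet50, metrics=[acc_02, f_score])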

0 Likes