Mobilenet v2 - How to fine tune?

I’m just starting out with fastai and am trying to do transfer learning using Mobilenet v2 on a custom dataset.

cnn_learner() doesn’t currently support this architecture out of the box, so I’m trying to import a pretrained model from torchvision.models like this:

import torchvision.models as models
mobilenet = models.mobilenet_v2(pretrained=True)

What would the next steps be from here? How much do I need to modify this model with PyTorch (cutting off the classifier, adding new layers…) before I can plug it into cnn_learner()? Is this possible at all?

Hi Bianca,

you’ll have to define the _model_meta for mobilenet_v2. As far as I can see, mobilenet doesn’t have a pooling layer, so I’m not sure whether fastai’s concat pooling makes sense with mobilenet_v2.

from fastai.vision.all import *  # provides L, params, model_meta, imagenet_stats, cnn_learner
import torchvision.models as models

model = models.mobilenet_v2

# split and cut copied from densenet ... you'll have to check whether they make sense for mobilenet
def _mobilenet_v2_split(m:nn.Module): return L(m[0][0][:7],m[0][0][7:], m[1:]).map(params)
_mobilenet_v2_meta   = {'cut':-1, 'split':_mobilenet_v2_split, 'stats':imagenet_stats}
model_meta[models.mobilenet_v2] = {**_mobilenet_v2_meta}

# create dls 

learn = cnn_learner(dls, model, pretrained=True)

Tried that with a pretrained xresnext a while ago … should work that way :).

Florian

Hi Florian,

Thank you very much for your suggestion, I will try it out during the weekend and update you on how it went!

Try this:

learn = cnn_learner(dls, models.mobilenet_v2, cut=-1)

Hi Rahul,

Thank you for your help!

I did this:

import torchvision.models as models
mobilenet = models.mobilenet_v2(pretrained=True)
learn = cnn_learner(dls, mobilenet, cut=-1, metrics=[accuracy])

And got this error:

TypeError: forward() got an unexpected keyword argument 'pretrained'

Then I tried without (pretrained=True) and it’s currently training. Judging by the accuracy after the first epoch (~91%), it would seem that the model comes pretrained despite leaving this part out.

I should probably spend some time delving deeper into what goes on under the hood in the fastai API. :))

Hi Bianca, happy to help :)

cnn_learner expects a function that, when called, returns the model, not the model itself (you passed an already-built model in the snippet above).

The snippet I suggested passes the function models.mobilenet_v2 into cnn_learner. cnn_learner also accepts a pretrained argument that is True by default, so that code is the same as:

learn = cnn_learner(dls, models.mobilenet_v2, cut=-1, pretrained=True)
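
To make the difference concrete, here is a quick sketch; the second call is what triggers the TypeError above, because fastai calls whatever you pass in with pretrained=...:

learn = cnn_learner(dls, models.mobilenet_v2, cut=-1)                   # OK: passes the function
learn = cnn_learner(dls, models.mobilenet_v2(pretrained=True), cut=-1)  # TypeError: the model instance gets called with pretrained=...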

That’s always a good idea

Yes, of course this makes sense. Thank you!

Keep in mind that you just split the model into two parts (basically body and head) without passing a splitter (or defining the model_meta with cut and split). The fastai way of splitting models is to have three groups (top half of the body / bottom half of the body / head).
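
If you don’t want to touch model_meta, a rough sketch would be to pass the split from my earlier post straight to cnn_learner (assuming your fastai version’s cnn_learner accepts cut and splitter directly, and that the layer indices below still match torchvision’s mobilenet_v2; do check them):

def _mobilenet_v2_split(m): return L(m[0][0][:7], m[0][0][7:], m[1:]).map(params)

learn = cnn_learner(dls, models.mobilenet_v2, cut=-1, splitter=_mobilenet_v2_split)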

Hi Florian,

I tried your suggested way and that worked too. I can’t quite grasp what difference it makes during training, though. Does it affect which layers are frozen/unfrozen during different stages of fine-tuning?

When you call learn.freeze(), all layers except the head are frozen. The main difference it makes is that different parts of the model get different hyperparameters – learning_rate, momentum, etc.

If you had two splits and passed in learn.fit(num_epochs, slice(1e-3)), the two parts of your model would get learning rates of 1e-4 and 1e-3 (every group except the last gets a tenth of the final rate). With three splits and a full slice like slice(1e-5, 1e-3), they would get 1e-5, 1e-4 and 1e-3 respectively.
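
For example (the epoch count here is arbitrary):

learn.fit(5, slice(1e-5, 1e-3))                   # three groups get 1e-5, 1e-4, 1e-3
learn.fit_one_cycle(5, lr_max=slice(1e-5, 1e-3))  # same idea with the one-cycle schedule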


TIL! I’ve been splitting all my mobilenets into two parts so far and it’s worked pretty well; curious to see how splitting them further helps.

A follow-up question -

I can easily access the different layers in the head like in this example:

def set_dropout(learner, p1, p2):
  learner.model[1][3] = nn.Dropout(p=p1, inplace=False)
  learner.model[1][7] = nn.Dropout(p=p2, inplace=False)

learn = cnn_learner(dls, model, pretrained=True, metrics=[accuracy])

# Set Dropout = 0 in the dropout layers in the head
set_dropout(learn, 0, 0)

But I haven’t found a similarly easy way to access the hyperparameters in the body. For instance, I’d like to unfreeze and train only part of the top layers in the body. How could this be done?

There are two ways you could do that:

  1. Split the model into groups as @florianl mentioned above. You can then do learn.freeze_to(...) (see the sketch after the code below). However, this freezes whole groups sequentially, so if you need to tweak only a specific part, the method below is better.
  2. Access the specific layer and turn off gradient computation by changing the requires_grad attribute of its parameters. For example:
learn = cnn_learner(dls, model)
# disable gradient updates for every parameter in that layer
for p in learn.model[1][3].parameters():
    p.requires_grad = False
    # requires_grad should be a bool; the "set it to None" tip you sometimes see
    # refers to p.grad (clearing stored gradients to free memory)

EDIT: I meant for p in model[...].parameters()
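
For option 1, a minimal sketch of freeze_to (it counts the parameter groups produced by the splitter, so this assumes the model was split into three groups: two halves of the body plus the head):

learn.freeze_to(-1)   # only the last group (the head) is trainable
learn.freeze_to(-2)   # the last two groups are trainable
learn.unfreeze()      # everything is trainable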

Hello again!

So this worked fine when I tried it about a month ago, but it no longer does. I tried to run the same notebook again and got this error message:

TypeError: forward() got an unexpected keyword argument 'pretrained'

Has anything major changed in fastai in the last few weeks?