How to avoid overfitting in a ResNet

I’ve noticed that in many cases fine-tuning a ResNet works great, as long as training doesn’t take too many epochs.

However, when a lot of training is required, I tend to overfit the ResNet (training loss keeps going down while validation loss goes up). I also see that the architecture by default only has Dropout at the end of the model.

Should I insert Dropout layers throughout the model? Or what is a good strategy to avoid overfitting a ResNet?

You could do:

  • Try to get more data.
  • Use more data augmentation. For example, MixUp or CutMix usually help when training for many epochs. There are others like Fast AutoAugment, etc.
  • Add more regularization (both options below are sketched right after this list):
    • In fastai you can easily increase dropout, weight decay, etc. in the head.
    • Add DropBlock blocks in the body (avoid using plain dropout in the CNN body; use DropBlock instead).
  • Reduce the network size (this is the last option!).
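
As a rough sketch of the fastai side of this (assuming fastai v2, an existing DataLoaders object dls, and a cnn_learner version that exposes the ps, wd, and cbs arguments):

from fastai.vision.all import *

# Sketch: stronger regularization when fine-tuning a ResNet.
# - ps raises the dropout probability in the fastai head
# - wd raises the weight decay used by the optimizer
# - the MixUp callback blends pairs of images and their labels
learn = cnn_learner(dls, resnet34,
                    ps=0.5,
                    wd=1e-2,
                    metrics=accuracy,
                    cbs=MixUp())
learn.fine_tune(20)

And a simplified, self-contained DropBlock sketch (not the exact formulation from the paper: gamma is approximated, the mask is sampled per element, and an odd block_size is assumed) that could be inserted between blocks of the CNN body:

import torch
import torch.nn as nn
import torch.nn.functional as F

class DropBlock2d(nn.Module):
    "Simplified DropBlock: zeroes contiguous block_size x block_size squares of activations."
    def __init__(self, drop_prob=0.1, block_size=7):
        super().__init__()
        self.drop_prob, self.block_size = drop_prob, block_size

    def forward(self, x):
        if not self.training or self.drop_prob == 0.:
            return x
        # gamma: probability of sampling a block centre, chosen so that
        # roughly drop_prob of all activations end up masked
        gamma = self.drop_prob / (self.block_size ** 2)
        centres = (torch.rand_like(x) < gamma).float()
        # grow each sampled centre into a block_size x block_size square
        block_mask = F.max_pool2d(centres, kernel_size=self.block_size,
                                  stride=1, padding=self.block_size // 2)
        keep = 1. - block_mask
        # rescale so the expected activation magnitude stays the same
        return x * keep * keep.numel() / keep.sum().clamp(min=1.)

# usage: place it after a body block, e.g. nn.Sequential(body_block, DropBlock2d(0.1, 7))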

Thanks for the hints! In this particular case I can’t get more data, and augmentation possibilities are limited.
But indeed I should add regularization.
Interesting, I didn’t know about DropBlock.

rwightman has an amazing repository of image classification models (and other repos for EfficientDet, EfficientNets, etc.). It implements a LOT of features and data augmentations and plays nicely with fastai. Models have .to_list() methods.
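
For example (a sketch, assuming a recent timm version where these keyword arguments are available and an existing fastai DataLoaders dls), many of those models expose their regularization knobs directly at creation time:

import timm
from fastai.vision.all import *

# Sketch: drop_rate adds dropout before the classifier; drop_path_rate
# enables stochastic depth in architectures that support it.
model = timm.create_model('resnet50', pretrained=True, num_classes=dls.c,
                          drop_rate=0.25, drop_path_rate=0.1)
learn = Learner(dls, model, metrics=accuracy)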


Along with that, I have an example integration of his exact repository/library with fastai2 here :slight_smile: https://github.com/muellerzr/Practical-Deep-Learning-for-Coders-2.0/blob/master/Computer%20Vision/05_EfficientNet_and_Custom_Weights.ipynb


Nice notebook! However, I think it’s easier if you pass as_sequential=True at model creation. At least the gen-efficientnet-pytorch repo supports it. Then you can use the standard fastai cnn_learner and pass cut=None. If I remember correctly, it works well with MobileNetV2 and EfficientNets.

import geffnet

# drop_rate and drop_connect_rate add extra regularization inside the network
m = geffnet.mixnet_l(pretrained=True, drop_rate=0.25, drop_connect_rate=0.2, as_sequential=True)

learn = cnn_learner(dls, m, cut=None, ...)

None or default_cut? I’d worry about it not cutting the model then, which is needed to support the pretrained weights.

I don’t know default_cut :confused:.
I used cut=None as it’s the default in fastai v2. It cuts the model just before the last pooling layer:

_default_meta    = {'cut':None, 'split':default_split}

You could check the model to see where it gets cut.
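
For example, a quick check (assuming a learner built with cnn_learner, whose model is a Sequential of body and head):

# body: the backbone, cut just before the final pooling layer
print(learn.model[0])
# head: AdaptiveConcatPool2d, Flatten, BatchNorm, Dropout, Linear, ...
print(learn.model[1])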


Sorry, default_split, you are 100% right, thank you! (and I’ll update that notebook with your wonderful improvement too :slight_smile: )