How to avoid overfitting in a ResNet

I’ve noticed that in many cases fine-tuning a ResNet works great, as long as training doesn’t take too many epochs.

However, when a lot of training is required, I tend to overfit the ResNet (training loss keeps going down while validation loss goes up). I also see that the architecture by default only has Dropout at the end of the model.

Should I insert Dropout layers throughout the model? Or what is a good strategy to avoid overfitting a ResNet?

You could do:

  • Try to get more data.
  • Use more data augmentation. For example, MixUp or CutMix usually help when training for many epochs. There are others like Fast AutoAugment, etc.
  • Add more regularization (both options below are sketched right after this list):
    • In fastai you can easily increase dropout, weight decay, etc. in the head.
    • Add DropBlock blocks in the body (avoid using plain dropout in the CNN body; use DropBlock instead).
  • Reduce the network size (this is the last option!).
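
As a rough sketch of the fastai side of this (assuming fastai v2, an existing DataLoaders object dls, and a cnn_learner version that exposes the ps, wd, and cbs arguments):

from fastai.vision.all import *

# Sketch: stronger regularization when fine-tuning a ResNet.
# - ps raises the dropout probability in the fastai head
# - wd raises the weight decay used by the optimizer
# - the MixUp callback blends pairs of images and their labels
learn = cnn_learner(dls, resnet34,
                    ps=0.5,
                    wd=1e-2,
                    metrics=accuracy,
                    cbs=MixUp())
learn.fine_tune(20)

And a simplified, self-contained DropBlock sketch (not the exact formulation from the paper: gamma is approximated, the mask is sampled per element, and an odd block_size is assumed) that could be inserted between blocks of the CNN body:

import torch
import torch.nn as nn
import torch.nn.functional as F

class DropBlock2d(nn.Module):
    "Simplified DropBlock: zeroes contiguous block_size x block_size squares of activations."
    def __init__(self, drop_prob=0.1, block_size=7):
        super().__init__()
        self.drop_prob, self.block_size = drop_prob, block_size

    def forward(self, x):
        if not self.training or self.drop_prob == 0.:
            return x
        # gamma: probability of sampling a block centre, chosen so that
        # roughly drop_prob of all activations end up masked
        gamma = self.drop_prob / (self.block_size ** 2)
        centres = (torch.rand_like(x) < gamma).float()
        # grow each sampled centre into a block_size x block_size square
        block_mask = F.max_pool2d(centres, kernel_size=self.block_size,
                                  stride=1, padding=self.block_size // 2)
        keep = 1. - block_mask
        # rescale so the expected activation magnitude stays the same
        return x * keep * keep.numel() / keep.sum().clamp(min=1.)

# usage: place it after a body block, e.g. nn.Sequential(body_block, DropBlock2d(0.1, 7))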

Thanks for the hints! In this particular case I can’t get more data, and augmentation possibilities are limited.
But indeed I should add regularization.
Interesting, I didn’t know about DropBlock.

rwightman has an amazing repository of image classification models (and other repos for EfficientDet, EfficientNets, etc.). It implements a LOT of features and data augmentations and plays nicely with fastai. Models have .to_list() methods.
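
For example (a sketch, assuming a recent timm version where these keyword arguments are available and an existing fastai DataLoaders dls), many of those models expose their regularization knobs directly at creation time:

import timm
from fastai.vision.all import *

# Sketch: drop_rate adds dropout before the classifier; drop_path_rate
# enables stochastic depth in architectures that support it.
model = timm.create_model('resnet50', pretrained=True, num_classes=dls.c,
                          drop_rate=0.25, drop_path_rate=0.1)
learn = Learner(dls, model, metrics=accuracy)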


Along with that, I have an example integration of his exact repository/library with fastai2 here :slight_smile: https://github.com/muellerzr/Practical-Deep-Learning-for-Coders-2.0/blob/master/Computer%20Vision/05_EfficientNet_and_Custom_Weights.ipynb


Nice notebook! However, I think it’s easier if you pass as_sequential=True at model creation. At least the gen-efficientnet-pytorch repo supports it. Then you can use the standard fastai cnn_learner and pass cut=None. If I remember correctly, it works well with MobileNetV2 and EfficientNets.

import geffnet

# drop_rate and drop_connect_rate add extra regularization inside the network
m = geffnet.mixnet_l(pretrained=True, drop_rate=0.25, drop_connect_rate=0.2, as_sequential=True)

learn = cnn_learner(dls, m, cut=None, ...)

None or default_cut? I’d worry about it not cutting the model then, which is needed to support the pretrained weights.

I don’t know default_cut :confused:.
I used cut=None as it’s the default in fastai v2. It cuts the model just before the last pooling layer:

_default_meta    = {'cut':None, 'split':default_split}

You could check the model to see where it gets cut.
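
For example, a quick check (assuming a learner built with cnn_learner, whose model is a Sequential of body and head):

# body: the backbone, cut just before the final pooling layer
print(learn.model[0])
# head: AdaptiveConcatPool2d, Flatten, BatchNorm, Dropout, Linear, ...
print(learn.model[1])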


Sorry, default_split, you are 100% right, thank you! (and I’ll update that notebook with your wonderful improvement too :slight_smile: )