I’ve been playing around with the fastai one-cycle learning rate policy for the last week or so, and I can’t get the model to fit when:
(i) the pre-trained network was trained on too different a data domain, i.e. my target application is not very ImageNet-like, and (ii) I have very little labeled data for the target application, i.e. only 1000+ data points per class for a binary classification. The frozen network trains, but the metrics are very poor. An unfrozen network immediately overfits (val and train loss diverge).
Does anyone have advice on which settings to adjust? I’ve already done the lr_find step. So far I have:
Used dropout of 0.25 on the features coming from the pre-trained network
Switched the dense layer to a 1x1 convolution with 128 filters (fewer parameters than the dense layer of 512 neurons)
Played with the train_bn setting (both True and False), and added/removed BN layers in my custom head
More epochs just seem to make it overfit more
The unfrozen network immediately overfits; even reducing the backbone lr to lr/100 doesn’t help much
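For reference, the "backbone at lr/100, head at full lr" idea can be sketched in plain PyTorch with optimizer parameter groups. The backbone and head modules below are toy stand-ins, not your actual model (in fastai the library builds these groups for you when you pass an lr slice):

```python
import torch
import torch.nn as nn

# Toy stand-ins for a pre-trained backbone and a custom head (illustrative only).
backbone = nn.Sequential(
    nn.Conv2d(3, 8, 3), nn.ReLU(), nn.AdaptiveAvgPool2d(1), nn.Flatten()
)
head = nn.Linear(8, 2)

base_lr = 1e-3
optimizer = torch.optim.Adam([
    # Tiny learning rate for the pre-trained weights...
    {"params": backbone.parameters(), "lr": base_lr / 100},
    # ...full learning rate for the freshly initialized head.
    {"params": head.parameters(), "lr": base_lr},
])
```

Each parameter group keeps its own learning rate, so the pre-trained features move slowly while the new head adapts quickly.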
I had a similar case, with two categories unlike ImageNet. Training from a pre-trained network worked better for me than training from scratch, first with frozen and then with unfrozen weights.
A few more things that come to mind:
Increase weight decay (instead of decreasing the size of the network)
Use bigger dropout. You can set it with something like dropout = list(learn.model.children())[idx_dropout]; dropout.p = 0.5 (note that children() returns a generator, so it has to be wrapped in list() before indexing)
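For what it’s worth, a more robust way to bump dropout in place is to walk the module tree rather than index by position. The model below is a toy stand-in for a fastai head, just to make the sketch self-contained:

```python
import torch.nn as nn

# Toy head roughly shaped like a fastai classifier head (illustrative only).
model = nn.Sequential(nn.Flatten(), nn.Dropout(0.25), nn.Linear(128, 2))

# Raise the dropout probability on every Dropout module in place.
# This avoids indexing model.children(), which is a generator.
for m in model.modules():
    if isinstance(m, nn.Dropout):
        m.p = 0.5
```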
Try focal loss. It has two parameters:
gamma - a bigger value encourages the network to focus more on images with high loss (gamma=4.0 worked well for me)
alpha - (optional) a list of per-category importance weights that must sum to 1, e.g. [0.25, 0.5, 0.25] means the second category is twice as important as the others. It’s helpful if you have an unbalanced number of images per category.
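If it helps, here is a minimal multi-class focal loss sketch in plain PyTorch matching the gamma/alpha description above. The focal_loss function is my own illustrative version, not a fastai API:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=4.0, alpha=None):
    """Focal loss: (1 - p_t)^gamma * CE, optionally class-weighted by alpha.

    A minimal sketch of the loss discussed above; with gamma=0 and no alpha
    it reduces to ordinary cross-entropy.
    """
    ce = F.cross_entropy(logits, targets, reduction="none")
    pt = torch.exp(-ce)              # model's probability for the true class
    loss = (1 - pt) ** gamma * ce    # down-weight confident, correct examples
    if alpha is not None:            # per-class importance weights (a tensor)
        loss = loss * alpha[targets]
    return loss.mean()

# One easy (confident, correct) and one hard (uncertain) example.
logits = torch.tensor([[4.0, -4.0], [0.2, 0.1]])
targets = torch.tensor([0, 0])
```

With gamma=4.0 the easy example contributes almost nothing, so the overall loss is dominated by the hard one.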
Also, if you have not already, you should look at the augmented images to check that the augmentations really match your case: https://docs.fast.ai/vision.transform.html
Thanks. My understanding of focal loss is that gamma makes it focus on the “hard examples” of each class, and alpha does class weighting (if there is an imbalance). Is this correct?
Precisely. It is similar to CrossEntropy, where examples predicted correctly with high confidence cause a lower loss than examples predicted incorrectly with high confidence. In FocalLoss, the gamma parameter tunes how much lower that loss will be.
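A quick back-of-envelope check of that down-weighting effect, using the plain per-example cross-entropy -log(p_true) and gamma=4.0:

```python
import math

# Per-example cross-entropy: a confident, correct prediction (p_true = 0.99)
# vs a confident, wrong one (p_true = 0.01).
ce_easy = -math.log(0.99)   # small loss
ce_hard = -math.log(0.01)   # large loss

gamma = 4.0
# Focal loss multiplies CE by (1 - p_true)^gamma.
fl_easy = (1 - 0.99) ** gamma * ce_easy  # easy example nearly vanishes
fl_hard = (1 - 0.01) ** gamma * ce_hard  # hard example barely changes
```

The easy example's loss is scaled by (0.01)^4 = 1e-8, while the hard example keeps about 96% of its cross-entropy, so training effort shifts toward the hard cases.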
Besides all sorts of data augmentation, you can also try Mixup and label smoothing. These are special kinds of augmentation/regularization that help the classification model generalize.
Both techniques are introduced in fastbook (link) and you can apply them easily with fastai callbacks.