Pre-trained network on a very different data domain, with very little labeled data?

I’ve been playing around with the fastai one-cycle learning rate policy for the last week or so, and I can’t seem to get the model to fit under the situation where:

(i) the pre-trained network was trained on a very different data domain, i.e. my target application is not very ImageNet-like, and (ii) I have very little labeled data for the target application, i.e. 1000+ data points per class for a binary classification. The frozen network trains, but the metrics are very poor. The unfrozen network immediately overfits (validation and training loss diverge).

Does anyone have advice on which settings to adjust? I've already run lr_find. So far I have:

  1. Used dropout of 0.25 for features coming from the pre-trained network
  2. Switched the dense layer to a 1x1 convolution with 128 filters (so fewer parameters than the dense layer of 512 neurons)
  3. Played with train_bn settings (True/False), and also added/removed bn layers in my custom head
  4. More epochs seem to just make it overfit more
  5. Unfrozen network immediately overfits, even reducing backbone lr to lr/100 doesn’t help much
  6. Already using standard data augmentation methods
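For readers who want to reproduce points 2 and 5 outside fastai, here is a minimal plain PyTorch sketch. It assumes a hypothetical stand-in backbone ending in 512 channels (swap in your real pre-trained body), assumes the dense layer being replaced takes 512 input features, and uses an illustrative `lr=1e-3`; the helper names `n_params` and `set_backbone_trainable` are my own, not fastai API.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pre-trained backbone that ends in 512 channels;
# in practice this would be the body of your pre-trained network.
backbone = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 512, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
)

# Custom head: dropout on the backbone features, a 1x1 conv with 128 filters
# in place of a 512-unit dense layer, then pooling and a 2-class output.
head = nn.Sequential(
    nn.Dropout(0.25),
    nn.Conv2d(512, 128, kernel_size=1),
    nn.BatchNorm2d(128),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(128, 2),
)
model = nn.Sequential(backbone, head)

def n_params(module: nn.Module) -> int:
    """Count all parameters (weights and biases) in a module."""
    return sum(p.numel() for p in module.parameters())

# Parameter cost of the first head layer, assuming 512 input features:
conv1x1 = nn.Conv2d(512, 128, kernel_size=1)  # 512*128 weights + 128 biases
dense = nn.Linear(512, 512)                    # 512*512 weights + 512 biases

def set_backbone_trainable(trainable: bool) -> None:
    """Freeze (False) or unfreeze (True) every backbone parameter."""
    for p in backbone.parameters():
        p.requires_grad_(trainable)

set_backbone_trainable(False)  # stage 1: train the head only
# ... train the head here ...
set_backbone_trainable(True)   # stage 2: fine-tune everything

# Discriminative learning rates for stage 2: backbone at lr/100, head at lr.
lr = 1e-3
optimizer = torch.optim.Adam([
    {"params": backbone.parameters(), "lr": lr / 100},
    {"params": head.parameters(), "lr": lr},
])
```

Under these assumptions the 1x1 conv has 65,664 parameters versus 262,656 for the 512-unit dense layer, exactly a quarter.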

I had a similar case, with two categories unlike ImageNet. Training from a pre-trained network worked better for me than training from scratch, first with frozen and then with unfrozen weights.

Other things that come to mind:

  1. Increase weight decay (instead of decreasing the size of the network)

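  2. Use bigger dropout. You can set it with
    dropout = list(learn.model.children())[idx_dropout]  # children() is a generator, so convert it to a list before indexing
    dropout.p = 0.5  # idx_dropout is the index of the dropout layer you want to change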

  3. Use focal loss with tuned parameters. I created a gist for using it with fastai:
    https://gist.github.com/KornelDylski/53495e62b72bb277f55e5498eb020ef4

    • gamma - a bigger value makes the network focus more on images with high loss
      (gamma=4.0 worked well for me)
    • alpha - (optional) list of per-category weights, which must sum to 1,
      e.g. [0.25, 0.5, 0.25] means the second category is 2 times more important than each of the others.
      It's helpful if you have an unbalanced number of images per category
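For points 1 and 2, a minimal plain PyTorch sketch follows. In fastai, the learner's `ps` argument (head dropout) and the `wd` argument of `fit_one_cycle` should cover the same ground; check the docs for your version. The head below is a hypothetical stand-in shaped roughly like a fastai classifier head, `set_dropout` is my own helper name, and `weight_decay=0.1` is only an illustrative value. Unlike indexing `children()`, walking `modules()` finds dropout layers nested at any depth.

```python
import torch
import torch.nn as nn

# Hypothetical head shaped roughly like a fastai classifier head
# (BatchNorm -> Dropout -> Linear blocks); pass learn.model instead for a real learner.
head = nn.Sequential(
    nn.BatchNorm1d(512),
    nn.Dropout(0.25),
    nn.Linear(512, 128),
    nn.ReLU(),
    nn.BatchNorm1d(128),
    nn.Dropout(0.25),
    nn.Linear(128, 2),
)

def set_dropout(model: nn.Module, p: float) -> int:
    """Set p on every dropout layer found anywhere in `model`; return how many were changed."""
    changed = 0
    for m in model.modules():  # modules() walks nested submodules recursively
        if isinstance(m, (nn.Dropout, nn.Dropout2d, nn.Dropout3d)):
            m.p = p
            changed += 1
    return changed

n_changed = set_dropout(head, 0.5)

# Stronger weight decay: AdamW applies decoupled weight decay to the parameters it is given.
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3, weight_decay=0.1)
```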

Also, if you haven't already, look at the augmented images to check that the augmentations really suit your data: https://docs.fast.ai/vision.transform.html


Thanks. My understanding of focal loss is that gamma focuses on the “hard examples” of each class, and alpha does class weighting (if there is an imbalance). Is this correct?

Yes

Precisely. It works like cross-entropy, where examples classified correctly with high confidence produce a lower loss than examples classified incorrectly with high confidence. Focal loss adds a parameter (gamma) that tunes how much lower the loss for those well-classified examples becomes.
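For reference, this is the focal loss as defined in Lin et al.'s paper (arXiv 1708.02002), where p is the predicted probability of class 1 and y is the true label; with gamma = 0 and alpha_t = 1 it reduces to cross-entropy:

```latex
p_t = \begin{cases} p & \text{if } y = 1 \\ 1 - p & \text{otherwise} \end{cases}
\qquad
\mathrm{CE}(p_t) = -\log(p_t)
\qquad
\mathrm{FL}(p_t) = -\alpha_t \, (1 - p_t)^{\gamma} \log(p_t)
```

For example, with gamma = 2 a well-classified example with p_t = 0.9 has its cross-entropy multiplied by (1 - 0.9)^2 = 0.01, while a hard example with p_t = 0.1 keeps a factor of (0.9)^2 = 0.81.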

Check out the top-left graph in https://arxiv.org/pdf/1708.02002.pdf: the blue line (gamma=0) is plain cross-entropy.
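A minimal multi-class focal loss in PyTorch, written from the paper's formula rather than copied from the gist above; the class name `FocalLoss`, its argument names, and the plain batch mean (no renormalization by alpha) are my own choices. With `gamma=0` and no `alpha` it matches `F.cross_entropy`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FocalLoss(nn.Module):
    """Focal loss -alpha_t * (1 - p_t)**gamma * log(p_t), averaged over the batch."""

    def __init__(self, gamma: float = 2.0, alpha=None):
        super().__init__()
        self.gamma = gamma
        # Optional per-class weights, e.g. [0.25, 0.5, 0.25]; None means every class weighs 1.
        self.alpha = None if alpha is None else torch.tensor(alpha, dtype=torch.float32)

    def forward(self, logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        log_probs = F.log_softmax(logits, dim=-1)                     # (N, C)
        log_pt = log_probs.gather(1, target.unsqueeze(1)).squeeze(1)  # log p_t for the true class, (N,)
        pt = log_pt.exp()
        # The modulating factor (1 - p_t)**gamma shrinks the loss of confident, correct predictions.
        loss = -((1 - pt) ** self.gamma) * log_pt
        if self.alpha is not None:
            loss = self.alpha.to(logits.device)[target] * loss  # per-example class weight alpha_t
        return loss.mean()
```

With a fastai learner you would then assign it as the loss, e.g. `learn.loss_func = FocalLoss(gamma=4.0)`.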

PS: If focal loss helps you too, please mention it here :wink:


Besides the usual data augmentation, you can also try Mixup and label smoothing. Both act as special kinds of augmentation on the training targets (and, for Mixup, on the inputs too) that help regularize a classification model.
These techniques are introduced in fastbook (link), and fastai makes them easy to apply (Mixup as a callback, label smoothing as a loss function).
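To show what these two techniques do, here is a plain PyTorch sketch; it is not fastai's implementation, and the helper names `mixup_batch`, `soft_target_ce` and `label_smoothing_ce`, plus the values `alpha=0.4` and `eps=0.1`, are illustrative assumptions. Mixup blends each image and its one-hot label with a random partner from the same batch; label smoothing trains against targets of 1 - eps on the true class plus eps spread uniformly over all classes.

```python
import torch
import torch.nn.functional as F

def mixup_batch(x: torch.Tensor, y: torch.Tensor, n_classes: int, alpha: float = 0.4):
    """Mix each example (and its one-hot label) with a random partner from the same batch."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()  # mixing weight in (0, 1)
    perm = torch.randperm(x.size(0))
    x_mixed = lam * x + (1 - lam) * x[perm]
    y_onehot = F.one_hot(y, n_classes).float()
    y_mixed = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return x_mixed, y_mixed

def soft_target_ce(logits: torch.Tensor, soft_targets: torch.Tensor) -> torch.Tensor:
    """Cross-entropy against a full target distribution (e.g. Mixup's blended labels)."""
    return -(soft_targets * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()

def label_smoothing_ce(logits: torch.Tensor, target: torch.Tensor, eps: float = 0.1) -> torch.Tensor:
    """Cross-entropy with targets (1 - eps) * one_hot + eps / n_classes."""
    log_probs = F.log_softmax(logits, dim=-1)
    nll = -log_probs.gather(1, target.unsqueeze(1)).squeeze(1)  # loss on the true class
    uniform = -log_probs.mean(dim=-1)                           # loss against a uniform target
    return ((1 - eps) * nll + eps * uniform).mean()
```

In training you would feed the mixed batch through the model and score it with `soft_target_ce(model(x_mixed), y_mixed)`.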