Training looks good while frozen, unfrozen has chaotic losses

mossCoder · September 3, 2018, 8:11pm

I’ve been working through Lessons 1 and 2, replacing the default dataset with a dataset of lichen images. The classes are lichen genera.

I’ve been curious to try the strategy from the lesson 2 notebook of sequentially increasing the image size. I start at a small size, 64, training with the layers frozen and then unfrozen. Then I increase the size to 128 and repeat the training in both frozen and unfrozen states. The frozen epochs seem pretty stable:

However, after unfreezing all layers, the accuracy continues to increase, though the losses in the training and validation sets are very unstable:

Would anyone have any idea what is happening here? After epoch 0, it appears as though the model begins to overfit, but then in epoch 4 it jumps back to something that looks promising, and then again descends into overfit territory.

ouflepapi · September 3, 2018, 9:23pm

How many images do you have in your training and validation sets?

mossCoder · September 3, 2018, 9:57pm

Training set has 19456 images and the validation set has 4864 images.

KarlH · September 3, 2018, 10:20pm

The losses look fine. Sure there are some jitters but overall the training and validation losses both decreased over the epochs. Your accuracy is still increasing a lot every epoch - I would say the model has room to improve further. I would try training more at a lower learning rate - that might help the model settle down a bit more.

One thing I would suggest is adding learn.bn_freeze(True) after learn.unfreeze() to keep the batchnorm values of the base model constant.
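To make the suggestion concrete, here is a toy sketch of what freezing batchnorm means. My understanding is that `bn_freeze(True)` keeps the batchnorm layers in evaluation mode, so they normalize with their stored running statistics and stop updating them. The `TinyBatchNorm` class and its `frozen` flag are hypothetical names invented for this illustration, not fastai's implementation, and the variance handling is simplified compared with a real layer.

```python
class TinyBatchNorm:
    """Scalar batchnorm that tracks a running mean and variance (toy example)."""

    def __init__(self, momentum=0.1, eps=1e-5):
        self.running_mean = 0.0
        self.running_var = 1.0
        self.momentum = momentum
        self.eps = eps
        # Hypothetical flag standing in for the effect of bn_freeze(True).
        self.frozen = False

    def __call__(self, batch):
        if self.frozen:
            # Frozen: normalize with the stored statistics, update nothing.
            mean, var = self.running_mean, self.running_var
        else:
            # Training: normalize with batch statistics and nudge the
            # running statistics toward them.
            mean = sum(batch) / len(batch)
            var = sum((x - mean) ** 2 for x in batch) / len(batch)
            self.running_mean += self.momentum * (mean - self.running_mean)
            self.running_var += self.momentum * (var - self.running_var)
        return [(x - mean) / (var + self.eps) ** 0.5 for x in batch]


bn = TinyBatchNorm()
bn([10.0, 12.0, 14.0])                 # unfrozen: running stats move
mean_after_training = bn.running_mean

bn.frozen = True
bn([100.0, 102.0, 104.0])              # frozen: running stats stay put
print(mean_after_training, bn.running_mean)
```

In this sketch the very different second batch leaves the stored statistics untouched, which is the "keep the batchnorm values of the base model constant" behavior described above.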

knesgood · September 4, 2018, 5:38pm

Correct me if I’m wrong, but I thought we only used bn_freeze(True) when we’re using a Resnet model higher than Resnet34. Am I mistaken?

mossCoder · September 5, 2018, 2:42am

I’ve had some success thanks to the encouragement here, and am approaching 83% accuracy. I’ve increased the size sequentially through 64, 128, 224, and 299, training a resnext101 architecture. I cycle through a frozen and an unfrozen phase at each size. I’m curious to try larger sizes, though I’m definitely seeing diminishing returns at 299. The value of unfreezing and training the lower layers with differential learning rates seems to decrease as the size increases. Does that match anyone else’s experience?
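For anyone wanting to reproduce the schedule described above, here is a minimal sketch in plain Python. It only builds the list of training phases and does not call fastai. The calls named in the comments (`learn.set_data`, `learn.freeze`, `learn.unfreeze`, `learn.fit`) are what I would expect from the lesson notebooks and should be checked against your installed version. The base learning rate of 1e-2 and the /9, /3 spread for the differential learning rates are assumptions, not values from this thread.

```python
def build_schedule(sizes, base_lr=1e-2):
    """Return one frozen and one unfrozen phase for each image size."""
    schedule = []
    for sz in sizes:
        # Frozen phase: only the new head trains, with a single learning rate.
        # Roughly: learn.set_data(get_data(sz)); learn.freeze(); learn.fit(...)
        schedule.append({"size": sz, "frozen": True, "lrs": [base_lr]})
        # Unfrozen phase: all layer groups train, earlier groups more slowly.
        # Roughly: learn.unfreeze(); learn.fit(lrs, ...)
        schedule.append({
            "size": sz,
            "frozen": False,
            "lrs": [base_lr / 9, base_lr / 3, base_lr],  # assumed spread
        })
    return schedule


schedule = build_schedule([64, 128, 224, 299])
for phase in schedule:
    state = "frozen" if phase["frozen"] else "unfrozen"
    print(f"size={phase['size']:>3} {state:<8} lrs={phase['lrs']}")
```

Writing the phases out like this makes it easier to compare the gain from each unfrozen phase across sizes, which is the comparison behind the diminishing-returns observation.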

wyquek · September 26, 2018, 8:33am

After reading through many threads on this forum, I still don’t understand why one should call learn.bn_freeze(True) after learn.unfreeze() during fine-tuning. We are still training the NN, and the activations (or layer inputs) are still being normalized, so the NN should still be allowed to learn the batchnorm parameters to prevent “covariate shift”.

Why learn.bn_freeze(True)?