Can batchnorm work with non-random minibatches?

Maybe someone with a deeper understanding can unravel this situation. :confused:

I am using resnet with images of very different widths, highly skewed toward the smaller. The training loop runs through the entire epoch (the whole set of images) and then does the weight update. If I first sort the images by width and group them into minibatches of similar width, running an epoch is much faster: for example, n minibatches whose images average width 200, instead of n minibatches where every image has width 1000.
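In case the setup is unclear, this is roughly the bucketing I mean; a minimal sketch in plain PyTorch (the function name and the `widths` list are just for illustration, not my actual code):

```python
import torch
from torch.utils.data import DataLoader

def width_bucketed_batches(widths, batch_size):
    """Sort item indices by image width, cut them into minibatches of
    neighbours (similar width), then shuffle the order of the batches
    while keeping each batch's contents fixed."""
    order = sorted(range(len(widths)), key=lambda i: widths[i])
    batches = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
    return [batches[j] for j in torch.randperm(len(batches)).tolist()]

# hypothetical usage: `widths` holds one width per dataset item
# loader = DataLoader(dataset, batch_sampler=width_bucketed_batches(widths, 16))
```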

However, this method does not seem to play well with batchnorm, and I don’t understand why. When I do not sort the minibatches by image width, training works, at the cost of much more time. With minibatch sorting, training also works if I put all the batchnorms into eval mode. If I don’t, the training and validation losses are very different.
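By “putting all the batchnorms into eval mode” I mean something like the sketch below (my own helper, not a fastai function):

```python
import torch.nn as nn

def bn_to_eval(model):
    """Freeze the batch statistics: every BatchNorm layer uses its running
    mean/var instead of the statistics of the current (width-sorted) minibatch."""
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.eval()

# note: model.train() (called at the start of each epoch by most training
# loops) flips them back to train mode, so this has to be re-applied each epoch
```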

Can anyone explain what is going on? And is there any hope of using this idea with resnet?

Also, I can’t resize all images to the same width because width represents time and the time scale matters for classification.

Thanks for any clarification!

Hi again. I think this was a false alarm, sorry.

It appears that the culprit was the Dropout in the head that fastai attaches to resnet by default. Once I set p=0 for all the Dropouts, resnet could at least memorize and overfit the training set.
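For reference, “set p=0 for all Dropouts” amounts to something like this (again my own little helper, not part of fastai):

```python
import torch.nn as nn

def zero_dropout(model):
    """Set p=0 on every Dropout layer, turning it into a no-op."""
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.p = 0.0

# zero_dropout(learn.model)   # `learn` being the fastai Learner, if you use fastai
```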

I don’t have time to dig deeper into exactly what was going wrong, but it raises a question: if a model is not training, is the architecture inadequate, or is Dropout simply too high? I suppose you should always check with p=0 when you see no learning on the training set. Obvious in hindsight.
