I’ve been fiddling around with both Keras (TF) and fastai (PyTorch) in recent weeks, and ran across this article on BatchNorm being “broken” in Keras:
Ultimately, the bug fix wasn’t merged into the main codebase, as there appears to be some disagreement about how BatchNorm layers should behave when frozen. The issue is exacerbated during transfer learning, and a number of people are definitely reporting similar problems. There’s also a keras-resnet repo (see https://github.com/keras-team/keras/issues/9214, comment from XupingZHENG) that implements a similar fix.
Basically the claim is that the Keras version of BatchNorm lets the BN layers’ statistics keep updating even when the layer is frozen, and that’s ‘wrong’. For much academic work it isn’t an issue, because the mean/variance of the pre-training data and the fine-tuning data are similar. But if your datasets differ, this can be a real problem: the frozen ReLUs in the pre-trained network were trained with respect to the pre-training data’s mean/variance, so if your BN layer parameters are being adjusted while the rest of the network is frozen, the activations those frozen layers see will shift.
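For what it’s worth, plain PyTorch has an analogous gotcha (a sketch, not fastai-specific): setting `requires_grad = False` on a BatchNorm layer only freezes gamma/beta, while the running mean/variance still update on every forward pass in training mode.

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(4)
for p in bn.parameters():
    p.requires_grad = False  # "freeze" the affine params (gamma/beta)

before = bn.running_mean.clone()
bn.train()                           # layer is still in training mode
x = torch.randn(32, 4) * 5 + 10     # data with a very different mean/variance
bn(x)                                # forward pass updates running stats

print(torch.allclose(before, bn.running_mean))  # False: the stats drifted
```

So “frozen” depends on whether you mean the learnable parameters, the running statistics, or both, and the two are controlled by different mechanisms (`requires_grad` vs. train/eval mode).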
I was wondering what others in the fastai community think about this, and more importantly: when we freeze BN layers in fastai, are they really frozen? Are there any best practices for handling BN layers when doing transfer learning and the overlap between the pre-training and fine-tuning datasets is poor?
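In case it helps the discussion, here’s how I’d fully freeze BN in plain PyTorch. Note that `freeze_bn` is a hypothetical helper I wrote for illustration, not a fastai API: it puts every BatchNorm layer into eval mode (so it uses the stored running statistics and stops updating them) and freezes gamma/beta.

```python
import torch
import torch.nn as nn

def freeze_bn(model: nn.Module) -> None:
    """Hypothetical helper: fully freeze all BatchNorm layers in a model."""
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.eval()                     # use running stats, stop updating them
            for p in m.parameters():
                p.requires_grad = False  # freeze gamma/beta too

model = nn.Sequential(nn.Linear(4, 8), nn.BatchNorm1d(8), nn.ReLU())
model.train()     # training mode for the whole network...
freeze_bn(model)  # ...but the BN layer stays in eval mode

bn = model[1]
before = bn.running_mean.clone()
model(torch.randn(16, 4) + 7)
print(torch.allclose(before, bn.running_mean))  # True: stats unchanged
```

One caveat: a later call to `model.train()` would flip the BN layers back into training mode, so the helper has to be re-applied after any such call.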