Good readings 2019

fabris · May 8, 2019, 10:48am

How Does Batch Normalization Help Optimization?

Batch Normalization controls the change of the layers’ input distributions during training to reduce the so-called internal covariate shift. The popular belief is that its effectiveness stems from that. However, the authors demonstrate that such distributional stability of layer inputs has little to do with the success of BatchNorm. Instead, they uncover a more fundamental impact of BatchNorm on the training process: it makes the optimization landscape significantly smoother.