Lesson 2: Using batch normalization before or after the non-linearity

msp · August 16, 2017, 7:49pm

It looks like there is no consensus yet on where best to place the BatchNorm layer, before or after the non-linearity. There is another thread about it here: Questions about batch normalization
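
To make the two placements concrete, here is a rough sketch in plain Python (standard library only, not the lesson's code). The `batch_norm` and `relu` helpers and the `batch` values are made up for illustration, and this `batch_norm` leaves out the learned scale and shift and the running statistics that a real BatchNorm layer has. It is only meant to show that the two orderings hand different activations to the next layer, not to say which one trains better.

```python
import math

def batch_norm(xs, eps=1e-5):
    """Normalize a batch of scalars to zero mean and (roughly) unit variance.

    Simplified on purpose: no learned scale/shift, no running statistics.
    """
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [(x - mean) / math.sqrt(var + eps) for x in xs]

def relu(xs):
    """Element-wise ReLU non-linearity."""
    return [max(0.0, x) for x in xs]

# Hypothetical pre-activations of one feature across a batch of 5 examples.
batch = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Option A: BatchNorm before the non-linearity.
# The output is non-negative, so it is no longer zero-mean.
bn_then_relu = relu(batch_norm(batch))

# Option B: BatchNorm after the non-linearity.
# The output is zero-mean again and contains negative values.
relu_then_bn = batch_norm(relu(batch))

print("BN -> ReLU:", [round(v, 3) for v in bn_then_relu])
print("ReLU -> BN:", [round(v, 3) for v in relu_then_bn])
```

With option A the next layer sees inputs that have been clipped at zero after normalization; with option B it sees inputs that are normalized as-is. Which of these works better in practice is exactly the open question.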