Lesson 2: Using batch normalization before or after the nonlinearity

It looks like there is no consensus yet on whether the BatchNorm layer is best placed before or after the nonlinearity. There is another thread about it here: Questions about batch normalization
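To make the two options concrete, here is a minimal NumPy sketch of one dense layer wired both ways. It is a simplification under stated assumptions: `batch_norm` is a hypothetical helper that uses training-mode batch statistics only, with no learned scale/shift (gamma/beta) and no running averages for inference. The original BatchNorm paper (Ioffe & Szegedy, 2015) inserts the normalization immediately before the nonlinearity, which corresponds to `bn_before_relu` below.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Hypothetical simplified BatchNorm: normalize each feature over the batch
    # axis (axis 0) using batch statistics; no learned gamma/beta.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

def relu(x):
    return np.maximum(x, 0.0)

def dense(x, w, b):
    return x @ w + b

def bn_before_relu(x, w, b):
    # Linear -> BatchNorm -> ReLU: outputs are non-negative, roughly half zeros.
    return relu(batch_norm(dense(x, w, b)))

def bn_after_relu(x, w, b):
    # Linear -> ReLU -> BatchNorm: outputs are zero-mean, unit-variance per feature.
    return batch_norm(relu(dense(x, w, b)))

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 10))   # batch of 64 examples, 10 input features
w = rng.normal(size=(10, 5))    # 5 output units
b = np.zeros(5)

out_before = bn_before_relu(x, w, b)
out_after = bn_after_relu(x, w, b)
```

The practical difference is what the next layer sees: with BatchNorm before the ReLU it receives non-negative, sparse activations; with BatchNorm after the ReLU it receives zero-centered activations with unit variance per unit. In a real network the learned gamma/beta can partly shift these distributions, which is one reason the empirical results on placement are mixed.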