Batch norm

Why do we perform batch norm before ReLU? Why not after it? Is there a specific reason? ReLU will change the activations we get from the convolution, but does that matter?

In the fastai code base right now, the ReLU activation is applied before batchnorm, not after. There isn't a definitive answer for this, but in practice that ordering seems to work better. It's from a while ago, but here is Jeremy's take on it: Questions about batch normalization
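The two orderings really do produce different activation distributions. A minimal NumPy sketch (not fastai code; `batchnorm` here is a simplified batch norm with no learned gamma/beta) shows the effect: with batchnorm before ReLU, roughly half the outputs are zeroed, while with ReLU before batchnorm the zeros produced by ReLU get re-centered to a negative constant.

```python
import numpy as np

def batchnorm(x, eps=1e-5):
    # Simplified batch norm over the batch dimension (no learned gamma/beta).
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def relu(x):
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 4))  # pretend these are pre-activation outputs

bn_then_relu = relu(batchnorm(x))   # ordering from the original batch-norm paper
relu_then_bn = batchnorm(relu(x))   # ordering described above

# bn -> relu: values below the batch mean are zeroed, so about half the
# outputs are exactly 0.
print((bn_then_relu == 0).mean())   # ≈ 0.5

# relu -> bn: ReLU's zeros get shifted to a negative constant by the
# normalization, so essentially nothing stays exactly 0.
print((relu_then_bn == 0).mean())
```

So the order matters for what the next layer sees: one variant hands it a sparse, strictly non-negative tensor, the other a re-centered tensor with no exact zeros. Which works better in practice is an empirical question, which is why the answer above points to experiments rather than theory.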