In the lecture and resnet-mnist notebook, Jeremy’s basic CNN example uses Conv2D, BatchNorm, ReLU sequences. Then the refactor uses fastai’s conv_layer, which wraps a Conv2D, ReLU, BatchNorm sequence.
Is the order of the ReLU and BatchNorm operations inconsequential?
I would have thought the ordering of the sequence matters, especially the location of the ReLU. Further, I'd have thought that applying BatchNorm to the raw Conv2D activations, rather than to the ReLU-truncated activations, would make more sense, i.e., the ordering in the basic CNN example rather than the one in the conv_layer function.
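For concreteness, here's a minimal sketch of the two orderings in plain PyTorch (the function names are mine, and this isn't meant to replicate fastai's actual conv_layer, just to pin down what I'm asking about):

```python
import torch
import torch.nn as nn

def conv_bn_relu(ni, nf, ks=3, stride=1):
    """Conv2D -> BatchNorm -> ReLU, as in the basic CNN example."""
    return nn.Sequential(
        nn.Conv2d(ni, nf, ks, stride=stride, padding=ks // 2, bias=False),
        nn.BatchNorm2d(nf),
        nn.ReLU(inplace=True),
    )

def conv_relu_bn(ni, nf, ks=3, stride=1):
    """Conv2D -> ReLU -> BatchNorm, the ordering conv_layer appears to use."""
    return nn.Sequential(
        nn.Conv2d(ni, nf, ks, stride=stride, padding=ks // 2, bias=False),
        nn.ReLU(inplace=True),
        nn.BatchNorm2d(nf),
    )

x = torch.randn(8, 1, 28, 28)          # MNIST-shaped batch
print(conv_bn_relu(1, 16)(x).shape)    # torch.Size([8, 16, 28, 28])
print(conv_relu_bn(1, 16)(x).shape)    # same shape, different statistics
```

Both blocks produce the same output shape, so the question is purely about whether normalizing before vs. after the ReLU changes training behaviour.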