Jeremy snuck in another assignment for us, on the video for Lecture 3 here.
The question is: why do you need to pass axis=1 to the BatchNormalization layer when using it in conjunction with a convolutional layer?
It looks like the reason is that we need to make sure batch normalization is applied over the channels themselves.
Apparently it is possible to do normalization along any dimension of the image!
So, if you set 1 as the value for the axis argument, you are telling Keras to do batch normalization over the channels: with images stored channels-first as (batch, channels, rows, columns), axis 1 is the channel dimension.
If you forget this, you would instead be using the default, axis=-1. The documentation isn't explicit about what -1 does, but it's just Python-style indexing for the last axis, so Keras would compute statistics along the last dimension of the input. For channels-first image data that's the columns of the image, not the channels.
On the video Jeremy implies that this is a great opportunity to get a deep understanding of batch normalization – so maybe there’s more to the story?