BatchNormalization(axis = 1) when used on convolutional layers

rcc · October 17, 2018, 5:36am

Which axis should I use if I have the following architecture?

input_shape: (30, 64, 64) # Each sample is a short video clip of 30 gray-scale frames. I treat the 30 frames as the input channels of the network. This is a must for my problem.

Conv2D(channels_first)
BatchNormalization
Activation('relu')

Flatten
Dense
BatchNormalization
Activation('softplus')

I’m using generators with a batch size of 1024 due to memory problems, so each training step receives a mini-batch of shape (1024, 30, 64, 64).

My question is: should I use axis=1 in the first two BatchNormalization layers and axis=-1 in the last one?
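To make the question concrete, here is a minimal NumPy sketch of what the axis argument selects, assuming batch normalization keeps one mean and variance per index along `axis` and reduces over all remaining axes. It is not Keras code: the helper names, the batch size of 8, the 32 filters, and the 16 Dense units are all hypothetical stand-ins chosen to keep the example small.

```python
import numpy as np

def batch_norm_stats(x, axis):
    """Per-feature mean and variance, reducing over every axis except `axis`."""
    axis = axis % x.ndim  # allow negative axes such as -1
    reduce_axes = tuple(i for i in range(x.ndim) if i != axis)
    return x.mean(axis=reduce_axes), x.var(axis=reduce_axes)

def batch_norm(x, axis, eps=1e-3):
    """Normalize x with statistics kept per index along `axis` (no learned scale/shift)."""
    axis = axis % x.ndim
    mean, var = batch_norm_stats(x, axis)
    shape = [1] * x.ndim
    shape[axis] = x.shape[axis]  # broadcast the statistics along the feature axis
    return (x - mean.reshape(shape)) / np.sqrt(var.reshape(shape) + eps)

rng = np.random.default_rng(0)
# Hypothetical channels_first feature map: (batch, filters, height, width)
conv_out = rng.normal(loc=3.0, size=(8, 32, 16, 16))
# Hypothetical Dense output: (batch, units)
dense_out = rng.normal(loc=-1.0, size=(8, 16))

mean_c, var_c = batch_norm_stats(conv_out, axis=1)    # one statistic per channel
mean_d, var_d = batch_norm_stats(dense_out, axis=-1)  # one statistic per unit
print(mean_c.shape, mean_d.shape)

normed_conv = batch_norm(conv_out, axis=1)
normed_dense = batch_norm(dense_out, axis=-1)
```

With channels_first data, axis=1 gives one statistic per channel, shared across the batch and both spatial dimensions. For a 2-D Dense output, axis=1 and axis=-1 refer to the same axis.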

Thank you in advance.