Which axis should I use for BatchNormalization with the following architecture?
input_shape: (30, 64, 64)  # short video clips of 30 grayscale frames. I treat the 30 frames as the channels of the network; my problem requires this.
Conv2D(data_format='channels_first')
BatchNormalization
Activation('relu')
Conv2D(data_format='channels_first')
BatchNormalization
Activation('relu')
Flatten
Dense
BatchNormalization
Activation('softplus')
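For reference, here is a minimal Keras sketch of the architecture above. It assumes tf.keras, and the filter count (32), kernel size (3), padding='same' and the 10 Dense units are placeholders I picked, since the listing does not specify them. The conv BatchNormalization layers use axis=1 because channels_first feature maps have shape (batch, channels, height, width). The last one uses axis=-1 on the (batch, units) Dense output.

```python
from tensorflow import keras

# Hypothetical hyperparameters: the original listing does not give them.
FILTERS = 32
KERNEL_SIZE = 3
DENSE_UNITS = 10


def build_model():
    # Batch dimension is implicit: each sample is (frames-as-channels, height, width).
    inputs = keras.Input(shape=(30, 64, 64))

    # channels_first conv outputs are (batch, channels, height, width),
    # so the per-feature axis for BatchNormalization is 1.
    x = keras.layers.Conv2D(FILTERS, KERNEL_SIZE, padding="same",
                            data_format="channels_first")(inputs)
    x = keras.layers.BatchNormalization(axis=1)(x)
    x = keras.layers.Activation("relu")(x)

    x = keras.layers.Conv2D(FILTERS, KERNEL_SIZE, padding="same",
                            data_format="channels_first")(x)
    x = keras.layers.BatchNormalization(axis=1)(x)
    x = keras.layers.Activation("relu")(x)

    x = keras.layers.Flatten()(x)
    x = keras.layers.Dense(DENSE_UNITS)(x)
    # Dense output is (batch, units): axis=-1 (the default) and axis=1 are the same axis.
    x = keras.layers.BatchNormalization(axis=-1)(x)
    outputs = keras.layers.Activation("softplus")(x)
    return keras.Model(inputs, outputs)


model = build_model()

# gamma (the first BatchNormalization weight) has one entry per normalized feature.
for layer in model.layers:
    if isinstance(layer, keras.layers.BatchNormalization):
        print(layer.name, layer.get_weights()[0].shape)
```

The printed gamma shapes give a quick sanity check: (32,), (32,) and (10,), i.e. one scale per feature map and one per Dense unit. With axis=-1 on the conv outputs, gamma would instead have 64 entries, one per image column.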
I'm using generators with a batch size of 1024 because of memory limits, so each training step receives a mini-batch of shape (1024, 30, 64, 64).
My question is: should I use axis=1 in the first two BatchNormalization layers and axis=-1 in the last one?
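To see what the axis argument changes, this NumPy sketch reimplements only the training-mode normalization step. It leaves out the learned gamma/beta and the moving averages. `batch_norm_train` is a hypothetical helper, not a Keras function, and the tensor sizes are shrunk stand-ins for the real (1024, C, 64, 64) activations. The axis you pass keeps its own mean and variance, and every other axis, including the batch axis, is averaged over.

```python
import numpy as np


def batch_norm_train(x, axis, eps=1e-3):
    """Hypothetical helper: training-mode batch normalization without gamma/beta.

    The feature axis `axis` keeps one mean/variance per index; statistics are
    reduced over every other axis (batch included). eps=1e-3 mirrors the
    Keras default epsilon.
    """
    axis = axis % x.ndim  # turn -1 into a concrete axis index
    reduce_axes = tuple(a for a in range(x.ndim) if a != axis)
    mean = x.mean(axis=reduce_axes, keepdims=True)
    var = x.var(axis=reduce_axes, keepdims=True)  # population (biased) batch variance
    y = (x - mean) / np.sqrt(var + eps)
    return y, mean.reshape(-1), var.reshape(-1)


rng = np.random.default_rng(0)

# Shrunk stand-in for a channels_first conv activation: (batch, filters, height, width).
conv_out = rng.normal(loc=3.0, scale=2.0, size=(8, 32, 16, 16))
y, mean_ch, _ = batch_norm_train(conv_out, axis=1)
print(mean_ch.shape)  # (32,): one statistic per feature map
_, mean_col, _ = batch_norm_train(conv_out, axis=-1)
print(mean_col.shape)  # (16,): one statistic per image column

# Stand-in for the Flatten -> Dense output: (batch, units).
dense_out = rng.normal(size=(8, 10))
_, m_last, _ = batch_norm_train(dense_out, axis=-1)
_, m_one, _ = batch_norm_train(dense_out, axis=1)
print(np.array_equal(m_last, m_one))  # True: for a 2-D tensor, -1 and 1 are the same axis
```

So on the channels_first conv outputs, axis=1 yields one statistic per feature map. On the 2-D Dense output, axis=1 and axis=-1 refer to the same axis.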
Thank you in advance.