You want the batchnorm after the non-linearity, and before the dropout.
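To make that ordering concrete, here is a minimal pure-Python sketch of one block's forward pass. The function names (`relu`, `batch_norm`, `dropout`, `block`) are made up for illustration and are not any library's API. The batchnorm is simplified on purpose: it assumes no learnable scale or shift and no running statistics, so it always normalizes with the current batch's statistics.

```python
import math
import random


def relu(batch):
    """Element-wise ReLU over a batch given as a list of rows."""
    return [[max(0.0, x) for x in row] for row in batch]


def batch_norm(batch, eps=1e-5):
    """Normalize each feature (column) to zero mean and unit variance
    across the batch. Simplified: no learnable scale/shift, no running stats."""
    n = len(batch)
    cols = list(zip(*batch))
    means = [sum(c) / n for c in cols]
    variances = [sum((x - m) ** 2 for x in c) / n for c, m in zip(cols, means)]
    return [
        [(x - m) / math.sqrt(v + eps) for x, m, v in zip(row, means, variances)]
        for row in batch
    ]


def dropout(batch, p, rng):
    """Inverted dropout: zero each value with probability p and
    scale the survivors by 1 / (1 - p)."""
    keep = 1.0 - p
    return [[x / keep if rng.random() < keep else 0.0 for x in row] for row in batch]


def block(pre_activations, p=0.5, rng=None, training=True):
    """Apply the ordering: non-linearity, then batchnorm, then dropout."""
    rng = rng or random.Random(0)
    h = relu(pre_activations)   # 1. non-linearity
    h = batch_norm(h)           # 2. batchnorm after the non-linearity
    if training:
        h = dropout(h, p, rng)  # 3. dropout last, training only
    return h


# Example: a batch of 4 samples with 2 features each (outputs of a linear layer).
pre = [[1.0, -2.0], [3.0, 4.0], [-1.0, 0.5], [2.0, -0.5]]
train_out = block(pre, p=0.5, rng=random.Random(1))
eval_out = block(pre, training=False)
```

With this ordering, dropout sees already-normalized activations, and the batch statistics are computed on values that have not been randomly zeroed.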
Using batchnorm in RNNs requires care. This is an active area of recent research. I’m not aware of situations where batchnorm hurts CNNs.