Lesson 5: Batchnorm not giving good results

I tried to use BatchNormalization in the simple CNN architecture for the sentiment analysis problem.
My code is:

model = Sequential([
    Embedding(vocab_size, 32, input_length=size),
    Dropout(0.2),
    Conv1D(64, 5, border_mode='same', activation='relu'),
    BatchNormalization(),
    MaxPooling1D(),
    Flatten(),
    Dense(128, activation='relu'),
    BatchNormalization(),
    Dense(1, activation='sigmoid')])

while the one given in Jeremy’s notebook is -

conv1 = Sequential([
    Embedding(vocab_size, 32, input_length=seq_len, dropout=0.2),
    Dropout(0.2),
    Convolution1D(64, 5, border_mode='same', activation='relu'),
    Dropout(0.2),
    MaxPooling1D(),
    Flatten(),
    Dense(100, activation='relu'),
    Dropout(0.7),
    Dense(1, activation='sigmoid')])

With my code I only get an accuracy of about 86%, while with Jeremy's I get around 89-90%.
Why is this?
Shouldn't BatchNormalization improve my results?

This doesn't directly answer your question, but please look at section seven here. I believe it's likely you are overfitting the training data considerably, though without seeing your loss and accuracy curves it's hard to nail down. BN helps by reducing the internal covariate shift between layers, so training doesn't have to adapt as hard when parameters such as the learning rate change. While it lessens the need for heavy dropout, it is not, in and of itself, a drop-in replacement for it: notice that your version removes both the Dropout(0.2) after the convolution and the Dropout(0.7) before the final layer, so it has much less regularisation than Jeremy's.
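
For example, here is a minimal sketch of keeping dropout in place alongside the BatchNormalization layers (assuming the Keras 1.x API used in the course notebook, with vocab_size and seq_len defined as in the code above; the dropout rates are illustrative, not tuned values):

from keras.models import Sequential
from keras.layers import (Embedding, Dropout, Convolution1D, MaxPooling1D,
                          Flatten, Dense, BatchNormalization)

# BatchNormalization is added after each weight layer, while dropout is
# kept for regularisation rather than removed.
model = Sequential([
    Embedding(vocab_size, 32, input_length=seq_len),
    Dropout(0.2),
    Convolution1D(64, 5, border_mode='same', activation='relu'),
    BatchNormalization(),
    Dropout(0.2),        # keep some dropout after the conv block
    MaxPooling1D(),
    Flatten(),
    Dense(100, activation='relu'),
    BatchNormalization(),
    Dropout(0.5),        # heavier dropout before the output layer
    Dense(1, activation='sigmoid')])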

You only want batch normalization over the channel dimension, so you need to add axis=-1 to the BN layers (assuming the TensorFlow channels-last orientation).
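
As a quick sketch (again assuming the Keras 1.x API and the TensorFlow backend, where the Conv1D output has shape (batch, steps, channels)):

from keras.models import Sequential
from keras.layers import Embedding, Convolution1D, BatchNormalization

# axis=-1 gives each of the 64 filters its own normalisation statistics,
# shared across the batch and the sequence positions.
model = Sequential([
    Embedding(vocab_size, 32, input_length=seq_len),
    Convolution1D(64, 5, border_mode='same', activation='relu'),
    BatchNormalization(axis=-1)])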
