CNN in Keras overfitting even after dropout, batch normalization, and augmentation

I am trying to train a model by fine-tuning a pretrained face-recognition network (VGGFace, similar to FaceNet). My dataset has 12 classes, with 1774 training images and 313 validation images, around 150 images per class.
The model was overfitting, so I added dropout and FC layers with batch normalization to see how it goes. But the model still overfits.
I am a beginner and have been trying this for a couple of days. I could only get up to 79% validation accuracy, with training accuracy around 99%. The model is overfitting and I can’t find where I am going wrong. Any help would be appreciated.
Some changes I have tried in my code below:

  • increasing and decreasing dropout.
  • changing the optimizer to RMSprop, which gave even worse results.
  • decreasing the learning rate; it then took about 35 epochs just to cross 15% accuracy.
  • augmenting the data to balance my previously unbalanced dataset (already done).

I am not sure what can be done now.
Here are the last 5 epochs for the code below. I even hit 100% training accuracy a couple of times while trying different additions/deletions, but with no increase in validation accuracy.

Epoch 95/100
56/55 [==============================] - 53s 952ms/step - loss: 0.7908 - acc: 0.9383 - f1: 0.6683 - val_loss: 0.9396 - val_acc: 0.7827 - val_f1: 0.7373
Epoch 96/100
56/55 [==============================] - 53s 949ms/step - loss: 0.7675 - acc: 0.9440 - f1: 0.6815 - val_loss: 0.9098 - val_acc: 0.7955 - val_f1: 0.7457
Epoch 97/100
56/55 [==============================] - 53s 950ms/step - loss: 0.7794 - acc: 0.9380 - f1: 0.6755 - val_loss: 0.9200 - val_acc: 0.7923 - val_f1: 0.7308
Epoch 98/100
56/55 [==============================] - 53s 950ms/step - loss: 0.7805 - acc: 0.9452 - f1: 0.6697 - val_loss: 0.8790 - val_acc: 0.8051 - val_f1: 0.7606
Epoch 99/100
56/55 [==============================] - 53s 955ms/step - loss: 0.7515 - acc: 0.9499 - f1: 0.6883 - val_loss: 0.9150 - val_acc: 0.7700 - val_f1: 0.7591
Epoch 100/100
56/55 [==============================] - 53s 945ms/step - loss: 0.7189 - acc: 0.9537 - f1: 0.7055 - val_loss: 0.8576 - val_acc: 0.7987 - val_f1: 0.7712



    # Imports needed by the code below (Keras 2.x with the keras_vggface package)
    import math
    from keras.models import Model
    from keras.layers import Flatten, Dense, BatchNormalization, Dropout
    from keras import optimizers
    from keras.preprocessing.image import ImageDataGenerator
    from keras_vggface.vggface import VGGFace

    img_width, img_height = 224, 224

    vggface = VGGFace(model='resnet50', include_top=False, input_shape=(img_width, img_height, 3))

    #vgg_model = VGGFace(include_top=False, input_shape=(224, 224, 3))

    last_layer = vggface.get_layer('avg_pool').output
    x = Flatten(name='flatten')(last_layer)
    # relu for the hidden layers; softmax belongs only on the final classifier
    x = Dense(128, activation='relu')(x)
    x = BatchNormalization()(x)
    x = Dropout(0.5)(x)
    x = Dense(128, activation='relu')(x)
    x = BatchNormalization()(x)
    x = Dropout(0.5)(x)
    out = Dense(12, activation='softmax', name='classifier')(x)

    custom_vgg_model = Model(vggface.input, out)


    # Use the fine-tuned model directly; wrapping it in Sequential adds nothing
    model = custom_vgg_model

    train_datagen = ImageDataGenerator(
      rescale=1./255,
      rotation_range=20,
      width_shift_range=0.2,
      height_shift_range=0.2,
      horizontal_flip=True,
      fill_mode='nearest')
     

    # Validation data should only be rescaled, never augmented
    validation_datagen = ImageDataGenerator(rescale=1./255)
     
    # Change the batch size according to your system's RAM
    train_batchsize = 32
    val_batchsize = 32
     
    train_generator = train_datagen.flow_from_directory(
        train_data_path,
        target_size=(img_width, img_height),
        batch_size=train_batchsize,
        class_mode='categorical')
     
    validation_generator = validation_datagen.flow_from_directory(
        validation_data_path,
        target_size=(img_width, img_height),
        batch_size=val_batchsize,
        class_mode='categorical')

    # Compile the model
    model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.SGD(lr=1e-3),
              metrics=['acc'])
    # Train the model
    history = model.fit_generator(
      train_generator,
      # ceil the step counts so the last partial batch is used exactly once
      # (a fractional value is what produced the odd 56/55 in the logs above)
      steps_per_epoch=math.ceil(train_generator.samples / train_generator.batch_size),
      epochs=100,
      validation_data=validation_generator,
      validation_steps=math.ceil(validation_generator.samples / validation_generator.batch_size),
      verbose=1)
     
    # Save the model
    model.save('facenet_128.h5')

You could try these 6 steps:
http://wiki.fast.ai/index.php/Over-fitting

  1. add more data - Probably not an option for you given the fixed dataset.
  2. use data augmentation - Has worked fantastically in the past. If it makes sense to transform images and retain labels for your dataset, this is a great way to reduce overfitting.
  3. use batch normalisation - You have this already.
  4. use architectures that generalise well - You did this a little already. You probably don’t want to touch this much, since you already have a model that can memorize the training set.
  5. add regularisation (L1, L2, dropout) - see the L2 sketch after this list.
  6. reduce architecture complexity
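
For point 5, a minimal sketch of where L2 weight decay could go, assuming the classifier head from your original post (so `last_layer` is the `avg_pool` output from there); the regularisation strengths are illustrative, not tuned:

    from keras import regularizers
    from keras.layers import Flatten, Dense, BatchNormalization, Dropout

    # Same head shape as the original post, with an L2 penalty on each Dense layer
    x = Flatten(name='flatten')(last_layer)
    x = Dense(128, activation='relu',
              kernel_regularizer=regularizers.l2(1e-4))(x)
    x = BatchNormalization()(x)
    x = Dropout(0.5)(x)
    out = Dense(12, activation='softmax', name='classifier',
                kernel_regularizer=regularizers.l2(1e-4))(x)

If the gap between training and validation accuracy stays large, increasing the penalty (e.g. to 1e-3) is the usual next step.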

Oh sorry, I forgot to add that I did augmentation to balance my dataset, which was previously unbalanced; I applied it to some of the minority classes.
Is there a way to reduce the architecture complexity, given that I am fine-tuning my model?
And where could I apply regularization? I have already added dropout.
@snagpaul

While you are experimenting, try this too:

  1. An approach where you don’t selectively balance. Leave the class distributions as they are and assume that the training and test sets have similar distributions.
    Then, augment all the data together.

  2. You could reduce complexity by cutting down the number of VGG layers in play: freeze the base, retrain only some of the final layers, and use fewer fully connected layers (see the sketch below).
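
A minimal sketch of the freezing idea from point 2, reusing the `vggface` base and `model` from your original post; how many layers to leave trainable is a guess to experiment with:

    from keras import optimizers

    # Freeze the whole convolutional base, then unfreeze only the last few layers
    for layer in vggface.layers:
        layer.trainable = False
    for layer in vggface.layers[-10:]:  # illustrative cut-off; tune it
        layer.trainable = True

    # Recompile after changing trainable flags so they take effect
    model.compile(loss='categorical_crossentropy',
                  optimizer=optimizers.SGD(lr=1e-3),
                  metrics=['acc'])

Fewer trainable parameters means less capacity to memorize your 1774 training images.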

edit:
Just wanted to let you know that I haven’t looked at all the new code you edited into your original question. Hope this helps!

No. I did not edit anything in my code.
Thank you for replying.

Hello,

Actually, I am working in the same area, but my issue is that my dataset (FER2013) has images of size 48x48x3, and I can’t fine-tune VGGFace because its input shape is 224x224x3. Do you have any idea?

Any help would be appreciated!
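
One simple workaround: `flow_from_directory` resizes whatever is on disk to `target_size`, so you can let Keras upsample the 48x48 images at load time. A minimal sketch, assuming your FER2013 images sit in class subfolders under a hypothetical `fer2013_train/` directory:

    from keras.preprocessing.image import ImageDataGenerator

    datagen = ImageDataGenerator(rescale=1./255)

    # target_size upsamples the 48x48 inputs to the 224x224 that VGGFace expects
    train_generator = datagen.flow_from_directory(
        'fer2013_train',            # hypothetical path with one subfolder per class
        target_size=(224, 224),
        batch_size=32,
        class_mode='categorical')

Upsampling that aggressively adds no new information beyond the original 48x48 pixels, but it does satisfy the pretrained network's expected input size.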

Hi! Did you ever find a solution?