Using VGG16 on the Invasive Species Kaggle competition

I have finished lessons 1 and 2 and am trying to apply what I learned to the Invasive Species competition on Kaggle, but I am stuck at approximately 62% validation accuracy.

To get some practice, I tried rewriting the Vgg16 class from the lessons. This is what I have:

import json
import numpy as np
from keras.models import Sequential
from keras.layers.convolutional import Conv2D, ZeroPadding2D, MaxPooling2D
from keras.layers.core import Flatten, Dropout, Dense, Lambda
from keras.utils.data_utils import get_file
from keras.preprocessing.image import ImageDataGenerator
from keras.optimizers import RMSprop

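# Per-channel ImageNet means (RGB), shaped to broadcast over (3, H, W) images.
vgg_mean = np.array([123.68, 116.779, 103.939], dtype=np.float32).reshape((3,1,1))
def vgg_preprocess(image):
    # Subtract the mean, then reverse the channel axis (RGB -> BGR),
    # which is the channel order the original VGG weights expect.
    image = image - vgg_mean
    return image[:, ::-1]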

class Vgg16:
    def __init__(self):
        self.PATH = ''

    def ConvBlock(self, layers, filters):
        model = self.model
        for i in range(layers):
            model.add(ZeroPadding2D(padding=(1, 1)))
            model.add(Conv2D(filters, (3, 3), activation='relu'))
        model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

    def FullyConnectedBlock(self):
        model = self.model
        model.add(Dense(4096, activation='relu'))
        model.add(Dropout(0.5))

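    def create(self):
        model = self.model = Sequential()
        # Assumes Keras is configured for channels-first images, so each
        # input is (3, 224, 224) and vgg_preprocess flips axis 1 (channels).
        model.add(Lambda(vgg_preprocess,
                input_shape=(3, 224, 224),
                output_shape=(3, 224, 224)))

        self.ConvBlock(2, 64)
        self.ConvBlock(2, 128)
        self.ConvBlock(3, 256)
        self.ConvBlock(3, 512)
        self.ConvBlock(3, 512)

        # Flatten the 512x7x7 feature maps, then two 4096-unit FC blocks,
        # matching the layer layout stored in vgg16.h5.
        model.add(Flatten())
        self.FullyConnectedBlock()
        self.FullyConnectedBlock()
        model.add(Dense(1000, activation='softmax'))

        model.load_weights(get_file('vgg16.h5',
                self.PATH + 'vgg16.h5', cache_subdir='models'))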

    def get_classes(self):
        file_name = 'imagenet_class_index.json'
        with open(get_file(file_name, self.PATH + file_name, cache_subdir='models')) as f:
            class_dict = json.load(f)
        self.classes = [class_dict[str(i)][1] for i in range(len(class_dict))]

    def get_batches(self, path, gen=ImageDataGenerator(), shuffle=True, batch_size=32, class_mode='categorical'):
        return gen.flow_from_directory(path, target_size=(224, 224),
            class_mode=class_mode, shuffle=shuffle, batch_size=batch_size)

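    def finetune(self, batches, lr=0.001):
        num_classes = len(batches.class_indices)
        model = self.model
        # Drop the 1000-way ImageNet classifier before adding the new output layer.
        model.pop()
        # Freeze the pretrained layers so only the new Dense layer trains.
        for layer in model.layers:
            layer.trainable = False
        model.add(Dense(num_classes, activation='softmax'))
        # Recompile so the trainable flags and the new layer take effect.
        model.compile(optimizer=RMSprop(lr=lr),
            loss='categorical_crossentropy', metrics=['accuracy'])
        # Map class indices back to directory names.
        classes = list(iter(batches.class_indices))
        for c in batches.class_indices:
            classes[batches.class_indices[c]] = c
        self.classes = classes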

    def fit(self, batches, steps_per_epoch, epochs=1, validation_data=None, validation_steps=None):
        return self.model.fit_generator(batches,
            steps_per_epoch=steps_per_epoch, epochs=epochs,
            validation_data=validation_data, validation_steps=validation_steps)
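Since the Lambda layer receives whole batches, `image[:, ::-1]` in vgg_preprocess reverses axis 1, which is the channel axis only for channels-first input of shape (batch, 3, H, W). A quick NumPy check of that function in isolation, a minimal sketch using made-up pixel values, shows the mean subtraction and the RGB to BGR flip:

```python
import numpy as np

# Same constant and function as in the class file: per-channel ImageNet means
# (RGB order), shaped (3, 1, 1) so they broadcast over (batch, 3, H, W).
vgg_mean = np.array([123.68, 116.779, 103.939], dtype=np.float32).reshape((3, 1, 1))

def vgg_preprocess(image):
    image = image - vgg_mean     # subtract the per-channel mean
    return image[:, ::-1]        # reverse axis 1 (channels): RGB -> BGR

# A tiny channels-first batch: 1 image, 3 channels, 2x2 pixels (made-up values).
batch = np.zeros((1, 3, 2, 2), dtype=np.float32)
batch[0, 0] = 200.0   # red channel
batch[0, 1] = 100.0   # green channel
batch[0, 2] = 50.0    # blue channel

out = vgg_preprocess(batch)
print(out.shape)        # (1, 3, 2, 2)
print(out[0, :, 0, 0])  # first pixel, now in B, G, R order with means removed
```

If the data were channels-last, the same slice would reverse image rows instead of channels, which is an easy way to silently degrade a pretrained model.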

I attempted to repeat the steps used for the Dogs vs. Cats competition in my notebook.

When I use the class above to run the VGG code from lesson 1, it seems to work fine.

I feel like there is something big that I have missed. Any help is appreciated.

Your model isn’t able to overfit the training set (the loss is high). As Jeremy would probably say, first try to overfit your training set. Personally, I would look first at the learning rate (lr), which looks high (lr=0.1). While lowering it, and since each epoch is quite fast (28 sec), train a bit longer to see the trend. Once you can overfit your training set, then work on generalisation to improve the result on the validation set (data augmentation, normalisation, dropout, and more).

@alexandrecc Thanks for the reply. I decreased the learning rate to 0.01, but the accuracy only got worse: at 0.01, training for 30 epochs gets about 33% accuracy. I also tried learning rates of 0.1 and 1, and for both the accuracy oscillates between 62% and 64%.

Interestingly enough, when I changed steps_per_epoch=math.floor(...) to steps_per_epoch=math.ceil(...) my accuracy went up to 85%. Could it be that I am not using steps_per_epoch correctly?

Edit: I played with different values for steps_per_epoch but that didn’t change much.
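To see what the two rounding choices mean, here is a small sketch assuming steps_per_epoch is computed as the number of images divided by the batch size (the exact expression is elided in the post). With `math.floor`, one pass covers only the full batches, so the last partial batch is left out of each epoch or validation pass; `math.ceil` includes it. The sample counts below are hypothetical, and whether this alone explains the jump to 85% is not clear from the thread.

```python
import math

def steps_for(num_samples, batch_size, rounding):
    """Number of generator batches per pass for a given rounding function."""
    return int(rounding(num_samples / batch_size))

def images_covered(num_samples, batch_size, steps):
    """Images drawn in one pass of `steps` batches, assuming the generator
    starts at the beginning and its final batch may be partial."""
    return min(steps * batch_size, num_samples)

# Hypothetical sizes for illustration only (not the real competition split).
num_samples, batch_size = 1000, 64

for name, rounding in [("floor", math.floor), ("ceil", math.ceil)]:
    steps = steps_for(num_samples, batch_size, rounding)
    print(name, steps, images_covered(num_samples, batch_size, steps))
# floor 15 960
# ceil 16 1000
```

The effect matters most for validation_steps: with floor, the reported validation accuracy is computed on a slightly smaller set of images than the full validation folder.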

VGG16 was trained on images that are quite different from this dataset. You might try something else, or try retraining some of the convolutional layers (#61 on the leaderboard here).

I didn’t have much success with VGG16, but I will probably use it soon.


@aaronwong I finally got time to try it myself (#9 on the leaderboard here).
Here are some hints to improve your accuracy:

  • The dataset is quite different from ImageNet (probably more noise and higher resolution): train some of the higher (or almost all) convolutional layers
  • Use a lot of data augmentation with Keras’ ImageDataGenerator to prevent the conv layers from overfitting
  • For the output layer, use a sigmoid instead of a softmax, since this is a binary classification problem
  • Try to maximize your image resolution as much as your GPU allows, because the relevant features can be small in the image. I could train at up to 450x450 pixels with a GTX 1070
  • To maximize feature detection, predict on multiple crops of each image to improve the final prediction
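On the sigmoid point: for a two-class problem, a single sigmoid unit and a two-unit softmax can express exactly the same probabilities, because softmax over the logits [0, z] gives e^z / (1 + e^z) = sigmoid(z) for the second class. The NumPy sketch below checks that identity. In Keras, the sigmoid setup would typically be `Dense(1, activation='sigmoid')` with `loss='binary_crossentropy'` and `class_mode='binary'` in `flow_from_directory`; that pairing is an assumption about how you would wire it, not something spelled out in the hint.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(logits):
    # Subtract the row max for numerical stability before exponentiating.
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=-1, keepdims=True)

z = np.array([-2.0, 0.0, 3.5])  # one logit per image from a 1-unit output

# A 2-unit softmax over logits [0, z] gives the same P(class 1) as sigmoid(z).
two_class = softmax(np.stack([np.zeros_like(z), z], axis=-1))
print(np.allclose(two_class[:, 1], sigmoid(z)))  # True
```

So the switch mainly gives a single output probability that lines up directly with 0/1 labels, rather than adding modelling capacity.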

Thanks for the responses; they have been really helpful. By slowly fine-tuning all of the convolutional layers, I have been able to increase my training accuracy to about 99%, with the validation accuracy hovering just below 90%. Everything I do after that seems to either decrease my validation accuracy or have no effect.

From my understanding, this means my model is overfitting the training data. I tried adding data augmentation with ImageDataGenerator as recommended, but that only seems to decrease my training accuracy. Validation accuracy continues to hover at ~88%.

@alexandrecc Can you elaborate on what you mean by using multiple crops and training with higher resolution images?

I suspect you are talking about updating

    model.add(Lambda(vgg_preprocess,
            input_shape=(3, 224, 224),
            output_shape=(3, 224, 224)))

to train with higher resolution, but I am not sure. I tried playing with that, but I am getting weird errors that I am currently trying to debug. I know that Jeremy talks about this specific line of code in lesson 3, so I will rewatch it to see if I can gain any more insight into what exactly this line does.

You can probably assume that you are able to overfit the training dataset (good job, first step done!). Now, with an image size of 224x224 and data augmentation using ImageDataGenerator, you should be able to get about 0.985 validation/test accuracy with a properly trained VGG network. Assuming your code is working, tweaking the data augmentation parameters can probably improve the validation/test accuracy. You need to augment in a meaningful way, without losing what the image represents.
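As an illustration of "meaningful" augmentation, here is a NumPy-only sketch (standing in for ImageDataGenerator options such as `horizontal_flip`, `width_shift_range` and `zoom_range`) that applies a random horizontal flip and a crop keeping 90% of each side, so the plant that defines the label is unlikely to be cut out. The function name, the 0.9 crop fraction and the flip probability are assumptions for illustration; in a real pipeline the crop would be resized back to the network's input size.

```python
import numpy as np

def augment(image, rng, crop_frac=0.9, flip_prob=0.5):
    """Random crop covering crop_frac of each side, then a random left-right flip.

    image: array of shape (H, W, C). Mild settings keep most of the frame,
    so the augmented image still shows what the label refers to.
    """
    h, w = image.shape[:2]
    ch, cw = int(round(h * crop_frac)), int(round(w * crop_frac))
    top = rng.integers(0, h - ch + 1)    # random vertical offset
    left = rng.integers(0, w - cw + 1)   # random horizontal offset
    out = image[top:top + ch, left:left + cw]
    if rng.random() < flip_prob:
        out = out[:, ::-1]               # mirror left-right
    return out

rng = np.random.default_rng(0)
img = np.arange(100 * 120 * 3, dtype=np.float32).reshape(100, 120, 3)
aug = augment(img, rng)
print(aug.shape)  # (90, 108, 3)
```

Vertical flips or large rotations may or may not be meaningful for outdoor plant photos; that is exactly the kind of parameter worth testing against the validation set.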

With 450x450 resolution, you can probably get around 0.989, but training is a bit unstable. With InceptionV3 or Xception you can get around 0.993. Multiple crops of the original image plus average ensembling of the predictions from different networks and crops got me around 0.995. I could probably get a bit higher with random forest or XGBoost ensembling.
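The crop-averaging idea can be sketched as follows. The post does not say which crops were used, so this assumes the common "five-crop" scheme (four corners plus the centre); `fake_predict` is a hypothetical stand-in for a model's predict call that returns one probability per crop. Averaging across several networks works the same way: take the mean of their per-image outputs.

```python
import numpy as np

def five_crops(image, size):
    """Four corner crops plus a centre crop, each size x size, from an (H, W, C) image."""
    h, w = image.shape[:2]
    if size > h or size > w:
        raise ValueError("crop size larger than image")
    top, left = (h - size) // 2, (w - size) // 2
    return np.stack([
        image[:size, :size],                      # top-left
        image[:size, w - size:],                  # top-right
        image[h - size:, :size],                  # bottom-left
        image[h - size:, w - size:],              # bottom-right
        image[top:top + size, left:left + size],  # centre
    ])

def multi_crop_predict(predict_fn, image, size):
    """Average predict_fn's per-crop probabilities into one prediction."""
    crops = five_crops(image, size)
    return float(np.mean(predict_fn(crops)))

# Hypothetical stand-in for model.predict: one "probability" per crop.
def fake_predict(crops):
    return crops.mean(axis=(1, 2, 3)) / 255.0

image = np.random.default_rng(0).uniform(0, 255, size=(300, 400, 3))
print(multi_crop_predict(fake_predict, image, 224))
```

Because each crop is seen at the network's native resolution, small features in a large photo are less likely to be lost to downscaling than with a single resized image.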

I managed to get the validation accuracy to hover around ~95% with no data augmentation. To do this, I removed the dropout layer from the fully connected layers.

After that, it seems that no matter how I tweak the data generator, it only decreases my training accuracy and at best does nothing for my validation accuracy. Most of the time, it also decreases my validation accuracy.

Hey Alexandre, could you clarify a bit more what you mean by “Try to maximize your image resolution as much as your GPU allows”?

Since VGG only accepts 224x224 images how do people typically go about sending larger images through this type of network?

The pretrained VGG model on ImageNet was trained with 224x224 images. But you can input different image sizes, since the pretrained convolution filters are applied across the entire image; only the fully connected layers on top expect a fixed input size. Higher-resolution images take more GPU memory, so you need to reduce the batch size.
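The arithmetic behind this: in VGG16 every 3x3 convolution is preceded by 1-pixel zero padding, so it keeps the spatial size, and each of the five 2x2, stride-2 max-pools halves it (rounding down). The sketch below computes the final feature map for 224 and 450 inputs. At 224 the flattened output is 7x7x512 = 25088 values, which is what the pretrained first Dense layer expects; at 450 it is 14x14x512, so the top layers have to be rebuilt. The batch-size helper is a rough heuristic that assumes activation memory scales with pixel count; it is an assumption, not a measured rule.

```python
def vgg16_feature_size(side, num_pools=5):
    """Spatial size of VGG16's last conv block output for a square input.

    Padded 3x3 convs keep the size; each 2x2/stride-2 max-pool halves it
    (floor division, as in a 'valid' pooling layer).
    """
    for _ in range(num_pools):
        side //= 2
    return side

def scaled_batch_size(base_batch, base_side, new_side):
    """Rough batch size keeping activation memory about constant,
    assuming memory grows with the number of pixels."""
    return max(1, int(base_batch * (base_side / new_side) ** 2))

for side in (224, 450):
    f = vgg16_feature_size(side)
    # side, feature map size, flattened length seen by the first Dense layer
    print(side, f, f * f * 512)
# 224 7 25088
# 450 14 100352

print(scaled_batch_size(32, 224, 450))  # 7
```

In practice you would start from such an estimate and adjust down if the GPU still runs out of memory.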

Thanks Alexandre. After playing with this a bit more, it started to make some intuitive sense. I removed all of the top layers, then recreated those layers and trained them with all the other layers’ weights frozen. Then I re-enabled training for the top convolutional layers and began to see better performance.

@danielhavir @nickrobinson @aaronwong @dukeofyork

I just wrote a short overview of my 3rd place solution (LB 0.99643) for this Kaggle invasive species competition:

Twitter link:

Let me know if you have any comment or question.

Thanks again @jeremy and @rachel for helping democratize deep learning! You have an incredible impact with this course!


Thanks for sharing. The trick with the cropping of the test images and averaging of the global prediction is brilliant!