Using VGG16 on Kaggle's Plant Seedlings Classification Competition

Hi. I just finished lessons 1 and 2 of part 1 of fast.ai. Now I'm trying to use the VGG16 model on the Plant Seedlings Classification competition on Kaggle, but I'm stuck at 6% accuracy. It seems like I'm missing something. Any help is appreciated.

Machine that I use

I use my own laptop: an Intel® Core™ i5-6200U CPU @ 2.30GHz, an NVIDIA GeForce 930M with 2GB of GPU memory, and 12GB of RAM.

Notebook

My work can be seen here: https://nbviewer.jupyter.org/gist/arisbw/498c69744ec720ca359b0886f9e200d6

Data

I didn't use all of the data. First, I took 1,000 images covering all 12 classes out of the train set to serve as a valid set. After that, I created a sample set with 500 images for its train set and 100 images for its valid set. Here is the number of images from each class in each set (a sketch of how such a split can be scripted follows the listings).

"""
Train set:
Black-grass : 206
Charlock : 310
Cleavers : 232
Common Chickweed : 490
Common wheat : 177
Fat Hen : 365
Loose Silky-bent : 502
Maize : 164
Scentless Mayweed : 404
Shepherds Purse : 187
Small-flowered Cranesbill : 401
Sugar beet : 312

Valid set:
Black-grass : 57
Charlock : 80
Cleavers : 55
Common Chickweed : 121
Common wheat : 44
Fat Hen : 110
Loose Silky-bent : 152
Maize : 57
Scentless Mayweed : 112
Shepherds Purse : 44
Small-flowered Cranesbill : 95
Sugar beet : 73
"""

"""
From the sample set (the following is the data I actually used):
Train set:
Black-grass : 21
Charlock : 48
Cleavers : 28
Common Chickweed : 53
Common wheat : 31
Fat Hen : 49
Loose Silky-bent : 66
Maize : 18
Scentless Mayweed : 61
Shepherds Purse : 23
Small-flowered Cranesbill : 57
Sugar beet : 45

Valid set:
Black-grass : 6
Charlock : 12
Cleavers : 3
Common Chickweed : 11
Common wheat : 5
Fat Hen : 10
Loose Silky-bent : 19
Maize : 7
Scentless Mayweed : 11
Shepherds Purse : 3
Small-flowered Cranesbill : 9
Sugar beet : 4
"""

Model

Basically, I followed the model from the scripts in the fast.ai repo.

import numpy as np
from keras.models import Sequential
from keras.layers.core import Flatten, Dense, Dropout, Lambda
from keras.layers.convolutional import Convolution2D, MaxPooling2D, ZeroPadding2D

def ConvBlock(layers, model, filters):
    # A VGG conv block: `layers` 3x3 convolutions (ReLU), then 2x2 max pooling.
    for i in range(layers):
        model.add(ZeroPadding2D((1,1)))
        model.add(Convolution2D(filters, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

def FCBlock(model):
    # A fully connected block: 4096-unit dense layer with dropout.
    model.add(Dense(4096, activation='relu'))
    model.add(Dropout(0.5))

# Per-channel ImageNet mean, shaped for channels-first (Theano) ordering.
vgg_mean = np.array([123.68, 116.779, 103.939]).reshape((3,1,1))

def vgg_preprocess(x):
    # Subtract the ImageNet mean and flip RGB -> BGR to match the original VGG weights.
    x = x - vgg_mean
    return x[:, ::-1]

def VGG_16():
    model = Sequential()
    model.add(Lambda(vgg_preprocess, input_shape=(3,224,224)))

    ConvBlock(2, model, 64)
    ConvBlock(2, model, 128)
    ConvBlock(3, model, 256)
    ConvBlock(3, model, 512)
    ConvBlock(3, model, 512)

    model.add(Flatten())
    FCBlock(model)
    FCBlock(model)
    # 1000-way ImageNet head; it gets replaced during fine-tuning.
    model.add(Dense(1000, activation='softmax'))
    return model
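
To actually use it, I build the model and load the pretrained ImageNet weights. A minimal sketch, assuming the weights file the fast.ai scripts point to (the file name and URL follow those scripts; double-check against the current repo if you use another source):

from keras.utils.data_utils import get_file

model = VGG_16()
# Download (and cache) the ImageNet VGG16 weights used by the fast.ai lesson scripts.
weights_path = get_file('vgg16.h5', 'http://files.fast.ai/models/vgg16.h5', cache_subdir='models')
model.load_weights(weights_path)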

Results

This is the first result after fine-tuning the last layer to match my 12 classes (sketched below). I used 10 epochs and a batch size of 4.
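
Roughly, the fine-tuning step can be written like this. This is a sketch under assumptions: the generator setup, directory paths, and the 0.1 learning rate I mention further down are illustrative, not copied verbatim from my notebook.

from keras.optimizers import SGD
from keras.preprocessing.image import ImageDataGenerator

# Swap the 1000-way ImageNet head for a 12-way head and freeze all other layers.
model.pop()
for layer in model.layers:
    layer.trainable = False
model.add(Dense(12, activation='softmax'))
model.compile(optimizer=SGD(lr=0.1), loss='categorical_crossentropy', metrics=['accuracy'])

# Batches of 224x224 images from the sample directories (batch size 4).
# Assumes Theano dim ordering, so images come out as (3, 224, 224).
gen = ImageDataGenerator()
train_batches = gen.flow_from_directory('data/plant-seedlings/sample/train',
                                        target_size=(224, 224), batch_size=4)
valid_batches = gen.flow_from_directory('data/plant-seedlings/sample/valid',
                                        target_size=(224, 224), batch_size=4)

# Keras 1 API: samples_per_epoch / nb_epoch / nb_val_samples.
model.fit_generator(train_batches, samples_per_epoch=train_batches.nb_sample, nb_epoch=10,
                    validation_data=valid_batches, nb_val_samples=valid_batches.nb_sample)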

Epoch 1/10
500/500 [==============================] - 68s - loss: 15.4570 - acc: 0.0360 - val_loss: 15.1510 - val_acc: 0.0600
Epoch 2/10
500/500 [==============================] - 64s - loss: 15.4411 - acc: 0.0420 - val_loss: 15.1510 - val_acc: 0.0600
Epoch 3/10
500/500 [==============================] - 65s - loss: 15.4411 - acc: 0.0420 - val_loss: 15.1510 - val_acc: 0.0600
Epoch 4/10
500/500 [==============================] - 64s - loss: 15.4411 - acc: 0.0420 - val_loss: 15.1510 - val_acc: 0.0600
Epoch 5/10
500/500 [==============================] - 64s - loss: 15.4411 - acc: 0.0420 - val_loss: 15.1510 - val_acc: 0.0600
Epoch 6/10
500/500 [==============================] - 64s - loss: 15.4411 - acc: 0.0420 - val_loss: 15.1510 - val_acc: 0.0600
Epoch 7/10
500/500 [==============================] - 67s - loss: 15.4411 - acc: 0.0420 - val_loss: 15.1510 - val_acc: 0.0600
Epoch 8/10
500/500 [==============================] - 70s - loss: 15.4411 - acc: 0.0420 - val_loss: 15.1510 - val_acc: 0.0600
Epoch 9/10
500/500 [==============================] - 64s - loss: 15.4411 - acc: 0.0420 - val_loss: 15.1510 - val_acc: 0.0600
Epoch 10/10
500/500 [==============================] - 67s - loss: 15.4411 - acc: 0.0420 - val_loss: 15.1510 - val_acc: 0.0600

And this is the result after I trained all the dense layers (a sketch of unfreezing them is below). Still no change.
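
Unfreezing the dense layers can be done roughly like this (again a sketch, not the exact notebook code):

# Make everything from the first Dense layer onward trainable, keep the conv layers frozen,
# then recompile so the change takes effect.
first_dense = [i for i, layer in enumerate(model.layers) if type(layer) is Dense][0]
for layer in model.layers[first_dense:]:
    layer.trainable = True
model.compile(optimizer=SGD(lr=0.1), loss='categorical_crossentropy', metrics=['accuracy'])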

Epoch 1/10
500/500 [==============================] - 64s - loss: 15.4411 - acc: 0.0420 - val_loss: 15.1510 - val_acc: 0.0600
Epoch 2/10
500/500 [==============================] - 63s - loss: 15.4411 - acc: 0.0420 - val_loss: 15.1510 - val_acc: 0.0600
Epoch 3/10
500/500 [==============================] - 63s - loss: 15.4411 - acc: 0.0420 - val_loss: 15.1510 - val_acc: 0.0600
Epoch 4/10
500/500 [==============================] - 63s - loss: 15.4411 - acc: 0.0420 - val_loss: 15.1510 - val_acc: 0.0600
Epoch 5/10
500/500 [==============================] - 63s - loss: 15.4411 - acc: 0.0420 - val_loss: 15.1510 - val_acc: 0.0600
Epoch 6/10
500/500 [==============================] - 64s - loss: 15.4411 - acc: 0.0420 - val_loss: 15.1510 - val_acc: 0.0600
Epoch 7/10
500/500 [==============================] - 63s - loss: 15.4411 - acc: 0.0420 - val_loss: 15.1510 - val_acc: 0.0600
Epoch 8/10
500/500 [==============================] - 63s - loss: 15.4411 - acc: 0.0420 - val_loss: 15.1510 - val_acc: 0.0600
Epoch 9/10
500/500 [==============================] - 63s - loss: 15.4411 - acc: 0.0420 - val_loss: 15.1510 - val_acc: 0.0600
Epoch 10/10
500/500 [==============================] - 63s - loss: 15.4411 - acc: 0.0420 - val_loss: 15.1510 - val_acc: 0.0600

I can see that the loss is not changing at all. Check the value of your learning rate. Also, can you provide more details about the optimizer you are using?

@gokkulnath Thanks! I'm using SGD as my optimizer. You're right, the learning rate I used (0.1) was too big. I changed it to 0.001 (sketched below), and the results follow. The model overfits, but I think that's a good start (a bump from 6% to 15%).
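
For reference, the learning rate can be lowered either by recompiling or by updating the already-compiled optimizer in place; both are standard Keras calls, and either works here.

from keras import backend as K
from keras.optimizers import SGD

# Option 1: recompile with a smaller learning rate.
model.compile(optimizer=SGD(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])

# Option 2: change the learning rate of the existing optimizer in place.
K.set_value(model.optimizer.lr, 0.001)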

Epoch 1/10
500/500 [==============================] - 66s - loss: 1.3990 - acc: 0.7300 - val_loss: 6.7164 - val_acc: 0.0400
Epoch 2/10
500/500 [==============================] - 63s - loss: 1.2736 - acc: 0.7200 - val_loss: 6.2432 - val_acc: 0.0400
Epoch 3/10
500/500 [==============================] - 63s - loss: 1.1720 - acc: 0.7380 - val_loss: 5.4563 - val_acc: 0.0700
Epoch 4/10
500/500 [==============================] - 64s - loss: 1.0821 - acc: 0.7400 - val_loss: 4.7643 - val_acc: 0.0700
Epoch 5/10
500/500 [==============================] - 64s - loss: 0.9818 - acc: 0.7400 - val_loss: 4.6014 - val_acc: 0.1200
Epoch 6/10
500/500 [==============================] - 64s - loss: 0.9324 - acc: 0.7340 - val_loss: 4.8431 - val_acc: 0.1100
Epoch 7/10
500/500 [==============================] - 64s - loss: 0.9442 - acc: 0.7300 - val_loss: 4.6818 - val_acc: 0.1200
Epoch 8/10
500/500 [==============================] - 67s - loss: 0.9094 - acc: 0.7680 - val_loss: 4.6274 - val_acc: 0.1500
Epoch 9/10
500/500 [==============================] - 67s - loss: 0.8416 - acc: 0.7660 - val_loss: 4.6469 - val_acc: 0.1300
Epoch 10/10
500/500 [==============================] - 63s - loss: 0.8447 - acc: 0.7500 - val_loss: 4.5103 - val_acc: 0.1500

Note: I actually remember using that learning rate (0.001) before, but it weirdly gave the same result as in my first post. Maybe I didn't check my code properly. (Apparently I was using Adam as my optimizer before this, which explains why the result was roughly the same even with a learning rate of 0.001.)

Update: here is the result after training all the dense layers. Accuracy increased from 15% to 33%.

Epoch 1/10
500/500 [==============================] - 64s - loss: 0.8541 - acc: 0.7480 - val_loss: 2.9999 - val_acc: 0.3300
Epoch 2/10
500/500 [==============================] - 64s - loss: 0.8475 - acc: 0.7520 - val_loss: 3.0623 - val_acc: 0.3100
Epoch 3/10
500/500 [==============================] - 67s - loss: 0.7951 - acc: 0.7840 - val_loss: 2.9582 - val_acc: 0.3400
Epoch 4/10
500/500 [==============================] - 66s - loss: 0.7588 - acc: 0.7780 - val_loss: 2.7526 - val_acc: 0.3100
Epoch 5/10
500/500 [==============================] - 65s - loss: 0.7785 - acc: 0.7780 - val_loss: 2.7318 - val_acc: 0.3500
Epoch 6/10
500/500 [==============================] - 66s - loss: 0.7114 - acc: 0.7820 - val_loss: 2.8430 - val_acc: 0.3200
Epoch 7/10
500/500 [==============================] - 64s - loss: 0.7371 - acc: 0.7840 - val_loss: 2.8593 - val_acc: 0.3300
Epoch 8/10
500/500 [==============================] - 64s - loss: 0.7004 - acc: 0.7820 - val_loss: 2.6694 - val_acc: 0.3500
Epoch 9/10
500/500 [==============================] - 63s - loss: 0.6645 - acc: 0.7800 - val_loss: 2.4721 - val_acc: 0.3300
Epoch 10/10
500/500 [==============================] - 65s - loss: 0.6901 - acc: 0.7980 - val_loss: 2.5125 - val_acc: 0.3300

Please do try the following (rough sketch after the list):
Initialize the weights using He or Xavier initialization.
Add dropout.
Add BatchNorm layers before the CNN blocks (normalized data works better).
Increase the batch size (a bigger batch size has better generalization).
Use Adam or RMSprop (widely used, with better results).
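
A rough Keras sketch of what those changes could look like, assuming you rebuild the network and train it from scratch (adding BatchNorm inside the conv blocks means the pretrained VGG weights no longer line up):

from keras.layers.normalization import BatchNormalization
from keras.optimizers import Adam

def ConvBlock(layers, model, filters):
    # He-initialized 3x3 convs, each followed by BatchNorm to keep activations normalized.
    for i in range(layers):
        model.add(ZeroPadding2D((1,1)))
        model.add(Convolution2D(filters, 3, 3, activation='relu', init='he_normal'))
        model.add(BatchNormalization(axis=1))   # axis=1 is the channel axis for (3, 224, 224) inputs
    model.add(MaxPooling2D((2,2), strides=(2,2)))

def FCBlock(model):
    model.add(Dense(4096, activation='relu', init='he_normal'))
    model.add(BatchNormalization())
    model.add(Dropout(0.5))

# Rebuild the model with these blocks (and a 12-way softmax head) before compiling.
# Use Adam instead of plain SGD, and pass a bigger batch_size to flow_from_directory.
model.compile(optimizer=Adam(lr=1e-3), loss='categorical_crossentropy', metrics=['accuracy'])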

Since you are training from scratch, it takes more iterations to converge.
Hope it helps!
~Gokkul


@gokkulnath Thank you! I'll post an update here if I run into any more difficulties.

I had similar difficulties trying to adapt lesson 2's VGG16 code to the Dog Breeds Classification challenge on Kaggle. I had made all the important changes I could think of to convert from binary dogs-vs-cats to multiclass breeds, but I still had miserable accuracy and training took a whole day.

I found that the new fastai (2018) library is both orders of magnitude faster to train and more accurate. I went from 40% accuracy to 83%, and from 24 hours of training to 7 seconds (!!!).
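
In case it helps anyone making the same switch, the 2018-era fastai (0.7) workflow looks roughly like this; the architecture, paths, batch size, and learning rate below are placeholders rather than my exact settings:

from fastai.transforms import *
from fastai.conv_learner import *
from fastai.dataset import *

PATH = 'data/dogbreeds/'        # hypothetical layout: a train/ folder of images plus labels.csv
arch = resnet34                 # any pretrained architecture the library ships with
sz = 224                        # image size

label_csv = f'{PATH}labels.csv'
n = len(list(open(label_csv))) - 1          # number of labeled images (minus the header row)
val_idxs = get_cv_idxs(n)                   # random 20% validation split

data = ImageClassifierData.from_csv(PATH, 'train', label_csv, tfms=tfms_from_model(arch, sz),
                                    bs=64, suffix='.jpg', val_idxs=val_idxs)
learn = ConvLearner.pretrained(arch, data, precompute=True)   # precomputed activations make epochs very fast
learn.fit(1e-2, 3)              # learning rate 0.01 for 3 epochs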