Using VGG16 on Kaggle's Plant Seedlings Classification Competition

Hi. I just finished lessons 1 and 2 from part 1 of the course. Now I am trying to use the VGG16 model on the Plant Seedlings Classification competition on Kaggle, but I am stuck at 6% accuracy. It seems I am missing something. Any help is appreciated.

Machine that I use

I use my own laptop with specs: Intel® Core™ i5-6200U CPU @ 2.30GHz, NVIDIA GeForce 930M with 2GB of GPU memory, and 12GB of RAM.


My work can be seen here.


I didn’t use all the data. First, I took 1000 images, covering all 12 classes, from the train set to use as the valid set. After that, I created a sample set of 500 images for training and 100 images for validation. Here is the number of images from each class in each set.

Train set:
Black-grass : 206
Charlock : 310
Cleavers : 232
Common Chickweed : 490
Common wheat : 177
Fat Hen : 365
Loose Silky-bent : 502
Maize : 164
Scentless Mayweed : 404
Shepherds Purse : 187
Small-flowered Cranesbill : 401
Sugar beet : 312

Valid set:
Black-grass : 57
Charlock : 80
Cleavers : 55
Common Chickweed : 121
Common wheat : 44
Fat Hen : 110
Loose Silky-bent : 152
Maize : 57
Scentless Mayweed : 112
Shepherds Purse : 44
Small-flowered Cranesbill : 95
Sugar beet : 73

Sample set (this is the data I actually use):
Train set:
Black-grass : 21
Charlock : 48
Cleavers : 28
Common Chickweed : 53
Common wheat : 31
Fat Hen : 49
Loose Silky-bent : 66
Maize : 18
Scentless Mayweed : 61
Shepherds Purse : 23
Small-flowered Cranesbill : 57
Sugar beet : 45

Valid set:
Black-grass : 6
Charlock : 12
Cleavers : 3
Common Chickweed : 11
Common wheat : 5
Fat Hen : 10
Loose Silky-bent : 19
Maize : 7
Scentless Mayweed : 11
Shepherds Purse : 3
Small-flowered Cranesbill : 9
Sugar beet : 4
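
The split described above can be sketched roughly as follows. This is a minimal sketch, not the code actually used for this post: the function name move_random_images, the directory layout (one sub-folder per class), and the fixed seed are all assumptions made for illustration.

```python
import os
import random
import shutil


def move_random_images(src_dir, dst_dir, n, copy=False, seed=0):
    """Move (or copy) n randomly chosen images from src_dir to dst_dir.

    Both directories are assumed to hold one sub-folder per class.
    Returns the number of images transferred per class.
    """
    rng = random.Random(seed)
    # Collect (class_name, file_name) pairs across all class folders.
    images = [
        (cls, name)
        for cls in sorted(os.listdir(src_dir))
        if os.path.isdir(os.path.join(src_dir, cls))
        for name in sorted(os.listdir(os.path.join(src_dir, cls)))
    ]
    chosen = rng.sample(images, n)
    transfer = shutil.copyfile if copy else shutil.move
    counts = {}
    for cls, name in chosen:
        os.makedirs(os.path.join(dst_dir, cls), exist_ok=True)
        transfer(os.path.join(src_dir, cls, name),
                 os.path.join(dst_dir, cls, name))
        counts[cls] = counts.get(cls, 0) + 1
    return counts
```

With hypothetical paths, move_random_images('train', 'valid', 1000) would carve out the valid set, and calling it again with copy=True and n=500 or n=100 would build the sample sets. Because the draw is uniform over all images, the per-class counts roughly follow the class frequencies, so small classes can end up with only a handful of images in the sample.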


Basically, I followed the model from the scripts in the repo.

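import numpy as np
from keras.models import Sequential
from keras.layers.core import Dense, Flatten, Lambda
from keras.layers.convolutional import Convolution2D, MaxPooling2D

def ConvBlock(layers, model, filters):
    # `layers` 3x3 convolutions followed by one 2x2 max pooling
    for i in range(layers):
        model.add(Convolution2D(filters, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

def FCBlock(model):
    # Fully connected block
    model.add(Dense(4096, activation='relu'))

# Per-channel ImageNet mean (RGB order), shaped for channels-first input
vgg_mean = np.array([123.68, 116.779, 103.939]).reshape((3,1,1))

def vgg_preprocess(x):
    x = x - vgg_mean
    return x[:, ::-1]  # reverse the channel axis: RGB -> BGR

def VGG_16():
    model = Sequential()
    model.add(Lambda(vgg_preprocess, input_shape=(3,224,224)))
    ConvBlock(2, model, 64)
    ConvBlock(2, model, 128)
    ConvBlock(3, model, 256)
    ConvBlock(3, model, 512)
    ConvBlock(3, model, 512)
    # Assumed from the standard VGG16 layout (missing from the snippet as posted):
    # flatten the feature maps and apply the two fully connected blocks.
    model.add(Flatten())
    FCBlock(model)
    FCBlock(model)
    model.add(Dense(1000, activation='softmax'))
    return model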
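
One way to sanity-check a stack like this is to trace the spatial size of the feature map through the conv blocks. The sketch below is an illustration only and not part of the original script; the helper name trace_spatial_size is hypothetical. It assumes 3x3 convolutions with stride 1 and 2x2 max pooling with stride 2, and compares unpadded convolutions (as in the code as posted) with convolutions padded by one pixel on each side (as in the standard VGG16 design).

```python
def trace_spatial_size(size, blocks, pad):
    """Return the feature-map side length after each conv block.

    size:   input height/width (square input assumed)
    blocks: number of 3x3 convolutions in each block
    pad:    zero padding added on each side before every convolution
    """
    sizes = []
    for n_convs in blocks:
        for _ in range(n_convs):
            size = size + 2 * pad - 2   # a 3x3 kernel with stride 1 trims 2 pixels
        size = size // 2                # 2x2 max pooling with stride 2
        sizes.append(size)
    return sizes


vgg_blocks = [2, 2, 3, 3, 3]
print(trace_spatial_size(224, vgg_blocks, pad=0))  # [110, 53, 23, 8, 1]
print(trace_spatial_size(224, vgg_blocks, pad=1))  # [112, 56, 28, 14, 7]
```

Without padding, the last block ends in a 1x1 map, so the first dense layer sees 512 features rather than the 512 x 7 x 7 = 25088 of the padded design.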


This is the first result after I fine-tuned the last layer to match my needs. I used 10 epochs and a batch size of 4.

Epoch 1/10
500/500 [==============================] - 68s - loss: 15.4570 - acc: 0.0360 - val_loss: 15.1510 - val_acc: 0.0600
Epoch 2/10
500/500 [==============================] - 64s - loss: 15.4411 - acc: 0.0420 - val_loss: 15.1510 - val_acc: 0.0600
Epoch 3/10
500/500 [==============================] - 65s - loss: 15.4411 - acc: 0.0420 - val_loss: 15.1510 - val_acc: 0.0600
Epoch 4/10
500/500 [==============================] - 64s - loss: 15.4411 - acc: 0.0420 - val_loss: 15.1510 - val_acc: 0.0600
Epoch 5/10
500/500 [==============================] - 64s - loss: 15.4411 - acc: 0.0420 - val_loss: 15.1510 - val_acc: 0.0600
Epoch 6/10
500/500 [==============================] - 64s - loss: 15.4411 - acc: 0.0420 - val_loss: 15.1510 - val_acc: 0.0600
Epoch 7/10
500/500 [==============================] - 67s - loss: 15.4411 - acc: 0.0420 - val_loss: 15.1510 - val_acc: 0.0600
Epoch 8/10
500/500 [==============================] - 70s - loss: 15.4411 - acc: 0.0420 - val_loss: 15.1510 - val_acc: 0.0600
Epoch 9/10
500/500 [==============================] - 64s - loss: 15.4411 - acc: 0.0420 - val_loss: 15.1510 - val_acc: 0.0600
Epoch 10/10
500/500 [==============================] - 67s - loss: 15.4411 - acc: 0.0420 - val_loss: 15.1510 - val_acc: 0.0600

And this is my result after training all the dense layers. Still no change.

Epoch 1/10
500/500 [==============================] - 64s - loss: 15.4411 - acc: 0.0420 - val_loss: 15.1510 - val_acc: 0.0600
Epoch 2/10
500/500 [==============================] - 63s - loss: 15.4411 - acc: 0.0420 - val_loss: 15.1510 - val_acc: 0.0600
Epoch 3/10
500/500 [==============================] - 63s - loss: 15.4411 - acc: 0.0420 - val_loss: 15.1510 - val_acc: 0.0600
Epoch 4/10
500/500 [==============================] - 63s - loss: 15.4411 - acc: 0.0420 - val_loss: 15.1510 - val_acc: 0.0600
Epoch 5/10
500/500 [==============================] - 63s - loss: 15.4411 - acc: 0.0420 - val_loss: 15.1510 - val_acc: 0.0600
Epoch 6/10
500/500 [==============================] - 64s - loss: 15.4411 - acc: 0.0420 - val_loss: 15.1510 - val_acc: 0.0600
Epoch 7/10
500/500 [==============================] - 63s - loss: 15.4411 - acc: 0.0420 - val_loss: 15.1510 - val_acc: 0.0600
Epoch 8/10
500/500 [==============================] - 63s - loss: 15.4411 - acc: 0.0420 - val_loss: 15.1510 - val_acc: 0.0600
Epoch 9/10
500/500 [==============================] - 63s - loss: 15.4411 - acc: 0.0420 - val_loss: 15.1510 - val_acc: 0.0600
Epoch 10/10
500/500 [==============================] - 63s - loss: 15.4411 - acc: 0.0420 - val_loss: 15.1510 - val_acc: 0.0600

I can see that the loss is not changing at all. Check the value of the learning rate. Also, can you provide more details about the optimizer you are using?
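
A loss that sits at exactly the same value can be checked against a simple calculation. The sketch below assumes that the loss is categorical cross-entropy and that predicted probabilities are clipped to a small epsilon of 1e-7 before the logarithm is taken (the Keras default, to my knowledge); the function name saturated_loss is made up for illustration. If the network puts essentially all of its probability on one class for every image, each misclassified image contributes about -log(1e-7) and each correct one contributes almost nothing.

```python
import math


def saturated_loss(accuracy, eps=1e-7):
    """Mean cross-entropy when every prediction is a clipped one-hot vector.

    Wrong predictions give the true class probability eps; right ones
    give it 1 - eps.
    """
    wrong = -math.log(eps)
    right = -math.log(1.0 - eps)
    return (1.0 - accuracy) * wrong + accuracy * right


print(round(saturated_loss(0.0420), 4))  # 15.4411, the logged training loss
print(round(saturated_loss(0.0600), 4))  # 15.151, the logged val_loss of 15.1510
```

Both values match the logs, which suggests that the softmax output has saturated on a single class. That is a common symptom of a learning rate that is too high.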

@gokkulnath Thanks! I’m using SGD as my optimizer. You’re right, my learning rate was too big (0.1). I changed it to 0.001 and here is the result. The model overfits, but I think that’s a good start (a bump from 6% to 15%).

Epoch 1/10
500/500 [==============================] - 66s - loss: 1.3990 - acc: 0.7300 - val_loss: 6.7164 - val_acc: 0.0400
Epoch 2/10
500/500 [==============================] - 63s - loss: 1.2736 - acc: 0.7200 - val_loss: 6.2432 - val_acc: 0.0400
Epoch 3/10
500/500 [==============================] - 63s - loss: 1.1720 - acc: 0.7380 - val_loss: 5.4563 - val_acc: 0.0700
Epoch 4/10
500/500 [==============================] - 64s - loss: 1.0821 - acc: 0.7400 - val_loss: 4.7643 - val_acc: 0.0700
Epoch 5/10
500/500 [==============================] - 64s - loss: 0.9818 - acc: 0.7400 - val_loss: 4.6014 - val_acc: 0.1200
Epoch 6/10
500/500 [==============================] - 64s - loss: 0.9324 - acc: 0.7340 - val_loss: 4.8431 - val_acc: 0.1100
Epoch 7/10
500/500 [==============================] - 64s - loss: 0.9442 - acc: 0.7300 - val_loss: 4.6818 - val_acc: 0.1200
Epoch 8/10
500/500 [==============================] - 67s - loss: 0.9094 - acc: 0.7680 - val_loss: 4.6274 - val_acc: 0.1500
Epoch 9/10
500/500 [==============================] - 67s - loss: 0.8416 - acc: 0.7660 - val_loss: 4.6469 - val_acc: 0.1300
Epoch 10/10
500/500 [==============================] - 63s - loss: 0.8447 - acc: 0.7500 - val_loss: 4.5103 - val_acc: 0.1500

Note: Actually, I remember using that learning rate (0.001) before, but weirdly it gave the same result as in my first post. Maybe I didn’t check my code properly. (Apparently I was using Adam as my optimizer back then. That explains why the result was about the same even with a learning rate of 0.001.)
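
Why a step size of 0.1 can stall training while 0.001 works can be illustrated with a toy problem. The sketch below is not the VGG16 setup: it runs plain gradient descent on a one-dimensional quadratic, and the curvature value of 25 is an arbitrary assumption chosen so that 0.1 is too large.

```python
def gradient_descent(lr, curvature=25.0, w=1.0, steps=50):
    """Minimize f(w) = 0.5 * curvature * w**2 with a fixed learning rate."""
    for _ in range(steps):
        grad = curvature * w   # derivative of f at w
        w = w - lr * grad
    return w


print(abs(gradient_descent(lr=0.1)))    # grows every step: |1 - 0.1 * 25| = 1.5 > 1
print(abs(gradient_descent(lr=0.001)))  # shrinks every step: |1 - 0.001 * 25| = 0.975 < 1
```

On this function gradient descent converges only when lr < 2 / curvature. A real network has no single curvature, so in practice a workable learning rate is found by trial.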

Update: Here is the result after training all the dense layers. Validation accuracy increased from 15% to 33%.

Epoch 1/10
500/500 [==============================] - 64s - loss: 0.8541 - acc: 0.7480 - val_loss: 2.9999 - val_acc: 0.3300
Epoch 2/10
500/500 [==============================] - 64s - loss: 0.8475 - acc: 0.7520 - val_loss: 3.0623 - val_acc: 0.3100
Epoch 3/10
500/500 [==============================] - 67s - loss: 0.7951 - acc: 0.7840 - val_loss: 2.9582 - val_acc: 0.3400
Epoch 4/10
500/500 [==============================] - 66s - loss: 0.7588 - acc: 0.7780 - val_loss: 2.7526 - val_acc: 0.3100
Epoch 5/10
500/500 [==============================] - 65s - loss: 0.7785 - acc: 0.7780 - val_loss: 2.7318 - val_acc: 0.3500
Epoch 6/10
500/500 [==============================] - 66s - loss: 0.7114 - acc: 0.7820 - val_loss: 2.8430 - val_acc: 0.3200
Epoch 7/10
500/500 [==============================] - 64s - loss: 0.7371 - acc: 0.7840 - val_loss: 2.8593 - val_acc: 0.3300
Epoch 8/10
500/500 [==============================] - 64s - loss: 0.7004 - acc: 0.7820 - val_loss: 2.6694 - val_acc: 0.3500
Epoch 9/10
500/500 [==============================] - 63s - loss: 0.6645 - acc: 0.7800 - val_loss: 2.4721 - val_acc: 0.3300
Epoch 10/10
500/500 [==============================] - 65s - loss: 0.6901 - acc: 0.7980 - val_loss: 2.5125 - val_acc: 0.3300

Please try the following:
Initialize the weights using He or Xavier initialization.
Add dropout.
Add BatchNorm layers before each CNN block (normalized data works better).
Increase the batch size (a batch of 4 gives very noisy gradient estimates).
Use Adam or RMSprop (widely used, and they often give better results).
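
Two of the suggestions above, He initialization and dropout, can be sketched in plain Python. This is a minimal illustration under stated assumptions rather than what Keras does internally: the function names are hypothetical, the He variant shown draws from a normal distribution with standard deviation sqrt(2 / fan_in), and the dropout shown is the "inverted" form that rescales the kept activations during training.

```python
import math
import random


def he_normal(fan_in, fan_out, rng):
    """Weight matrix (fan_in x fan_out) drawn from N(0, 2 / fan_in)."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)] for _ in range(fan_in)]


def inverted_dropout(activations, rate, rng):
    """Zero each activation with probability `rate`; scale the rest by 1 / (1 - rate)."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)
weights = he_normal(fan_in=512, fan_out=16, rng=rng)
dropped = inverted_dropout([1.0] * 10000, rate=0.5, rng=rng)
print(sum(dropped) / len(dropped))  # close to 1.0: the expected activation is preserved
```

Scaling the initial weights by the fan-in keeps the variance of the ReLU activations roughly constant from layer to layer, and the rescaling in dropout means nothing has to change at prediction time.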

Since you are training from scratch, it takes more iterations to converge.
Hope it helps!


@gokkulnath Thank you! I’ll post an update here if I run into any difficulties again.

I had similar difficulties trying to adapt lesson 2’s VGG16 code to the Dog Breeds Classification challenge on Kaggle. I had made all the important changes to convert from binary dogs-cats to multiclass breeds (that I could think of, at least), but I still had miserable accuracy and training took a whole day.

I found that the new fastai (2018) library is both orders of magnitude faster to train and more accurate. I went from an accuracy of 40% to 83%, and from 24 hours of training to 7 s (!!!).