I’ve read through all of the other posts here about poor Kaggle scores but have not been able to work out why my submission scores so badly. My “best” submission to date scored 0.19143 (rank 784).
I have hand-verified every step, including looking through the first 75 images in the test set to make sure the results recorded in my csv are correct (only a few were predicted incorrectly). I’ve watched Jeremy’s solution (the beginning of the Lesson 2 video) several times, and I believe I am following his steps exactly.
Here are the steps I’m taking. Do you see any obvious mistakes? Thank you in advance.
batches = vgg.get_batches(path+'train', batch_size=batch_size)#, shuffle=False)
val_batches = vgg.get_batches(path+'validation', batch_size=batch_size*2)#, shuffle=False)
vgg.finetune(batches)
vgg.fit(batches, val_batches, nb_epoch=1)
Found 23000 images belonging to 2 classes.
Found 2000 images belonging to 2 classes.
Epoch 1/1
23000/23000 [==============================] - 197s - loss: 0.1209 - acc: 0.9690 - val_loss: 0.0651 - val_acc: 0.9845
batches, predictions = vgg.test(path+'test', batch_size = batch_size * 2)
predictions[:5]
array([[ 1.    ,  0.    ],
       [ 0.9874,  0.0126],
       [ 1.    ,  0.    ],
       [ 0.    ,  1.    ],
       [ 1.    ,  0.    ]], dtype=float32)
filenames = batches.filenames # full path to all test images
filenames[:5]
['unknown/84.jpg',
'unknown/12453.jpg',
'unknown/3841.jpg',
'unknown/8713.jpg',
'unknown/478.jpg']
ids = # convert filenames into ids
ids[:5]
[84, 12453, 3841, 8713, 478]
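The conversion step is elided above. For anyone reproducing it, here is a minimal sketch of one way to turn paths like 'unknown/84.jpg' into integer ids. The helper name `filename_to_id` is hypothetical, and this is not necessarily the code that was actually run; it only assumes each file is named `<id>.jpg`.

```python
import os

def filename_to_id(filename):
    """Return the integer id encoded in a path such as 'unknown/84.jpg'.

    Hypothetical helper: assumes the file name (without extension) is the id.
    """
    base = os.path.basename(filename)       # 'unknown/84.jpg' -> '84.jpg'
    stem, _ext = os.path.splitext(base)     # '84.jpg' -> ('84', '.jpg')
    return int(stem)

# Sample paths copied from the filenames[:5] output above
filenames = ['unknown/84.jpg', 'unknown/12453.jpg', 'unknown/3841.jpg']
ids = [filename_to_id(f) for f in filenames]
print(ids)
```

Using `os.path` rather than fixed string offsets means the helper does not depend on the length of the directory name.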
isdog = predictions[:, 1]
isdog[:5]
array([ 0. , 0.0126, 0. , 1. , 0. ], dtype=float32)
subm = np.stack([ids, isdog], axis=1)
subm[:5]
array([[    84.    ,      0.    ],
       [ 12453.    ,      0.0126],
       [  3841.    ,      0.    ],
       [  8713.    ,      1.    ],
       [   478.    ,      0.    ]])
np.savetxt('submission.csv', subm, fmt='%d,%.5f', header='id,label', comments='')
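One thing that may be worth checking, assuming the competition is scored with binary log loss (this is not stated above): predictions of exactly 0 or 1 are penalised very heavily whenever they are wrong, so a handful of confident mistakes can dominate the score even at high accuracy. The sketch below uses made-up labels and predictions, not the real test set, and the clip bounds of 0.05 and 0.95 are illustrative assumptions rather than tuned values.

```python
import numpy as np

def log_loss(y_true, y_pred, eps=1e-15):
    """Mean binary log loss; predictions are clamped to [eps, 1 - eps] to avoid log(0)."""
    y_true = np.asarray(y_true, dtype=np.float64)
    p = np.clip(np.asarray(y_pred, dtype=np.float64), eps, 1 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

# Hypothetical data: 100 images, 98 predicted correctly with full confidence
# and 2 predicted wrongly with full confidence.
y_true = np.array([1] * 50 + [0] * 50)
y_pred = y_true.astype(np.float64)
y_pred[0] = 0.0    # a dog predicted as a certain cat
y_pred[50] = 1.0   # a cat predicted as a certain dog

raw = log_loss(y_true, y_pred)
clipped = log_loss(y_true, np.clip(y_pred, 0.05, 0.95))
print('raw: %.4f  clipped: %.4f' % (raw, clipped))
```

In this toy case the clipped predictions score better than the raw ones, because the two wrong answers cost far less once they are no longer stated with certainty. Whether that explains the score above would need to be checked against the actual submission.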