I'm trying to submit my first predictions to the Dogs vs. Cats competition on Kaggle, and for that I followed Lesson 2, essentially the part that uses Vgg16 and manually fine-tunes it (dropping the last layer and training a new dense final layer).
The only thing I changed from the lesson2 notebook is using the Adam() optimizer.
Fitting the model, this is what I get on the first run:
First, I find it strange that my loss actually goes up from the second epoch onwards… But the overall losses are reasonable (0.2-ish is a good starting point, I guess).
And when I submit my predictions (using predict_generator(), which I'm afraid may be causing trouble, but I have memory issues so I thought it was my only option), my Kaggle loss is very poor: 1.51952.
Is there something obvious I'm missing in the pipeline? Hopefully you can give me some hints.
You're shuffling. Make sure shuffle=False in flow_from_directory for your prediction generator.
You should also make sure the file names match up.
If your generator is named test, then the ids should be in order in test.filenames. The corresponding labels would then be labels = model.predict_generator(test, test.n)[:,1].
You're not clipping: labels = np.clip(labels, 0.01, 0.99)
Do a sanity check before submitting - check images to make sure your predictions make sense; most should be right! You can also look at the images you're most unsure of, e.g. abs(labels - 0.5).argsort()[:10] should be the indices of the ten most uncertain images (assuming labels is a vector of softmax output for dogs).
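The clipping and sanity-check steps above can be sketched together in a few lines of numpy (the `labels` values here are made-up stand-ins for the softmax dog probabilities returned by predict_generator):

```python
import numpy as np

# Hypothetical softmax probabilities of "dog" for five test images,
# e.g. the second column of model.predict_generator(test, test.n).
labels = np.array([0.999, 0.002, 0.51, 0.48, 0.93])

# Clip extreme predictions so a single confident mistake
# doesn't blow up the log loss.
clipped = np.clip(labels, 0.01, 0.99)

# Indices of the predictions the model is least sure about
# (probabilities closest to 0.5) - good candidates to eyeball.
most_uncertain = np.abs(clipped - 0.5).argsort()[:3]
```

With these values, `clipped` becomes `[0.99, 0.01, 0.51, 0.48, 0.93]` and `most_uncertain` picks out the images at indices 2, 3 and 4.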
Thanks again for your help. I didn't realize the competition ended yesterday, but making sure shuffle=False was set did of course give more realistic predictions! A loss of 0.10613 with that model - fair enough for 10 lines.
Hi, I'm facing a similar problem of poor performance on Kaggle for Lesson 1.
Summary of my issue:
I downloaded the 'dogs-vs-cats-redux-kernels-edition' data and rebuilt the VGG16 model with the finetune and fit commands for the new data/classes. Next, I ran the predict method on the test images (see code below).
When I upload the test prediction results, the "Public Score" is 0.66868. The Public Leaderboard has a best score of 0.03, which suggests my results are pretty bad.
While the model was finetuning and fitting, val_acc was 0.98 and val_loss was 0.1, suggesting the model was pretty good on the validation data.
Why is my test "Public Score" so bad? I've spot-checked about 25 images; 24 of them are predicted correctly, with only 1 wrong.
As per this thread, shuffle should be False. I use utils.get_data to load the test images, which already has shuffle=False.
Are there any other ideas?
My code for model fitting and prediction:
model = Vgg16()
batches = model.get_batches(path+'train', batch_size=batch_size)
val_batches = model.get_batches(path+'valid', batch_size=batch_size*2)
model.finetune(batches)
model.fit(batches, val_batches, nb_epoch=3)
# get test images
test_data = utils.get_data(path+'test')
# run prediction on test images
preds, idxs, classes = model.predict(test_data)
# change the predicted 'classes' output to kaggle submission format
classes = [c.replace('dogs', '1') for c in classes]
classes = [c.replace('cats', '0') for c in classes]
# get the filenames from test folder
_, _, filenames = next(os.walk(path+'test/unknown'))
filenames = [f.replace('.jpg', '') for f in filenames]
# join the test image filenames and their predictions
output = np.column_stack((filenames, classes))
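For completeness, the `output` array above still needs to be written to a CSV before uploading. A minimal sketch of that last step, using np.savetxt (the filenames and labels here are hypothetical placeholders, and the "id,label" header is what the redux competition expects - check its sample_submission.csv to be sure):

```python
import numpy as np

# Placeholder values standing in for the real `filenames` and `classes`.
filenames = ['1', '2', '3']
classes = ['1', '0', '1']

# Stack ids and labels side by side, as in the code above.
output = np.column_stack((filenames, classes))

# Write the submission file; comments='' stops numpy from
# prefixing the header line with '#'.
np.savetxt('submission.csv', output, fmt='%s', delimiter=',',
           header='id,label', comments='')
```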
I investigated the above issue further and figured out the reason for my poor Kaggle score.
The rules for submission state: "For each image in the test set, you should predict a probability that the image is a dog (1 = dog, 0 = cat)."
In my submission to Kaggle, I was not uploading the probability of a dog, but rather just a hard 1 or 0 depending on whether the image was predicted as a dog or a cat. Once I changed the code to upload the probability of a dog, my "Public Score" improved to 0.17.
I still need to work on improving the score, but at least it's getting better.
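To make it concrete why hard 0/1 labels hurt so much: log loss penalizes a fully confident wrong answer almost unboundedly, while a soft probability spreads the damage. A small self-contained sketch with made-up predictions (ten images, one misclassified either way):

```python
import numpy as np

def log_loss(y_true, y_pred, eps=1e-15):
    """Binary log loss; predictions are clipped away from 0 and 1,
    which is also how Kaggle evaluates to avoid log(0)."""
    p = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    y = np.asarray(y_true, dtype=float)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Ground truth for ten hypothetical test images (1 = dog, 0 = cat).
y_true = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])

# Hard 0/1 submission with a single mistake (image 5 called "cat").
hard = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])

# Soft probabilities with the same single near-mistake at image 5.
soft = np.array([0.95, 0.9, 0.9, 0.9, 0.4,
                 0.1, 0.1, 0.1, 0.1, 0.1])

hard_loss = log_loss(y_true, hard)  # one confident error dominates
soft_loss = log_loss(y_true, soft)  # same error count, modest loss
```

Even with 90% accuracy, the hard-label submission scores a log loss above 3, while the soft version stays well under 0.5.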