Hi, I'm facing a similar problem: poor performance on Kaggle for Lesson 1.
Summary of my issue:
1. I downloaded the "dogs-vs-cats-redux-kernels-edition" data and adapted the VGG16 model to the new data/classes with the finetune and fit methods. Next, I ran the predict method on the test images (see code below).
2. When I upload the test predictions, the "Public Score" is 0.66868, while the best score on the Public Leaderboard is 0.03, so my results appear to be quite poor.
3. During fine-tuning and fitting, val_acc reached 0.98 and val_loss was 0.1, which suggests the model performs well on the validation data.
4. Why is my test "Public Score" so bad? I've spot-checked about 25 test images, and all but one were predicted correctly.
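One sanity check on the numbers: this competition is scored by log loss (lower is better), not accuracy, and each submitted label is meant to be a probability. The sketch below compares hard 0/1 labels against mildly clipped probabilities at the same 98% accuracy. It assumes the scorer clips every submitted value to [1e-15, 1 - 1e-15] (my assumption about Kaggle's implementation); the helper name log_loss, the 1000-image toy set and the 0.02 clip are illustrative choices of mine.

```python
import numpy as np

def log_loss(y_true, p, eps=1e-15):
    """Binary log loss; eps is the assumed clipping bound applied by the scorer."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    y = np.asarray(y_true, dtype=float)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# 1000 hypothetical test images, all dogs (label 1), 98% classified correctly.
n, n_wrong = 1000, 20
y_true = np.ones(n)

# Hard labels, as produced by replacing 'dogs'/'cats' with '1'/'0'.
hard = np.ones(n)
hard[:n_wrong] = 0.0

# Same decisions, but submitted as probabilities clipped to [0.02, 0.98].
soft = np.clip(hard, 0.02, 0.98)

hard_loss = log_loss(y_true, hard)
soft_loss = log_loss(y_true, soft)
print(f"hard labels: {hard_loss:.4f}")            # about 0.69
print(f"clipped probabilities: {soft_loss:.4f}")  # about 0.098
```

Each wrong hard label costs -ln(1e-15), about 34.5, so roughly 2% of wrong images already pushes the loss near 0.69, which is in the same range as my 0.66868.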
As per this thread, shuffle should be False. I use utils.get_data to load the test images, and it already sets shuffle=False.
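To double-check that each prediction lines up with the right file, the test filenames can be listed in sorted order instead of whatever order os.walk happens to return. This sketch assumes the loader (get_batches with shuffle=False, i.e. Keras flow_from_directory) reads files in sorted filename order, which is my understanding of the Keras version used in the course; sorted_test_ids is a hypothetical helper, and the demo uses empty placeholder files instead of real images.

```python
import os
import tempfile

def sorted_test_ids(test_dir):
    """Return image ids in sorted-filename order.

    Assumes the loader (shuffle=False) also reads files in sorted order,
    so predictions[i] belongs to the i-th id returned here.
    """
    filenames = sorted(f for f in os.listdir(test_dir) if f.endswith(".jpg"))
    return [int(os.path.splitext(f)[0]) for f in filenames]

# Tiny demo with empty placeholder files.
with tempfile.TemporaryDirectory() as d:
    for name in ["2.jpg", "10.jpg", "1.jpg"]:
        open(os.path.join(d, name), "w").close()
    ids = sorted_test_ids(d)

print(ids)  # [1, 10, 2]: lexicographic, not numeric, order
```

Sorting is lexicographic, so 10.jpg comes before 2.jpg; that is harmless for the submission as long as every row carries its own id.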
Are there any other ideas?
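My code for model fitting and prediction:

```python
import os

import numpy as np

import utils               # course helper module
from vgg16 import Vgg16    # course helper module

# assumes path (data directory ending in '/') and batch_size are defined earlier
model = Vgg16()
batches = model.get_batches(path+"train", batch_size=batch_size)
val_batches = model.get_batches(path+"valid", batch_size=batch_size*2)
# replace the 1000-class ImageNet output layer with one for cats/dogs
model.finetune(batches)
model.fit(batches, val_batches, nb_epoch=3)
# get test images (utils.get_data loads them with shuffle=False)
test_data = utils.get_data(path+"test")
# run prediction on test images
preds, idxs, classes = model.predict(test_data)
# change the predicted 'classes' output to Kaggle submission format
classes = [c.replace('dogs', '1') for c in classes]
classes = [c.replace('cats', '0') for c in classes]
# get the filenames from the test folder, sorted on the assumption
# that the loader reads files in sorted order
_, _, filenames = next(os.walk(path+"test/unknown"))
filenames = sorted(filenames)
filenames = [f.replace('.jpg', '') for f in filenames]
# join the test image filenames and their predictions
output = np.column_stack((filenames, classes))
```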
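For the final step, the Redux submission file is a CSV with an id,label header, where label is the predicted probability that the image is a dog. The sketch below assumes a per-image probability array with columns ordered [cats, dogs], for example the raw Keras output of model.model.predict(test_data) (assuming the Vgg16 wrapper exposes its Keras model as .model), rather than the hard class names returned by model.predict. write_submission and its eps clipping bound are hypothetical choices for illustration, not part of the course code.

```python
import os
import tempfile

import numpy as np

def write_submission(ids, class_probs, out_path, eps=0.02):
    """Write an id,label CSV where label is the clipped probability of 'dog'.

    Assumes class_probs has shape (n, 2) with columns ordered [cats, dogs].
    eps is a hypothetical clipping bound that limits the cost of confident mistakes.
    """
    is_dog = np.clip(class_probs[:, 1], eps, 1 - eps)
    table = np.column_stack((np.asarray(ids, dtype=float), is_dog))
    np.savetxt(out_path, table, fmt=["%d", "%.5f"], delimiter=",",
               header="id,label", comments="")

# Toy probabilities for three images, columns [cats, dogs].
probs = np.array([[0.999, 0.001], [0.1, 0.9], [0.5, 0.5]])

with tempfile.TemporaryDirectory() as d:
    out = os.path.join(d, "submission.csv")
    write_submission([1, 10, 2], probs, out)
    with open(out) as f:
        lines = f.read().splitlines()

print(lines)  # ['id,label', '1,0.02000', '10,0.90000', '2,0.50000']
```

Clipping keeps a single confident mistake from contributing about 34.5 to the summed loss; the bound itself is a tuning choice.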