Bad results on Kaggle leaderboard with high validation accuracy

I’m around lesson 5 now and following Jeremy’s advice to try to reach the top 50% in the State Farm competition.
I used the VGG16 model with batch normalization and trained it to a validation accuracy of about 0.96 (I don’t remember the exact number).
I thought this was pretty good, so I submitted the results to Kaggle, but I got a score of 2.0 even after applying clipping.
That puts me in the worst 10th percentile, so a pretty bad position.
I’ve been checking everything this morning but can’t see what’s wrong. Could it be that I’m getting good accuracy on the training/validation sets but bad accuracy on the test set?
How would you test for this kind of thing?
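For reference, the clipping I mentioned is roughly this (a sketch, not my exact code; the 0.05/0.95 bounds and the function name are just for illustration):

```python
import numpy as np

def clip_predictions(preds, lo=0.05, hi=0.95):
    # Pull probabilities away from 0 and 1 so a confidently wrong
    # prediction can't blow up the log loss, then renormalize each
    # row so the class probabilities still sum to 1.
    clipped = np.clip(preds, lo, hi)
    return clipped / clipped.sum(axis=1, keepdims=True)

# e.g. one very confident 10-class prediction row
preds = np.array([[0.991] + [0.001] * 9])
print(clip_predictions(preds))
```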

PS: I split the training data into 4,500 validation images and around 16,000 training images.


I was checking my results in git, and this was the last line of the training log:

17944/17944 [==============================] - 517s - loss: 2.1110 - acc: 0.8299 - val_loss: 0.4414 - val_acc: 0.9638

What is interesting to me is that the training loss is actually 2.111, even though the validation loss is only 0.4414.

Since I am calculating the loss using categorical cross entropy, it should be comparable to the Kaggle competition’s score, and it is actually close. Does this make sense?
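To sanity-check that comparison, Kaggle scores State Farm with multiclass log loss, which is the same quantity Keras calls categorical cross entropy, and a handful of confidently wrong predictions can dominate it even at high accuracy. A toy sketch with made-up numbers (not my actual predictions):

```python
import numpy as np

def multiclass_log_loss(true_idx, probs, eps=1e-15):
    # Kaggle-style log loss: mean of -log(probability assigned
    # to the true class), with probabilities clipped away from 0.
    p = np.clip(probs, eps, 1 - eps)
    return -np.mean(np.log(p[np.arange(len(true_idx)), true_idx]))

n_classes = 10
labels = np.zeros(100, dtype=int)                  # true class is 0 everywhere
probs = np.full((100, n_classes), 0.001 / (n_classes - 1))
probs[:96, 0] = 0.999                              # 96 confident and correct
probs[96:, 1] = 0.999                              # 4 confident but wrong

# Accuracy is 0.96, yet each wrong row contributes -log(~1e-4) ≈ 9,
# so those 4 rows alone push the mean loss well above the ~0.04
# you might naively expect from 96% accuracy.
print(multiclass_log_loss(labels, probs))
```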