I’m working with a non-Kaggle dataset, which I have randomly separated into train, valid, and test sets (60%, 20%, 20% respectively).
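For context, the split is done roughly like this (a simplified sketch with placeholder data and variable names, not my actual dataset or features):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data standing in for my real features/labels
X = np.random.rand(1000, 20)
y = np.random.randint(0, 2, size=1000)

# 60/20/20: carve off 40% first, then split that 40% in half
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, random_state=42)
X_valid, X_test, y_valid, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, random_state=42)
```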
When I train models on this data with Keras, I get around 94% val_acc and ~0.15 val_loss. But when I use model.predict() to generate predictions on my test set, roughly 13% of them are wrong, i.e. only about 87% test accuracy versus the ~94% validation accuracy.
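To show what I mean, here is a stripped-down version of the training and prediction steps (continuing from the split sketch above; the model and hyperparameters are placeholders, not my actual architecture):

```python
import numpy as np
from tensorflow import keras

# Stand-in model; my real architecture is different
model = keras.Sequential([
    keras.Input(shape=(X_train.shape[1],)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# val_acc / val_loss reported during training come from the validation set passed here
model.fit(X_train, y_train, validation_data=(X_valid, y_valid),
          epochs=20, batch_size=32)

# Test-set predictions: threshold the sigmoid outputs and compare to the labels
probs = model.predict(X_test)
preds = (probs > 0.5).astype(int).ravel()
error_rate = np.mean(preds != y_test)   # this is where the ~13% figure comes from
print(error_rate)
```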
I’ve checked and I can’t see any systematic difference between the validation and test sets, nor any overlap between either of them and the training data.
Am I misunderstanding the nature/use of validation accuracy, or is there some mistake still hiding in my preprocessing?