I’m working with a non-Kaggle dataset, which I have randomly separated into train, valid, and test sets (60%, 20%, 20% respectively).
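For context, the split is done roughly like this (a simplified sketch with placeholder data and variable names, not my actual dataset or features):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data standing in for my real features/labels
X = np.random.rand(1000, 20)
y = np.random.randint(0, 2, size=1000)

# 60/20/20: carve off 40% first, then split that 40% in half
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, random_state=42)
X_valid, X_test, y_valid, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, random_state=42)
```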
When I train models on this data with Keras, I get around 94% val_acc and ~0.15 val_loss. But when I use model.predict() to generate predictions on my test set, roughly 13% of them are wrong, i.e. only about 87% test accuracy versus the ~94% validation accuracy.
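To show what I mean, here is a stripped-down version of the training and prediction steps (continuing from the split sketch above; the model and hyperparameters are placeholders, not my actual architecture):

```python
import numpy as np
from tensorflow import keras

# Stand-in model; my real architecture is different
model = keras.Sequential([
    keras.Input(shape=(X_train.shape[1],)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# val_acc / val_loss reported during training come from the validation set passed here
model.fit(X_train, y_train, validation_data=(X_valid, y_valid),
          epochs=20, batch_size=32)

# Test-set predictions: threshold the sigmoid outputs and compare to the labels
probs = model.predict(X_test)
preds = (probs > 0.5).astype(int).ravel()
error_rate = np.mean(preds != y_test)   # this is where the ~13% figure comes from
print(error_rate)
```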
I’ve checked and I can’t see any systematic difference between the validation and test sets, nor any overlap between either of them and the training data.
Am I misunderstanding the nature/use of validation accuracy, or is there some mistake still hiding in my preprocessing?