94% accuracy in train/val, 15% in test? What am I doing wrong?

Hi guys, beginner here. I don’t understand what I’m doing wrong: I’m getting excellent results on training and validation but not on my test set. I applied the same transformations to the test set as I did to my training images.

import numpy as np
from fastai.vision import *  # ImageDataBunch, DatasetType, accuracy

data = ImageDataBunch.from_folder('./tmp', ds_tfms=tfms, valid='val', test='test', size=sz, bs=bs)
# ... `learn` is created and trained here ...
log_preds, y = learn.TTA(ds_type=DatasetType.Test)  # test-time augmentation predictions on the test set
probs = np.argmax(log_preds, axis=1)  # predicted class index per image

I’m computing the accuracy with:
accuracy(log_preds, y)

If your test data are organized into one folder per class, you shouldn’t need a CSV file.
But if you want, you can read the labels from a CSV file into y and check your accuracy again.
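
Something like this sketch should work, assuming a hypothetical test_labels.csv with filename and label columns whose rows are in the same order as the test set (reorder first if not):

import pandas as pd
import numpy as np

df = pd.read_csv('test_labels.csv')  # hypothetical labels file
class_to_idx = {c: i for i, c in enumerate(data.classes)}  # map class names to the model's integer indices
y_true = np.array([class_to_idx[l] for l in df['label']])
print('test accuracy:', (probs == y_true).mean())  # probs from the TTA snippet above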

My mistake, it has nothing to do with the test CSV. I still don’t know what’s wrong.

How many classes do you have?

I have 4 classes.

But I was actually wondering: how does accuracy(log_preds, y) work?

I do understand how the algo produces log_preds, but where does it get the y from?

Is your validation set sufficiently different from your training set? I.e., are you sure that there isn’t too much overlap between your training and validation sets?

If your validation set resembles the training set too closely, you will not be able to tell whether you are overfitting. Do try and keep it as separate from the training set as possible. For example, if using photos of humans, make sure that the train and validation sets contain different people. Similarly, if using satellite images, do ensure that there isn’t a geographical overlap of lat/long coordinates between your training and validation sets. A group-aware split can enforce this, as sketched below.
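
Here is a rough sketch of such a split using scikit-learn’s GroupShuffleSplit; filenames, labels, and person_ids are hypothetical lists with one entry per photo, where person_ids identifies which person appears in each photo:

from sklearn.model_selection import GroupShuffleSplit

# splitting on person_ids guarantees that no person appears in both splits
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, valid_idx = next(splitter.split(filenames, labels, groups=person_ids))
train_files = [filenames[i] for i in train_idx]
valid_files = [filenames[i] for i in valid_idx]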

The accuracy function takes the model output (log_preds) and finds the argmax (the index with the highest value), which is the model’s prediction. If that prediction has the same value as y, the model made the right prediction. Accuracy is just how often the model predicted the class correctly.
The accuracy you are getting is very odd, because even if the model guessed at random it should have an accuracy of around 25% with 4 classes.
One thing you can check is the number of instances of each class in your training, validation, and test sets. If one class appears much more often than the others, you might need to reconsider how you split your data.
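
In other words, accuracy does roughly this (a sketch of what the fastai v1 metric computes, not its exact source):

import torch

# argmax over the class dimension gives the predicted class per example;
# accuracy is the fraction of predictions that match the targets y
def manual_accuracy(log_preds, y):
    preds = torch.argmax(log_preds, dim=1)
    return (preds == y).float().mean().item()

And for checking the class balance, something like this should work with fastai v1’s label lists (assuming data.classes and the integer labels in .y.items; the test folder would need to be counted on disk, since fastai keeps the test set unlabelled):

from collections import Counter

for name, ds in [('train', data.train_ds), ('valid', data.valid_ds)]:
    counts = Counter(data.classes[int(i)] for i in ds.y.items)
    print(name, counts)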