I have a dataset of roughly 30,000 images and a binary classification model that reaches roughly 90% accuracy on them. I am running ImageCleaner on the output of DatasetFormatter().from_toplosses. The two labels are 0 and 1, and the model misclassifies a fair number of images from both classes. From my understanding of cross-entropy loss, and assuming a classification threshold of 0.5, I would expect all incorrectly classified images to appear first in the list of top losses. However, the first 5,000 images all have ground-truth label 1. Since there are only 3,000 wrong classifications, some of the correctly classified ground-truth 1s are occurring before some of the incorrectly classified ground-truth 0s.
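To make my expectation concrete, here is a minimal sketch in plain Python. It is not fastai's actual code: the probabilities and labels are randomly generated placeholders, and `cross_entropy` is a hypothetical helper that I am assuming matches the per-image loss being ranked. Under a 0.5 threshold, a misclassified image has a true-class probability below 0.5, so its loss should exceed ln 2, while every correctly classified image should fall below ln 2.

```python
import math
import random

random.seed(0)

# Placeholder data (assumption): predicted probability of class 1
# and a ground-truth label for each of n hypothetical images.
n = 1000
p1 = [random.uniform(0.01, 0.99) for _ in range(n)]
y = [random.randint(0, 1) for _ in range(n)]


def cross_entropy(p_class1, label):
    """Per-sample binary cross-entropy: -log of the true class probability."""
    p_true = p_class1 if label == 1 else 1.0 - p_class1
    return -math.log(p_true)


losses = [cross_entropy(p, t) for p, t in zip(p1, y)]

# Predicted class with a 0.5 threshold, then flag the misclassified samples.
wrong = [(1 if p > 0.5 else 0) != t for p, t in zip(p1, y)]

# Rank sample indices by loss, highest first, as a top-losses list would.
order = sorted(range(n), key=lambda i: losses[i], reverse=True)

# Check whether the first n_wrong entries are exactly the misclassified ones.
n_wrong = sum(wrong)
all_wrong_first = all(wrong[i] for i in order[:n_wrong])
print(n_wrong, all_wrong_first)
```

If the ranking I am seeing came from a per-image loss like this one, I would not expect any correctly classified image to outrank a misclassified one, which is why the ordering I get confuses me.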
Can someone help me figure out what’s going on here?