Lesson 1: The Oxford-IIIT Pet Dataset - Mislabeled images

I did some searching but wasn’t able to find this mentioned anywhere: some of the images in the Oxford-IIIT Pet Dataset are mislabeled.

I was walking through the lesson 1 notebook and, after the plot_top_losses step, thought I had a problem with my labeling. It turned out that my labeling was fine and the model had surfaced several mislabeled images in the dataset. Cool!

Check it out

I confirmed that the boxer images are indeed in the Saint Bernard set.
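For anyone who wants to do the same check, here is a minimal sketch of reading the label straight from a file name. It assumes the dataset's `<breed>_<number>.jpg` naming; the helper name `label_from_path` and the example paths are made up for illustration.

```python
import re
from pathlib import PurePosixPath

# Assumed naming scheme: "<breed>_<number>.jpg", e.g. "saint_bernard_12.jpg".
_NAME_PATTERN = re.compile(r"^(.+)_\d+\.jpg$")


def label_from_path(path):
    """Return the breed label encoded in an image file name (hypothetical helper)."""
    name = PurePosixPath(path).name
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name}")
    return match.group(1)


# Illustrative paths only, not actual mislabeled files.
print(label_from_path("images/saint_bernard_12.jpg"))
print(label_from_path("images/boxer_3.jpg"))
```

If the label parsed from the file name matches what the notebook shows as the actual class, the labeling code is fine and the problem is in the dataset itself.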

I thought this might be worth posting in case anyone else runs into the same thing. I spent a fair amount of time trying to figure out what was going on before finding the dataset source and working out how to get the file paths of the top-loss images.
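In case it saves someone the time, here is a rough sketch of the idea in plain Python: pair each validation loss with its file path and sort by loss. In fastai, `interp.top_losses()` gives you the losses and their indices into the validation set; where the file paths live depends on the fastai version, so the `losses` and `paths` lists below are stand-ins and `top_loss_paths` is a hypothetical helper, not part of the library.

```python
def top_loss_paths(losses, paths, k=9):
    """Return the k (path, loss) pairs with the highest loss, largest first."""
    if len(losses) != len(paths):
        raise ValueError("losses and paths must have the same length")
    # Indices of the samples, ordered from highest to lowest loss.
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return [(paths[i], losses[i]) for i in order[:k]]


# Made-up values for illustration; real ones come from the validation set.
losses = [0.1, 5.2, 0.3, 4.0]
paths = [
    "images/pug_1.jpg",
    "images/saint_bernard_2.jpg",
    "images/beagle_3.jpg",
    "images/saint_bernard_4.jpg",
]

for path, loss in top_loss_paths(losses, paths, k=2):
    print(f"{loss:.2f}  {path}")
```

Opening the printed files by hand is then enough to tell a genuine model mistake from a mislabeled image.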


Totally agree with your finding. I’ve found (at least) two more images wrongly labeled as Saint Bernards:

Wrongly classified dogs

Update: I’ve just read the license section on the website. It’s CC BY-SA 4.0 International, so maybe there are (improved) forks available somewhere out there already. From their website:

License

The dataset is available to download for commercial/research purposes under a Creative Commons Attribution-ShareAlike 4.0 International License. The copyright remains with the original owners of the images.


Yep, the whole dataset is messed up!

Any improved datasets out there that you know about?