I found that unsupervised downloads from icrawler, Google Image Search or other downloaders contain a lot of wrong and even corrupted images, and I feared they’d mess up my classification…
So I always check them visually and mark any unsuitable ones for deletion…
I wrote a small Python package for this task: it auto-downloads images based on a CSV file of search queries/terms, and a GUI script lets you quickly clean the results afterwards…
I linked it elsewhere in the forums:
It still takes a lot of time to weed through many-class datasets, but you get some clean data (11 classes, approx. 8500 images downloaded and left after cleaning with the script).
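
For the corrupted files specifically, a cheap automatic pre-filter can take some load off the visual check. The following is only a minimal sketch and not the code from the package mentioned above: it looks at the leading bytes of each file and flags anything that does not start like a JPEG, PNG or GIF, plus JPEGs that seem to be cut off. The function names (`sniff_image_type`, `looks_broken`, `find_suspect_files`) and the 32-byte tail window are assumptions made up for this example. It will not catch wrong-but-valid images, so the manual pass is still needed.

```python
from pathlib import Path

# Leading bytes ("magic numbers") of a few common image formats.
SIGNATURES = {
    "jpeg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
    "gif": b"GIF8",
}


def sniff_image_type(data: bytes):
    """Return the format name if data starts like a known image, else None."""
    for name, magic in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None


def looks_broken(data: bytes) -> bool:
    """Heuristic: unknown header, or a JPEG without an end marker near the end."""
    kind = sniff_image_type(data)
    if kind is None:
        # e.g. an HTML error page saved with a .jpg extension
        return True
    if kind == "jpeg":
        # A complete JPEG ends with the EOI marker FF D9. Some files carry a
        # little trailing padding, so search the last 32 bytes (an assumption).
        return b"\xff\xd9" not in data[-32:]
    return False


def find_suspect_files(folder):
    """List files under folder (recursively) that fail the header check."""
    return sorted(
        p for p in Path(folder).rglob("*")
        if p.is_file() and looks_broken(p.read_bytes())
    )
```

Calling `find_suspect_files("downloads")` (with `downloads` being whatever folder your downloader wrote to) returns a sorted list of paths you could review or move aside before starting the visual cleaning.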