In a perfect world, all of our images could be read correctly by the Fast.ai library. However, some of the images in my dataset are corrupt and cause Fast.ai to crash and give up when it tries to load them into, say, a DataLoader.
Right now, I’m using open_image() and catching exceptions to find corrupt image files, then deleting them before loading the rest for training, but it’s turning out to be VERY slow.
Is there a faster way to do this? Is there a built-in function (especially in v1) that will detect, delete, or skip corrupt images? If not, is there a way to read an image’s headers to quickly tell whether it’s corrupt and delete the file, so that it won’t crash the pipeline when Fast.ai gets to loading it?
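
To make the header idea concrete, here is a rough sketch of the kind of check I have in mind. It uses only the Python standard library and is not a Fast.ai function: the names `SIGNATURES`, `has_valid_header`, and `find_bad_headers` are hypothetical, and the signature table covers only a few formats. It reads just the first few bytes of each file, so it should be fast, but it can only catch files that are empty or whose leading bytes are wrong. Damage further into the file would still slip through.

```python
from pathlib import Path

# Leading "magic" bytes for a few common formats (assumed list, not exhaustive).
SIGNATURES = {
    ".jpg": b"\xff\xd8\xff",
    ".jpeg": b"\xff\xd8\xff",
    ".png": b"\x89PNG\r\n\x1a\n",
    ".gif": b"GIF8",
}


def has_valid_header(path):
    """Return True if the file starts with the signature expected for its extension."""
    path = Path(path)
    expected = SIGNATURES[path.suffix.lower()]
    with open(path, "rb") as f:
        # Only the first few bytes are read, never the whole image.
        return f.read(len(expected)) == expected


def find_bad_headers(folder):
    """List files under `folder` with a known image extension but a wrong or missing signature."""
    return [
        p
        for p in sorted(Path(folder).rglob("*"))
        if p.is_file()
        and p.suffix.lower() in SIGNATURES
        and not has_valid_header(p)
    ]
```

Something like `for p in find_bad_headers("data/train"): p.unlink()` would then delete the flagged files before training (the folder name is just a placeholder). I assume a stricter check would still need to decode the image, which is the slow part I’m trying to avoid.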