Corrupt images crashing Fast.ai - is there any way to quickly find and delete them?

xjdeng · January 6, 2019, 2:00am

In a perfect world, every image in a dataset could be read correctly by the Fast.ai library. However, some of the images in my dataset are corrupt and cause Fast.ai to crash when it tries to load them into, say, a DataLoader.

Right now, I’m calling open_image() on each file and catching exceptions to find the corrupt images, then deleting them before training, but this is turning out to be VERY slow.

Is there any faster way to do this? Is there a built-in function (especially in v1) that will detect, delete, or skip corrupt images? If not, is there a way to read an image’s headers to quickly tell whether it’s corrupt and delete the file, so that it won’t crash the pipeline when Fast.ai gets to loading it?

calmdownkarm · January 6, 2019, 11:43am

There’s a verify_images function in fastai.vision, I think. It was used in lesson 2 of the new version of the course. https://docs.fast.ai/vision.data.html#verify_images
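
For the header check asked about in the original post, here is a rough sketch that uses only the Python standard library and is not part of Fast.ai. It reads just the first and last 8 bytes of each file and checks for the JPEG start/end markers or the PNG signature and IEND chunk. The names looks_intact and find_suspect_files are made up for illustration, and the approach assumes the dataset is JPEG/PNG only. It mostly catches truncated or non-image files; corruption in the middle of a file will slip through, and a valid JPEG with extra bytes after its end marker will be flagged, so treat the result as a list to review rather than a definitive verdict.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"   # fixed 8-byte PNG file signature
PNG_TRAILER = b"IEND\xaeB`\x82"        # IEND chunk type followed by its fixed CRC
JPEG_SOI = b"\xff\xd8"                 # JPEG start-of-image marker
JPEG_EOI = b"\xff\xd9"                 # JPEG end-of-image marker


def looks_intact(path):
    """Return True if the file starts and ends like a complete JPEG or PNG.

    Only the first and last 8 bytes are read, so this is fast but shallow.
    """
    try:
        if os.path.getsize(path) < 12:
            return False
        with open(path, "rb") as f:
            head = f.read(8)
            f.seek(-8, os.SEEK_END)
            tail = f.read(8)
    except OSError:
        return False
    if head.startswith(JPEG_SOI):
        return tail.endswith(JPEG_EOI)
    if head == PNG_SIGNATURE:
        return tail == PNG_TRAILER
    return False  # unknown format under the JPEG/PNG-only assumption


def find_suspect_files(root, exts=(".jpg", ".jpeg", ".png"), max_workers=8):
    """Return paths under root whose start/end bytes do not look right."""
    files = [p for p in Path(root).rglob("*")
             if p.is_file() and p.suffix.lower() in exts]
    # Threads are enough here because the work is dominated by file I/O.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(looks_intact, files))
    return [p for p, ok in zip(files, results) if not ok]
```

Calling find_suspect_files("path/to/images") returns a list of pathlib.Path objects without deleting anything; after checking a few of them by hand, each one can be removed with p.unlink(). Files that pass this check can still fail to decode, so it works best as a cheap first pass before a full verification.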