You’re correct: the ImageCleaner widget that @lesscomfortable created can help, but it requires manually scanning through the images (a batch at a time) and deleting the bad ones (more here: Duplicate Widget).
There is also the verify_images function (doc, src), which checks the image path, whether the image file is readable, whether it has the specified number of channels, etc. It can delete an image that fails these checks, but perhaps you can modify it to simply print the filenames instead.
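If you'd rather not touch the library code, here is a rough sketch of the "just report the filenames" idea. It is not the verify_images implementation: it uses Pillow directly, and the function name `find_bad_images`, the extension list, and the default of 3 channels are my own assumptions, so adjust them to your dataset.

```python
from pathlib import Path

from PIL import Image  # Pillow

# Assumed set of extensions to treat as images; extend as needed.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}


def find_bad_images(folder, n_channels=3):
    """Return (path, reason) pairs for images that fail the checks.

    Hypothetical helper: nothing is deleted or modified, files are only reported.
    """
    bad = []
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in IMAGE_EXTS:
            continue
        try:
            with Image.open(path) as img:
                img.load()  # force a full decode so truncated files fail here
                channels = len(img.getbands())
        except Exception as exc:
            bad.append((path, f"unreadable: {exc}"))
            continue
        if channels != n_channels:
            bad.append((path, f"{channels} channels, expected {n_channels}"))
    return bad
```

You could then loop over `find_bad_images("path/to/your/images")` and print each path and reason, and decide what to delete once you have looked at the list.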
Hope this helps!