I’m trying to create my own image dataset to train a classifier as in Lesson 1.
The problem is that, because of the way I collected the images, they may be tricky to process:
- they come in a variety of formats (mainly .tif/.tiff and .jpg, but also .eps)
- their height x width can be large: running `verify_images` (https://docs.fast.ai/vision.data.html#verify_images) on my folders, I get a bunch of messages like `Image size (89992333 pixels) exceeds limit of 89478485 pixels, could be decompression bomb DOS attack` (I've put a sketch of the workaround I had in mind right after this list)
- some of them have 4 channels, some 1 (also found with `verify_images`).
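To make the question more concrete, this is roughly the workaround I had in mind for the size and channel issues, based on my reading of the `verify_images` docs. The folder name and the `max_size` value are placeholders, and I haven't checked whether raising PIL's limit like this is considered safe practice:

```python
import PIL.Image
from fastai.vision import verify_images  # fastai v1, as in the docs link above

# Raise (or disable) PIL's decompression-bomb threshold so the very large
# images are opened instead of triggering the "exceeds limit" error.
PIL.Image.MAX_IMAGE_PIXELS = None  # or some larger explicit pixel count

# Ask verify_images to resize oversized images, force 3 channels, and
# (crucially) NOT delete anything it fails to convert.
verify_images('data/my_images',  # placeholder path
              delete=False,
              max_size=2000,     # arbitrary cap; I still need to check where the resized copies end up (`dest` argument)
              n_channels=3,
              recurse=True)
```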
What steps would you suggest to deal with the issues above? For now, for simplicity, I am just disregarding those images [well, `verify_images` did it for me, by automatically deleting them…], but they are a very important fraction of my dataset.
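For context, in case it changes the advice: the alternative I was considering was a manual pre-conversion pass with PIL before touching fastai at all, along these lines. The folder names, the size cap and the JPEG quality are arbitrary choices of mine, and I believe Pillow needs Ghostscript installed to open the .eps files:

```python
from pathlib import Path
import PIL.Image

PIL.Image.MAX_IMAGE_PIXELS = None   # deliberately accept the very large images

src, dst = Path('raw_images'), Path('clean_images')   # placeholder folders
dst.mkdir(exist_ok=True)

for f in src.rglob('*'):
    if f.suffix.lower() not in {'.tif', '.tiff', '.jpg', '.jpeg', '.eps'}:
        continue
    try:
        img = PIL.Image.open(f).convert('RGB')   # collapse 1- and 4-channel images to 3
        img.thumbnail((2000, 2000))              # cap the largest dimension (arbitrary value)
        img.save(dst / (f.stem + '.jpg'), quality=90)
    except Exception as e:
        print(f'could not convert {f}: {e}')
```

Is something like this reasonable, or does fastai expect me to leave the conversion to `verify_images`?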
Incidentally, I also thought it would be great if `verify_images` gave a comprehensive report on the "image health" of a folder: how many images are usable, a 2D scatter plot of height vs. width, a histogram of channel counts, etc. Also, the default behavior of deleting images that can't be converted automatically seems a tad too aggressive to me (it shouldn't be the default).
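To illustrate what I mean by an "image health" report, this is the kind of summary I am picturing. It's just a rough do-it-yourself sketch, not a proposal for the actual API, and the extension list is my own assumption:

```python
from pathlib import Path
from collections import Counter
import PIL.Image
import matplotlib.pyplot as plt

PIL.Image.MAX_IMAGE_PIXELS = None   # so the report also covers the huge images

def image_health_report(path, exts=('.tif', '.tiff', '.jpg', '.jpeg', '.eps', '.png')):
    "Summarise how many images open, their sizes and their channel counts."
    sizes, channels, broken = [], Counter(), []
    for f in Path(path).rglob('*'):
        if f.suffix.lower() not in exts:
            continue
        try:
            with PIL.Image.open(f) as img:
                sizes.append((img.width, img.height))
                channels[len(img.getbands())] += 1
        except Exception:
            broken.append(f)
    print(f'usable: {len(sizes)}  unreadable: {len(broken)}')
    print('channel counts:', dict(channels))
    if sizes:
        widths, heights = zip(*sizes)
        plt.scatter(widths, heights, s=4)
        plt.xlabel('width (px)'); plt.ylabel('height (px)')
        plt.show()
```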