Potential bug in ImageDataBunch.from_folder with Capitalized extension

I don’t know if it’s a bug or not, but it’s interesting that the ImageDataBunch.from_folder function cannot read images whose file names have a capitalized extension like “.JPG” instead of “.jpg”. I checked the root cause: ImageDataBunch.from_folder calls ImageClassificationDataset._folder_files, which in turn calls get_image_files (in vision/data.py), which checks for a valid extension against the image_extensions list (built from the mimetypes list). This image_extensions list only contains lower-case extensions, which could be the problem.

So instead of the (o.suffix in image_extensions) check, maybe the check should be (o.suffix.lower() in image_extensions), because in my opinion it doesn’t matter whether the extension is capitalized or not.
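To make the suggested check concrete, here is a minimal standalone sketch. It is not fastai's actual code: the helper name get_image_files_ci is hypothetical, and image_extensions is rebuilt here from Python's mimetypes module on the assumption that this roughly mirrors how the library builds its list.

```python
import mimetypes
from pathlib import Path

# Assumption: a set of lower-case image extensions taken from the mimetypes
# table, similar in spirit to the image_extensions list discussed above.
image_extensions = {
    ext for ext, mime in mimetypes.types_map.items() if mime.startswith("image/")
}


def get_image_files_ci(folder):
    """Hypothetical helper: list image files in `folder`, ignoring extension case."""
    return sorted(
        p
        for p in Path(folder).iterdir()
        # .lower() makes ".JPG" and ".jpg" compare equal to the lower-case entry
        if p.is_file() and p.suffix.lower() in image_extensions
    )
```

With this check, a folder holding both photo.JPG and photo2.jpg would return both files, while non-image files are still filtered out.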


Good idea :slight_smile:


How should I deal with this problem? I have a dataset whose images have different extensions, such as .jpg and .JPG. What should I do?

This bug is already resolved in the new version of the fastai library.

I tried with mixed extensions, but it gave an error.

Unix filesystems are case-sensitive for filenames, whereas Windows isn’t.

This is an area where you really should be cleaning up the data first: even if fastai works around the problem, you can be sure it will come back and bite you with something else.

A quick Google search turns up: https://stackoverflow.com/questions/11818408/convert-all-file-extensions-to-lower-case
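If you would rather do the cleanup from Python than from the shell, something along these lines should work. It is only a sketch: lowercase_extensions is a made-up helper name, and it assumes you are happy to rename the files in place, so try it on a copy of the dataset first.

```python
from pathlib import Path


def lowercase_extensions(root):
    """Hypothetical helper: rename files under `root` so their extension is lower-case.

    Returns the list of new paths for the files that were renamed.
    """
    renamed = []
    # Materialize the listing first so renames do not disturb the traversal.
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        lower = path.suffix.lower()
        if lower == path.suffix:
            continue  # extension is already lower-case (or there is none)
        target = path.with_suffix(lower)
        # Skip if a different file already uses the lower-case name.
        if target.exists() and not target.samefile(path):
            continue
        path.rename(target)
        renamed.append(target)
    return renamed
```

Calling lowercase_extensions on the dataset root before building the data bunch would turn cat.JPG into cat.jpg and leave cat2.jpg alone. Files whose lower-case name is already taken by another file are skipped rather than overwritten.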