Lesson 1: OSError: cannot identify image file PosixPath('/home/ubuntu/.fastai/data/oxford-iiit-pet/images/saint_bernard_188.jpg')

I’m using Salamander.
Attempting to run:

np.random.seed(2)
pat = r'/([^/]+)_\d+.jpg$'

data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(), size=24, bs=bs).normalize(imagenet_stats)

and I get this error:

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-126-ecaa07fc186b> in <module>
----> 1 data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(), size=24, bs=bs).normalize(imagenet_stats)
      2 

~/anaconda3/lib/python3.7/site-packages/fastai/vision/data.py in from_name_re(cls, path, fnames, pat, valid_pct, **kwargs)
    156             assert res,f'Failed to find "{pat}" in "{fn}"'
    157             return res.group(1)
--> 158         return cls.from_name_func(path, fnames, _get_label, valid_pct=valid_pct, **kwargs)
    159 
    160     @staticmethod

~/anaconda3/lib/python3.7/site-packages/fastai/vision/data.py in from_name_func(cls, path, fnames, label_func, valid_pct, seed, **kwargs)
    145         "Create from list of `fnames` in `path` with `label_func`."
    146         src = ImageList(fnames, path=path).split_by_rand_pct(valid_pct, seed)
--> 147         return cls.create_from_ll(src.label_from_func(label_func), **kwargs)
    148 
    149     @classmethod

~/anaconda3/lib/python3.7/site-packages/fastai/vision/data.py in create_from_ll(cls, lls, bs, val_bs, ds_tfms, num_workers, dl_tfms, device, test, collate_fn, size, no_check, resize_method, mult, padding_mode, mode, tfm_y)
     95         "Create an `ImageDataBunch` from `LabelLists` `lls` with potential `ds_tfms`."
     96         lls = lls.transform(tfms=ds_tfms, size=size, resize_method=resize_method, mult=mult, padding_mode=padding_mode,
---> 97                             mode=mode, tfm_y=tfm_y)
     98         if test is not None: lls.add_test_folder(test)
     99         return lls.databunch(bs=bs, val_bs=val_bs, dl_tfms=dl_tfms, num_workers=num_workers, collate_fn=collate_fn,

~/anaconda3/lib/python3.7/site-packages/fastai/data_block.py in transform(self, tfms, **kwargs)
    500         if not tfms: tfms=(None,None)
    501         assert is_listy(tfms) and len(tfms) == 2, "Please pass a list of two lists of transforms (train and valid)."
--> 502         self.train.transform(tfms[0], **kwargs)
    503         self.valid.transform(tfms[1], **kwargs)
    504         if self.test: self.test.transform(tfms[1], **kwargs)

~/anaconda3/lib/python3.7/site-packages/fastai/data_block.py in transform(self, tfms, tfm_y, **kwargs)
    719     def transform(self, tfms:TfmList, tfm_y:bool=None, **kwargs):
    720         "Set the `tfms` and `tfm_y` value to be applied to the inputs and targets."
--> 721         _check_kwargs(self.x, tfms, **kwargs)
    722         if tfm_y is None: tfm_y = self.tfm_y
    723         tfms_y = None if tfms is None else list(filter(lambda t: getattr(t, 'use_on_y', True), listify(tfms)))

~/anaconda3/lib/python3.7/site-packages/fastai/data_block.py in _check_kwargs(ds, tfms, **kwargs)
    588     if (tfms is None or len(tfms) == 0) and len(kwargs) == 0: return
    589     if len(ds.items) >= 1:
--> 590         x = ds[0]
    591         try: x.apply_tfms(tfms, **kwargs)
    592         except Exception as e:

~/anaconda3/lib/python3.7/site-packages/fastai/data_block.py in __getitem__(self, idxs)
    116         "returns a single item based if `idxs` is an integer or a new `ItemList` object if `idxs` is a range."
    117         idxs = try_int(idxs)
--> 118         if isinstance(idxs, Integral): return self.get(idxs)
    119         else: return self.new(self.items[idxs], inner_df=index_row(self.inner_df, idxs))
    120 

~/anaconda3/lib/python3.7/site-packages/fastai/vision/data.py in get(self, i)
    269     def get(self, i):
    270         fn = super().get(i)
--> 271         res = self.open(fn)
    272         self.sizes[i] = res.size
    273         return res

~/anaconda3/lib/python3.7/site-packages/fastai/vision/data.py in open(self, fn)
    265     def open(self, fn):
    266         "Open image in `fn`, subclass and overwrite for custom behavior."
--> 267         return open_image(fn, convert_mode=self.convert_mode, after_open=self.after_open)
    268 
    269     def get(self, i):

~/anaconda3/lib/python3.7/site-packages/fastai/vision/image.py in open_image(fn, div, convert_mode, cls, after_open)
    391     with warnings.catch_warnings():
    392         warnings.simplefilter("ignore", UserWarning) # EXIF warning from TiffPlugin
--> 393         x = PIL.Image.open(fn).convert(convert_mode)
    394     if after_open: x = after_open(x)
    395     x = pil2tensor(x,np.float32)

~/anaconda3/lib/python3.7/site-packages/PIL/Image.py in open(fp, mode)
   2703         warnings.warn(message)
   2704     raise IOError("cannot identify image file %r"
-> 2705                   % (filename if filename else fp))
   2706 
   2707 #

OSError: cannot identify image file PosixPath('/home/ubuntu/.fastai/data/oxford-iiit-pet/images/saint_bernard_188.jpg')

Does anyone know why? I’m guessing the issue is that it can’t identify the PosixPath, but when I run this cell:
path.ls()

I get this output:

[PosixPath('/home/ubuntu/.fastai/data/oxford-iiit-pet/images'),
 PosixPath('/home/ubuntu/.fastai/data/oxford-iiit-pet/annotations')]

so I’m confused about why I’m getting the error about not being able to identify the image file PosixPath.

Hi @leugene,

First, being able to run the Python instruction path.ls() (which lists the files and subfolders at the first level of path) does not mean that the files inside those subfolders can be read.

Since PIL.Image.open is the call throwing the error, the file is most likely present but unreadable. You may have hit a connection issue during the dataset download, or something similar, that ended up corrupting this specific file. The corruption could also come from the dataset server side, but it’s hard to tell at this point.

So I would recommend:

  1. Check whether you can manually open the file at /home/ubuntu/.fastai/data/oxford-iiit-pet/images/saint_bernard_188.jpg. If not, then it’s indeed corrupted.
  2. If it is corrupted, you can try deleting the entire folder (it would be better to download only this specific file again, but I’m not that knowledgeable about dataset downloading specifics with fastai :sweat_smile:) by typing the following in your console and pressing Enter (given the path, I assume you’re on Ubuntu):
sudo rm -r /home/ubuntu/.fastai/data/oxford-iiit-pet

     More specifically, on Unix systems sudo asks for your password to grant you admin rights in case they are needed, rm (short for remove) is the command for file removal, and the -r flag applies the command recursively to the subfolders and files located at the specified path.
  3. Then rerun the notebook to download the dataset again.
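For step 1, if opening the file by hand is awkward on a remote machine, the check can also be done from a notebook cell. Below is a minimal sketch that assumes Pillow is installed (fastai depends on it); the helper names is_readable_image and find_unreadable_images are made up for this example and are not part of fastai or Pillow.

```python
from pathlib import Path

from PIL import Image


def is_readable_image(path):
    """Return True if Pillow can identify and fully decode the file at `path`."""
    try:
        # verify() checks the file for obvious damage without decoding pixels.
        with Image.open(path) as img:
            img.verify()
        # The image must be reopened after verify(); load() forces a full
        # decode, which also catches files that were cut off mid-download.
        with Image.open(path) as img:
            img.load()
        return True
    except Exception:
        # Pillow raises different exception types depending on the format,
        # so this sketch treats any failure as "unreadable".
        return False


def find_unreadable_images(folder, pattern="*.jpg"):
    """Return a sorted list of files in `folder` that Pillow cannot read."""
    return sorted(p for p in Path(folder).glob(pattern) if not is_readable_image(p))
```

Calling find_unreadable_images('/home/ubuntu/.fastai/data/oxford-iiit-pet/images') should list saint_bernard_188.jpg if that file really is damaged, along with any other files that have the same problem.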

If the file is not corrupted, check whether reinstalling Pillow (PIL) and fastai helps (use the same install method for both, either conda or pip).

I don’t know how comfortable you are with reading code, but generally speaking, when you encounter this kind of issue, you can:

  • Use the traceback (the error message shown when a Python command fails) to identify what is causing the error. Here it’s data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(), size=24, bs=bs).normalize(imagenet_stats), as you identified.
  • Read further down the traceback to see that the failure actually happens when the file is opened, as per
--> 393         x = PIL.Image.open(fn).convert(convert_mode)

In this case, it’s pointing out that at least this specific file cannot be opened!

Let me know if this isn’t clear, cheers!

extremely helpful! Thank you!

We experienced the same error using our own dataset of flowers (daisies, tulips, sunflowers, dandelions, roses). The error “cannot identify image file” appeared when a particular image file was accessed. By simply removing the offending image files one by one and rerunning Lesson 1, we achieved error-free execution (we only had to remove 3 image files out of a dataset containing 4,326 image files). We have yet to investigate what could be wrong with the 3 image files.
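If the offending files are going to be investigated later, one option is to move them aside instead of deleting them. The sketch below uses only the standard library and assumes you already have a list of the bad paths; quarantine_files and the destination folder are hypothetical names chosen for this example.

```python
import shutil
from pathlib import Path


def quarantine_files(bad_files, quarantine_dir):
    """Move each file in `bad_files` into `quarantine_dir`; return the new paths."""
    quarantine_dir = Path(quarantine_dir)
    quarantine_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for src in map(Path, bad_files):
        # Prefix the parent folder name (e.g. the class label) so files with
        # the same name from different folders do not overwrite each other.
        dst = quarantine_dir / f"{src.parent.name}_{src.name}"
        shutil.move(str(src), str(dst))
        moved.append(dst)
    return moved
```

The quarantine folder should sit outside the dataset folder, otherwise the notebook may pick the moved files up again when it collects the image file names.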

Yes, I had the same error but with a different potentially corrupted file (leonberger_195.jpg). I wasn’t quite sure how to check if the file was corrupted by inspecting it in the terminal.

Deleting the file and re-running the notebook from the “fnames = get_image_files(path_img)” step fixed it.
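On checking a suspect file without an image viewer: a rough first look is to inspect its size and its first and last bytes. The sketch below uses only the standard library; quick_jpeg_check is a made-up helper name, and the check is only a heuristic. A file can pass it and still fail to decode, and since Pillow identifies images by content rather than extension, a file with a .jpg name that is really another format may fail this check yet still open fine.

```python
from pathlib import Path

JPEG_START = b"\xff\xd8\xff"  # JPEG files begin with these marker bytes
JPEG_END = b"\xff\xd9"        # and normally end with this end-of-image marker


def quick_jpeg_check(path):
    """Return a short diagnosis based on the file size and marker bytes."""
    data = Path(path).read_bytes()
    if len(data) == 0:
        return "empty file"
    if not data.startswith(JPEG_START):
        return "does not start with a JPEG header"
    if not data.endswith(JPEG_END):
        return "missing end-of-image marker (possibly truncated)"
    return "looks like a complete JPEG"
```

An empty file or a missing end marker would be consistent with an interrupted download, while a non-JPEG header can mean the download saved something else (an error page, for example) under the image's name.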