[Solved Issue] ImageDataBunch.from_folder: No such file or directory

KevinB · October 28, 2018, 3:58am

I had an issue that took a while to solve so I wanted to document it in case anybody else has a similar issue.

First, the answer: You probably have a class that is in the train folder but missing from the valid folder.

Here is the code I was running:

PATH = Path("kaggleData/competitions/imaterialist-challenge-furniture-2018/")

data = ImageDataBunch.from_folder(PATH, train="train", valid="valid", test="test", size=112, bs=64)

Here is the full error I was seeing:

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-6-fce2894c6463> in <module>()
      4 
      5 #data = ImageDataBunch.from_folder(PATH, train="train", ds_tfms=get_transforms(), size=112, bs=64, valid_pct=0.90)
----> 6 data = ImageDataBunch.from_folder(PATH, train="train", valid="valid", test="test", size=112, bs=64)

~/anaconda3/envs/fastai/lib/python3.6/site-packages/fastai/vision/data.py in from_folder(cls, path, train, valid, test, valid_pct, **kwargs)
    276         if valid_pct is None:
    277             train_ds = ImageClassificationDataset.from_folder(path/train)
--> 278             datasets = [train_ds, ImageClassificationDataset.from_folder(path/valid, classes=train_ds.classes)]
    279         else: datasets = ImageClassificationDataset.from_folder(path/train, valid_pct=valid_pct)
    280 

~/anaconda3/envs/fastai/lib/python3.6/site-packages/fastai/vision/data.py in from_folder(cls, folder, classes, valid_pct, check_ext)
    102         fns,labels = [],[]
    103         for cl in classes:
--> 104             f,l = cls._folder_files(folder/cl, cl, check_ext=check_ext)
    105             fns+=f; labels+=l
    106 

~/anaconda3/envs/fastai/lib/python3.6/site-packages/fastai/vision/data.py in _folder_files(folder, label, check_ext)
     85     def _folder_files(folder:Path, label:ImgLabel, check_ext=True)->Tuple[FilePathList,ImgLabels]:
     86         "From `folder` return image files and labels. The labels are all `label`. `check_ext` means only image files."
---> 87         fnames = get_image_files(folder, check_ext=check_ext)
     88         return fnames,[label]*len(fnames)
     89 

~/anaconda3/envs/fastai/lib/python3.6/site-packages/fastai/vision/data.py in get_image_files(c, check_ext)
     18 def get_image_files(c:Path, check_ext:bool=True)->FilePathList:
     19     "Return list of files in `c` that are images. `check_ext` will filter to `image_extensions`."
---> 20     return [o for o in list(c.iterdir())
     21             if not o.name.startswith('.') and not o.is_dir()
     22             and (not check_ext or (o.suffix in image_extensions))]

~/anaconda3/envs/fastai/lib/python3.6/pathlib.py in iterdir(self)
   1077         if self._closed:
   1078             self._raise_closed()
-> 1079         for name in self._accessor.listdir(self):
   1080             if name in {'.', '..'}:
   1081                 # Yielding a path object for these makes little sense

~/anaconda3/envs/fastai/lib/python3.6/pathlib.py in wrapped(pathobj, *args)
    385         @functools.wraps(strfunc)
    386         def wrapped(pathobj, *args):
--> 387             return strfunc(str(pathobj), *args)
    388         return staticmethod(wrapped)
    389 

FileNotFoundError: [Errno 2] No such file or directory: 'kaggleData/competitions/imaterialist-challenge-furniture-2018/valid/models'

At first I thought there might be a bug where v1 was specifically looking for a models folder, but then I remembered that I had been experimenting with my ImageDataBunch.from_folder path and had accidentally added a folder to the train folder. As the traceback shows, from_folder takes its classes from the train folder and then looks for each one under valid, which is why it failed. So for me, the fix was just to delete the folder that shouldn't have been there in the first place.

More commonly though, this would probably mean there is a class that is in your train folder that isn’t in the valid folder. Here is a quick way to check:

def class_checker(path, train_folder="train",valid_folder="valid"):
    notInTrain = []
    notInValid = []
    path = Path(path)
    train_check = (path/train_folder).ls()
    valid_check = (path/valid_folder).ls()
    for i in train_check:
        if i not in valid_check: notInValid.append(i)
    for i in valid_check:
        if i not in train_check: notInTrain.append(i)
    return notInTrain,notInValid

class_checker(PATH)

It outputs this to tell you which classes (subfolders) are missing from which folder:

([], ['1000'])

So in this case, nothing is in valid that isn't in train, and there is one class folder called "1000" that is in train but not present in valid.
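The same check can be written with only the standard library, which avoids depending on what fastai's Path.ls() returns in a given version. This is a minimal sketch and not part of fastai: the name class_diff is made up for illustration, and it compares subfolder names rather than full paths.

```python
from pathlib import Path

def class_diff(path, train_folder="train", valid_folder="valid"):
    """Return (not_in_train, not_in_valid) as sorted lists of subfolder names."""
    path = Path(path)
    # Compare names only: the full paths always differ between train and valid.
    train = {p.name for p in (path / train_folder).iterdir() if p.is_dir()}
    valid = {p.name for p in (path / valid_folder).iterdir() if p.is_dir()}
    return sorted(valid - train), sorted(train - valid)
```

Calling class_diff(PATH) returns the same kind of pair as class_checker: first the classes missing from train, then the classes missing from valid.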

Shubhajit · October 29, 2018, 3:38pm

@jeremy @sgugger @KevinB
I don't have a valid folder and I was using valid_pct = 0.9 in ImageDataBunch.from_folder, but I am getting the same error:
FileNotFoundError: [Errno 2] No such file or directory: 'dataset/valid/ALB'
My train dir has the /ALB subdir. Everything in my setup is at its latest version.
What should I do?

KevinB · October 29, 2018, 3:54pm

Run this and share what it shows for your output:

def class_checker(path, train_folder="train",valid_folder="valid"):
    notInTrain = []
    notInValid = []
    path = Path(path)
    train_check = (path/train_folder).ls()
    valid_check = (path/valid_folder).ls()
    for i in train_check:
        if i not in valid_check: notInValid.append(i)
    for i in valid_check:
        if i not in train_check: notInTrain.append(i)
    return notInTrain,notInValid

class_checker(PATH)

Also, can you share the rest of your error output? That might help determine what the issue is.

Also, avoid pulling extra people into the conversation using the @ unless they are the only people that can help with your question.

gsg · October 29, 2018, 4:09pm

One thing you can try is to create a valid directory yourself (under dataset) and move a few files (10%-20%) to that directory.
Some datasets may call a folder “test” corresponding to what Fastai calls “valid”.
If this is the case, you may just set valid="test" in the ImageDataBunch.from_folder call.
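Moving part of each class into a valid directory could be sketched as follows. This is an illustration using only the standard library, not a fastai utility; the function name make_valid_split, the default fraction, and the fixed seed are all assumptions. It moves files on disk, so it is safer to try it on a copy of the data first.

```python
import random
import shutil
from pathlib import Path

def make_valid_split(root, train="train", valid="valid", frac=0.2, seed=42):
    """Move roughly `frac` of the files in each class folder from train to valid."""
    root = Path(root)
    rng = random.Random(seed)
    for class_dir in sorted((root / train).iterdir()):
        if not class_dir.is_dir():
            continue
        files = sorted(p for p in class_dir.iterdir() if p.is_file())
        rng.shuffle(files)
        n_valid = int(len(files) * frac)
        # Create the class folder under valid even if nothing is moved,
        # so every train class also exists in valid.
        target = root / valid / class_dir.name
        target.mkdir(parents=True, exist_ok=True)
        for f in files[:n_valid]:
            shutil.move(str(f), str(target / f.name))
```

Because the split is done per class folder, each class found in train ends up with a matching folder in valid, which is the condition the original error was about.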

Let us know if either works.

G

Shubhajit · October 29, 2018, 4:09pm

I was using the Kaggle competition: The Nature Conservancy Fisheries Monitoring
data = ImageDataBunch.from_folder(PATH, train="train", test="test", ds_tfms = get_transforms, size=224, bs=bs, valid_pct=0.9)

I got the error below:

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-22-19df3e44e39a> in <module>
----> 1 data = ImageDataBunch.from_folder(PATH, train="train", test="test", ds_tfms = get_transforms, size=224, bs=bs, valid_pct=0.9)

/opt/anaconda3/lib/python3.6/site-packages/fastai/vision/data.py in from_folder(cls, path, train, valid, test, **kwargs)
    278         "Create from imagenet style dataset in `path` with `train`,`valid`,`test` subfolders (or provide `valid_pct`)."
    279         path=Path(path)
--> 280         if valid_pct is None:
    281             train_ds = ImageClassificationDataset.from_folder(path/train)
    282             datasets = [train_ds, ImageClassificationDataset.from_folder(path/valid, classes=train_ds.classes)]

/opt/anaconda3/lib/python3.6/site-packages/fastai/vision/data.py in from_folder(cls, folder, classes, valid_pct, check_ext)
    102         fns,labels = [],[]
    103         for cl in classes:
--> 104             f,l = cls._folder_files(folder/cl, cl, check_ext=check_ext)
    105             fns+=f; labels+=l
    106 

/opt/anaconda3/lib/python3.6/site-packages/fastai/vision/data.py in _folder_files(folder, label, check_ext)
     85     def _folder_files(folder:Path, label:ImgLabel, check_ext=True)->Tuple[FilePathList,ImgLabels]:
     86         "From `folder` return image files and labels. The labels are all `label`. `check_ext` means only image files."
---> 87         fnames = get_image_files(folder, check_ext=check_ext)
     88         return fnames,[label]*len(fnames)
     89 

/opt/anaconda3/lib/python3.6/site-packages/fastai/vision/data.py in get_image_files(c, check_ext)
     18     return [o for o in Path(c).glob('**/*' if recurse else '*')
     19             if not o.name.startswith('.') and not o.is_dir()
---> 20             and (not check_ext or (o.suffix in image_extensions))]
     21 
     22 def get_annotations(fname, prefix=None):

/opt/anaconda3/lib/python3.6/pathlib.py in iterdir(self)
   1077         if self._closed:
   1078             self._raise_closed()
-> 1079         for name in self._accessor.listdir(self):
   1080             if name in {'.', '..'}:
   1081                 # Yielding a path object for these makes little sense

/opt/anaconda3/lib/python3.6/pathlib.py in wrapped(pathobj, *args)
    385         @functools.wraps(strfunc)
    386         def wrapped(pathobj, *args):
--> 387             return strfunc(str(pathobj), *args)
    388         return staticmethod(wrapped)
    389 

FileNotFoundError: [Errno 2] No such file or directory: 'dataset/valid/ALB'

I have the following directory structure:

dataset/train/
dataset/test/

Shubhajit · October 29, 2018, 4:12pm

I ran this function and got the expected error, since I have no valid folder:

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-19-261d2609991f> in <module>
     11     return notInTrain,notInValid
     12 
---> 13 class_checker(PATH)

<ipython-input-19-261d2609991f> in class_checker(path, train_folder, valid_folder)
      4     path = Path(path)
      5     train_check = (path/train_folder).ls()
----> 6     valid_check = (path/valid_folder).ls()
      7     for i in train_check:
      8         if i not in valid_check: notInValid.append(i)

/opt/anaconda3/lib/python3.6/site-packages/fastai/core.py in <lambda>(x)
    171             f.write(buffer)
    172 
--> 173 def range_of(x): return list(range(len(x)))
    174 def arange_of(x): return np.arange(len(x))
    175 

/opt/anaconda3/lib/python3.6/site-packages/fastai/core.py in <listcomp>(.0)
    171             f.write(buffer)
    172 
--> 173 def range_of(x): return list(range(len(x)))
    174 def arange_of(x): return np.arange(len(x))
    175 

/opt/anaconda3/lib/python3.6/pathlib.py in iterdir(self)
   1077         if self._closed:
   1078             self._raise_closed()
-> 1079         for name in self._accessor.listdir(self):
   1080             if name in {'.', '..'}:
   1081                 # Yielding a path object for these makes little sense

/opt/anaconda3/lib/python3.6/pathlib.py in wrapped(pathobj, *args)
    385         @functools.wraps(strfunc)
    386         def wrapped(pathobj, *args):
--> 387             return strfunc(str(pathobj), *args)
    388         return staticmethod(wrapped)
    389 

FileNotFoundError: [Errno 2] No such file or directory: 'dataset/valid'

Shubhajit · October 29, 2018, 4:14pm

Yeah, I'm doing this, and it will definitely work.

gsg · October 29, 2018, 4:25pm

Good.
Maybe there was some change in the semantics of .from_folder where now it expects these folders to be pre-loaded, rather than creating the validation set automatically… not sure if this is the explanation…

Shubhajit · October 29, 2018, 5:32pm

I have moved 5% of my train data (with the same directory structure) to valid/, and when I run data = ImageDataBunch.from_folder(PATH, train="train", valid='valid', test="test", ds_tfms = get_transforms, size=224, bs=bs), I get the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-11-8af90b08e7d5> in <module>
----> 1 data = ImageDataBunch.from_folder(PATH, train="train", valid='valid', test="test", ds_tfms = get_transforms, size=224, bs=bs)

/opt/anaconda3/lib/python3.6/site-packages/fastai/vision/data.py in from_folder(cls, path, train, valid, test, valid_pct, **kwargs)
    285         if test: datasets.append(ImageClassificationDataset.from_single_folder(
    286             path/test,classes=train_ds.classes))
--> 287         return cls.create(*datasets, path=path, **kwargs)
    288 
    289 

/opt/anaconda3/lib/python3.6/site-packages/fastai/vision/data.py in create(cls, train_ds, valid_ds, test_ds, path, bs, ds_tfms, num_workers, tfms, device, collate_fn, size, **kwargs)
    268         datasets = [train_ds,valid_ds]
    269         if test_ds is not None: datasets.append(test_ds)
--> 270         if ds_tfms: datasets = transform_datasets(*datasets, tfms=ds_tfms, size=size, **kwargs)
    271         dls = [DataLoader(*o, num_workers=num_workers) for o in
    272                zip(datasets, (bs,bs*2,bs*2), (True,False,False))]

/opt/anaconda3/lib/python3.6/site-packages/fastai/vision/data.py in transform_datasets(train_ds, valid_ds, test_ds, tfms, **kwargs)
    211                        tfms:Optional[Tuple[TfmList,TfmList]]=None, **kwargs:Any):
    212     "Create train, valid and maybe test DatasetTfm` using `tfms` = (train_tfms,valid_tfms)."
--> 213     res = [DatasetTfm(train_ds, tfms[0],  **kwargs),
    214            DatasetTfm(valid_ds, tfms[1],  **kwargs)]
    215     if test_ds is not None: res.append(DatasetTfm(test_ds, tfms[1],  **kwargs))

TypeError: 'function' object is not subscriptable

KevinB · October 29, 2018, 5:34pm

You need parentheses on get_transforms I think.

ImageDataBunch.from_folder(PATH, train="train", valid='valid', test="test", ds_tfms = get_transforms(), size=224, bs=bs)
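The TypeError comes from indexing: the traceback shows tfms[0] being evaluated, and a function object cannot be indexed, while the tuple it returns can. Here is a small stand-alone illustration, where fake_get_transforms is a made-up stand-in and not the fastai function:

```python
def fake_get_transforms():
    """Hypothetical stand-in that returns a (train_tfms, valid_tfms) pair."""
    return (["flip", "rotate"], ["crop"])

def first_tfms(tfms):
    # Mirrors the `tfms[0]` access seen in the traceback.
    return tfms[0]

# Passing the result of the call works.
train_tfms = first_tfms(fake_get_transforms())

# Passing the function itself fails as soon as it is indexed.
try:
    first_tfms(fake_get_transforms)
    error_message = None
except TypeError as e:
    error_message = str(e)

print(error_message)
```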

Shubhajit · October 29, 2018, 5:40pm

Oops, my mistake.
Thanks Kevin

KevinB · October 29, 2018, 5:44pm

Try fixing that on the other command too. That might be where the issue was in the beginning.

Shubhajit · October 29, 2018, 6:00pm

Sure.
One more thing: what is a good train/valid ratio for an image dataset like this?

KevinB · October 29, 2018, 6:23pm

.1 or .2 is what I have usually seen.

Shubhajit · October 29, 2018, 6:26pm

Do you mean 0.1% or 0.2%?

KevinB · October 29, 2018, 6:42pm

No, sorry: 10% or 20%.

Shubhajit · October 29, 2018, 6:54pm

Then valid_pct=.1 would make 10% validation data?

KevinB · October 29, 2018, 7:00pm

Correct.
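To make the fraction concrete, here is a rough sketch of what a valid_pct-style split means. It is not fastai's implementation; split_by_pct and its seed parameter are invented for illustration.

```python
import random

def split_by_pct(items, valid_pct=0.1, seed=0):
    """Return (train, valid), with about `valid_pct` of the items in valid."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_valid = int(len(items) * valid_pct)
    # The first n_valid shuffled items become validation, the rest training.
    return items[n_valid:], items[:n_valid]

train, valid = split_by_pct(range(1000), valid_pct=0.1)
print(len(train), len(valid))
```

By the same logic, valid_pct=0.9 as used in the earlier calls would put 90% of the images in validation and leave only 10% for training.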

Shubhajit · October 29, 2018, 7:04pm

When I run this:
data = ImageDataBunch.from_folder(PATH, train="train", test="test",ds_tfms = get_transforms(), size=320, bs=bs, valid_pct=0.9)

I get the following error:

---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
<ipython-input-55-6c5e6dad52ca> in <module>
----> 1 data = ImageDataBunch.from_folder(PATH, train="train", test="test",ds_tfms = get_transforms(), size=320, bs=bs, valid_pct=0.9)
      2 data.normalize(imagenet_stats)

/opt/anaconda3/lib/python3.6/site-packages/fastai/vision/data.py in from_folder(cls, path, train, valid, test, valid_pct, **kwargs)
    284 
    285         if test: datasets.append(ImageClassificationDataset.from_single_folder(
--> 286             path/test,classes=train_ds.classes))
    287         return cls.create(*datasets, path=path, **kwargs)
    288 

UnboundLocalError: local variable 'train_ds' referenced before assignment

KevinB · October 29, 2018, 7:11pm

That is a bug that is fixed in the most recent version, but it might not be available on the conda version yet.

Here is a reference to it: Bug when using ImageDataBunch.from_folder and valid_pct with test
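The UnboundLocalError in the traceback has a generic shape: a name is assigned in only one branch and then used after both. A toy reproduction follows; build and its arguments are made up, and this is not the fastai source.

```python
def build(valid_pct=None, test=None):
    if valid_pct is None:
        train_ds = "train dataset"  # assigned only on this branch
        datasets = [train_ds, "valid dataset"]
    else:
        # train_ds is never assigned on this branch.
        datasets = ["train split", "valid split"]
    if test:
        # Fails when valid_pct was given, because train_ds does not exist.
        datasets.append(f"test dataset built from {train_ds}")
    return datasets

try:
    build(valid_pct=0.1, test="test")
    error_name = None
except UnboundLocalError as e:
    error_name = type(e).__name__

print(error_name)
```

Judging from the traceback, the failing line sits inside the "if test:" branch, so until the fixed release is installed, leaving out test= when using valid_pct should avoid it.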