[Solved Issue] ImageDataBunch.from_folder: No such file or directory

Do you mean in 0.1% or 0.2%?

No, sorry 10% or 20%

Then valid_pct=.1 would make 10% validation data?

Correct.
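To make the fraction concrete, here is a minimal sketch of a percentage split over a list of file names. It is not fastai's implementation; `split_by_pct` and its `seed` parameter are hypothetical names used only for illustration.

```python
import random

def split_by_pct(items, valid_pct, seed=0):
    """Hypothetical helper: return (train, valid) with about valid_pct of items in valid."""
    if not 0.0 <= valid_pct <= 1.0:
        raise ValueError("valid_pct must be a fraction between 0 and 1")
    shuffled = list(items)
    # Shuffle a copy with a fixed seed so the split is repeatable.
    random.Random(seed).shuffle(shuffled)
    n_valid = round(len(shuffled) * valid_pct)
    return shuffled[n_valid:], shuffled[:n_valid]

files = [f"img{i}.png" for i in range(100)]
train, valid = split_by_pct(files, valid_pct=0.1)
print(len(train), len(valid))
```

With 100 items, valid_pct=0.1 holds out 10 of them for validation. By the same arithmetic, valid_pct=0.9 would hold out about 90.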

When I run this,
data = ImageDataBunch.from_folder(PATH, train="train", test="test",ds_tfms = get_transforms(), size=320, bs=bs, valid_pct=0.9)

I got the following error:

---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
<ipython-input-55-6c5e6dad52ca> in <module>
----> 1 data = ImageDataBunch.from_folder(PATH, train="train", test="test",ds_tfms = get_transforms(), size=320, bs=bs, valid_pct=0.9)
      2 data.normalize(imagenet_stats)

/opt/anaconda3/lib/python3.6/site-packages/fastai/vision/data.py in from_folder(cls, path, train, valid, test, valid_pct, **kwargs)
    284 
    285         if test: datasets.append(ImageClassificationDataset.from_single_folder(
--> 286             path/test,classes=train_ds.classes))
    287         return cls.create(*datasets, path=path, **kwargs)
    288 

UnboundLocalError: local variable 'train_ds' referenced before assignment

That is a bug that is fixed in the most recent version, but it might not be available on the conda version yet.

Here is a reference to it: Bug when using ImageDataBunch.from_folder and valid_pct with test
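For anyone wondering how an UnboundLocalError like this can arise, here is a minimal sketch of the general Python pattern: a local name bound in only one branch and then read unconditionally. This is not the fastai source; `from_folder_sketch` and the dictionaries are hypothetical stand-ins, and the assumption that the valid_pct branch skipped the assignment is inferred only from the traceback and the linked bug title.

```python
def from_folder_sketch(valid_pct=None, test=None):
    """Hypothetical stand-in, NOT the fastai code: shows the error pattern only."""
    datasets = []
    if valid_pct is None:
        # train_ds is bound only on this branch.
        train_ds = {"classes": ["class1", "class2"]}
        datasets.append(train_ds)
    else:
        # Assumed shape of the bug: this branch never binds train_ds.
        datasets.append({"classes": ["class1", "class2"]})
    if test:
        # Reading train_ds here fails when the else branch ran.
        datasets.append({"classes": train_ds["classes"]})
    return datasets

try:
    from_folder_sketch(valid_pct=0.9, test="test")
except UnboundLocalError as err:
    print("UnboundLocalError:", err)
```

In this sketch the call succeeds when either valid_pct or test is left out, and fails only when both are given together.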

I have the latest release of fastai, viz. 1.0.15!

I’m having a similar issue. I have a folder tree like this:

data
    class1
    class2
    class3

I called data = ImageDataBunch.from_folder(path, valid_pct=0.3, ds_tfms=get_transforms(), size=224)

I get FileNotFoundError: [Errno 2] No such file or directory: 'compositions/train'

My reading of the vision docs suggested that this would be OK. I thought valid_pct would recursively split all the classes into train/valid folders, but I get the same error: No such file or directory.

At minimum we might consider making it a bit clearer in the docs if several people are making this same error.

It doesn’t actually move things into folders, so you need to tell it where the images are. By default it looks in ‘train’, which is not where yours are. You’ll see in my sample notebook I have:

data = ImageDataBunch.from_folder(path, train=".", valid_pct=0.2,
    ds_tfms=get_transforms(), size=224, num_workers=4).normalize(imagenet_stats)

I do agree this is unclear/unexpected. Perhaps one option would be for ‘train’ to default to ‘.’ if ‘valid_pct’ is not zero?

Or maybe just catch that error and spell it out a bit? Instead of the bare FileNotFoundError, the message could say that by default the images should be in path/train, and that to use a different location you change the train parameter: train={desired location of files}. Not sure whether that would help people or confuse them.
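A rough sketch of what such a friendlier check might look like, using only pathlib. `check_train_folder` is a hypothetical helper, not part of fastai, and the message wording is just one possibility.

```python
from pathlib import Path

def check_train_folder(path, train="train"):
    """Hypothetical helper: fail early with a message that names the train parameter."""
    train_dir = Path(path) / train
    if not train_dir.is_dir():
        raise FileNotFoundError(
            f"Expected training images in '{train_dir}'. By default the images "
            "should be in path/train. To use a different location, pass "
            "train='<folder>' (for example train='.' if the class folders "
            "sit directly under path)."
        )
    return train_dir
```

Calling check_train_folder(path) before building the data bunch would turn the bare "No such file or directory" into a hint about the train argument, while check_train_folder(path, train=".") passes as long as path itself exists.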

This would have helped me!

I’m probably just going to script a split from:

data
    class1
    class2

to

data
    train
        class1
        class2
    valid
        class1
        class2
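A minimal sketch of such a split script, assuming one sub-folder per class under the source directory. `split_into_train_valid`, its parameters, and the choice to copy rather than move are all illustrative assumptions.

```python
import random
import shutil
from pathlib import Path

def split_into_train_valid(src, dst, valid_pct=0.2, seed=0):
    """Copy src/<class>/* into dst/train/<class> and dst/valid/<class>.

    Hypothetical helper. Use a dst outside src so the new train/valid
    folders are not mistaken for classes on a later run.
    """
    src, dst = Path(src), Path(dst)
    rng = random.Random(seed)
    for class_dir in sorted(p for p in src.iterdir() if p.is_dir()):
        files = sorted(f for f in class_dir.iterdir() if f.is_file())
        rng.shuffle(files)
        n_valid = round(len(files) * valid_pct)
        for i, f in enumerate(files):
            # The first n_valid shuffled files go to valid, the rest to train.
            split = "valid" if i < n_valid else "train"
            target = dst / split / class_dir.name
            target.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, target / f.name)
```

For example, split_into_train_valid("data", "data_split", valid_pct=0.2) would leave the original folders untouched and produce data_split/train and data_split/valid with the same class names.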

You should still be able to use it if your data is laid out as you showed above. Is your path variable pointing to data?

It is pointing to the parent directory of my big list of dirs (classes)

Yeah, try adding train="." to the call. That tells it to use the current folder.

It still seems to be looking for the valid/train split in the current dir.

No such file or directory: 'compositions/valid/albéniz'

Path points to compositions and albéniz is a child of compositions

So do you have compositions/data/{class1, class2, etc.}?

Nope it looks just like this:

compositions
    albeniz
        img1.png
        img2.png
    bach
        img1.png
        img2.png
    etc..

I already tried cd-ing into compositions and running it from there, as well as cd-ing in and setting path to ‘.’

It consistently looks for valid/folder_name_here, which obviously doesn’t exist. This seems to square with what Jeremy shared above.

I would love for this to work, but perhaps it’s just not designed that way?

Can you post your current code? I am curious to try recreating the issue

Mostly interested in the ImageDataBunch.from_folder command

Yeah!

path = Path('compositions')
data = ImageDataBunch.from_folder(path, train=".", valid_pct=0.3, ds_tfms=get_transforms(), size=224)

That’s it!