Do you mean 0.1% or 0.2%?
No, sorry, 10% or 20%.
Then valid_pct=.1 would make 10% validation data?
Correct.
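To make the fraction concrete, here is a minimal sketch of what holding out a fraction of the data means. It is plain Python and not fastai's implementation; `split_by_fraction` and its `seed` argument are hypothetical names used only for illustration.

```python
import random

def split_by_fraction(items, valid_pct, seed=0):
    """Hypothetical helper: hold out roughly `valid_pct` of `items` for validation."""
    if not 0.0 <= valid_pct <= 1.0:
        raise ValueError("valid_pct must be a fraction between 0 and 1")
    shuffled = list(items)
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(shuffled)
    n_valid = round(len(shuffled) * valid_pct)
    # The first n_valid shuffled items become validation, the rest training.
    return shuffled[n_valid:], shuffled[:n_valid]

train_items, valid_items = split_by_fraction(range(100), valid_pct=0.1)
print(len(train_items), len(valid_items))  # 90 10
```

With 100 items, valid_pct=0.1 holds out 10 of them and valid_pct=0.2 holds out 20.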
When I run this:
data = ImageDataBunch.from_folder(PATH, train="train", test="test",ds_tfms = get_transforms(), size=320, bs=bs, valid_pct=0.9)
I get the following error:
---------------------------------------------------------------------------
UnboundLocalError Traceback (most recent call last)
<ipython-input-55-6c5e6dad52ca> in <module>
----> 1 data = ImageDataBunch.from_folder(PATH, train="train", test="test",ds_tfms = get_transforms(), size=320, bs=bs, valid_pct=0.9)
2 data.normalize(imagenet_stats)
/opt/anaconda3/lib/python3.6/site-packages/fastai/vision/data.py in from_folder(cls, path, train, valid, test, valid_pct, **kwargs)
284
285 if test: datasets.append(ImageClassificationDataset.from_single_folder(
--> 286 path/test,classes=train_ds.classes))
287 return cls.create(*datasets, path=path, **kwargs)
288
UnboundLocalError: local variable 'train_ds' referenced before assignment
That is a bug that is fixed in the most recent version, but it might not be available on the conda version yet.
Here is a reference to it: Bug when using ImageDataBunch.from_folder and valid_pct with test
I have the latest release of fastai, viz. 1.0.15!
I’m having a similar issue. I have a folder tree like this:
data
  class1
  class2
  class3
I called data = ImageDataBunch.from_folder(path, valid_pct=0.3, ds_tfms=get_transforms(), size=224)
I get FileNotFoundError: [Errno 2] No such file or directory: 'compositions/train'
My reading of the vision docs suggested that this would be ok. I thought valid_pct would recursively split all the classes into train/valid folders, but I get the same error: No such file or directory.
At minimum we might consider making it a bit clearer in the docs if several people are making this same error.
It doesn’t actually move things into folders, so you need to tell it where the images are. By default it looks in ‘train’, which is not where yours are. So you’ll see in my sample notebook I have:
data = ImageDataBunch.from_folder(path, train=".", valid_pct=0.2,
ds_tfms=get_transforms(), size=224, num_workers=4).normalize(imagenet_stats)
I do agree this is unclear/unexpected. Perhaps one option would be for ‘train’ to default to ‘.’ if ‘valid_pct’ is not zero?
Or maybe just catch that error and spell it out a bit? For example, instead of the bare FileNotFoundError, it could tell you that by default the images should be in path/train, and that if you want to modify that location, you change the train parameter to train={desired location of files}.
Not sure if that would help people or would confuse people.
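A rough sketch of that kind of message, assuming a standalone pre-check you would run yourself before calling from_folder. `check_train_folder` is a hypothetical helper, not something fastai provides, and the wording of the message is only a suggestion.

```python
from pathlib import Path

def check_train_folder(path, train="train"):
    """Hypothetical pre-check (not part of fastai): fail early with a clearer hint."""
    train_dir = Path(path) / train
    if not train_dir.is_dir():
        raise FileNotFoundError(
            f"No such directory: '{train_dir}'. By default the images are "
            f"expected in path/train. To use a different location, pass "
            f"train=<folder>, e.g. train='.' if the class folders sit "
            f"directly under path."
        )
    return train_dir

# Usage (assumed layout: class folders directly under 'compositions'):
# check_train_folder("compositions", train=".")
```

Whether the library should raise something like this itself, or change the default for train, is a separate design question.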
This would have helped me!
I’m probably just going to script a split from:
data
  class1
  class2
to
data
  train
    class1
    class2
  valid
    class1
    class2
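For anyone who does want the files physically split, a script along these lines might do it. This is only a sketch under assumptions: `split_into_train_valid` is a hypothetical helper, it copies files rather than moving them (so the original class folders stay in place), and the split is random per class with a fixed seed.

```python
import random
import shutil
from pathlib import Path

def split_into_train_valid(data_dir, valid_pct=0.2, seed=0):
    """Hypothetical helper: copy data/<class>/* into data/train/<class> and data/valid/<class>."""
    data_dir = Path(data_dir)
    rng = random.Random(seed)
    # Every sub-folder except existing train/valid folders is treated as a class.
    class_dirs = [d for d in sorted(data_dir.iterdir())
                  if d.is_dir() and d.name not in ("train", "valid")]
    for class_dir in class_dirs:
        files = sorted(f for f in class_dir.iterdir() if f.is_file())
        rng.shuffle(files)
        n_valid = round(len(files) * valid_pct)
        for i, f in enumerate(files):
            # The first n_valid shuffled files go to valid, the rest to train.
            split = "valid" if i < n_valid else "train"
            dest = data_dir / split / class_dir.name
            dest.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, dest / f.name)

# Usage (assumed layout: class folders directly under 'data'):
# split_into_train_valid("data", valid_pct=0.2)
```

After running it, the default train="train" layout should be in place without needing valid_pct at all.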
You should still be able to use it if your data is laid out as you showed above. Is your path variable pointing to data?
It is pointing to the parent directory of my big list of dirs (classes)
Yeah, try adding train="."
That tells it to use the current folder.
It still seems to be looking for the valid/train split in the current dir.
No such file or directory: 'compositions/valid/albéniz'
path points to compositions, and albéniz is a child of compositions.
So do you have compositions/data/{class1, class2, etc}?
Nope it looks just like this:
compositions
  albeniz
    img1.png
    img2.png
  bach
    img1.png
    img2.png
  etc..
I already tried cd-ing into compositions and running it from there, as well as cd-ing in and setting path to ‘.’
It consistently looks for valid/folder_name_here, which obviously doesn’t exist. This seems to square with what Jeremy shared above.
I would love for this to work, but perhaps it’s just not designed that way?
Can you post your current code? I am curious to try recreating the issue
Mostly interested in the ImageDataBunch.from_folder command
Yeah!
path = Path('compositions')
data = ImageDataBunch.from_folder(path, train=".", valid_pct=0.3, ds_tfms=get_transforms(), size=224)
That’s it!