Lesson 2 - Issue with getting ImageDataBunch to work

Hi,

I am trying to follow the steps from Lesson 2 but cannot get ImageDataBunch to work.
I followed the steps from Lesson 2 exactly but get the error below. I don’t have train, valid and test folders, but I am passing the valid_pct parameter. Can someone please help point out what may be wrong? Thanks in advance:

ImageDataBunch Code:

np.random.seed(42)
data = ImageDataBunch.from_folder(path, train=".", valid_pct=0.2, ds_tfms=get_transforms(), size=224, num_workers=4).normalize(imagenet_stats)

Error:

/usr/local/lib/python3.6/dist-packages/fastai/data_block.py:451: UserWarning: Your training set is empty. If this is by design, pass ignore_empty=True to remove this warning.
  warn("Your training set is empty. If this is by design, pass ignore_empty=True to remove this warning.")
/usr/local/lib/python3.6/dist-packages/fastai/data_block.py:454: UserWarning: Your validation set is empty. If this is by design, use split_none()
 or pass ignore_empty=True when labelling to remove this warning.
  or pass ignore_empty=True when labelling to remove this warning.""")

---------------------------------------------------------------------------

IndexError Traceback (most recent call last)

<ipython-input-14-f59111772c82> in <module>()
      1 np.random.seed(42)
----> 2 data = ImageDataBunch.from_folder(path, train=".", valid_pct=0.2, ds_tfms=get_transforms(), size=224, num_workers=4).normalize(imagenet_stats)
      3

6 frames

/usr/local/lib/python3.6/dist-packages/fastai/core.py in index_row(a, idxs)
    274         if isinstance(res,(pd.DataFrame,pd.Series)): return res.copy()
    275         return res
--> 276     return a[idxs]
    277
    278 def func_args(func)->bool:

IndexError: index 0 is out of bounds for axis 0 with size 0

Hi @kandkurte_ram, what does the output of path.ls() look like for you? I assume the problem is with the path and train combination you provided not working together. I think that passing train="." doesn’t do what you wanted it to do.
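
If it helps, here is a quick sanity check in plain pathlib that counts the image files under each class subfolder before fastai gets involved. It is only a sketch: the helper name count_images_per_class and the list of image suffixes are my own assumptions, not part of fastai.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to whatever your dataset uses.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}

def count_images_per_class(root):
    """Return {subfolder name: number of image files} for each folder directly under root."""
    root = Path(root)
    if not root.is_dir():
        # A missing directory often means the drive or dataset is not where you think it is.
        raise FileNotFoundError(f"{root} is not a directory")
    counts = {}
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[sub.name] = sum(
            1
            for f in sub.rglob("*")
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts
```

If the call raises, or every count comes back as 0, the problem is upstream of ImageDataBunch and no combination of train and valid_pct will fix it.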

Thanks for the response @dusan.
path.ls() gives the output below:
[PosixPath('/content/gdrive/My Drive/fastai/Data/bears/black'),
PosixPath('/content/gdrive/My Drive/fastai/Data/bears/teddies'),
PosixPath('/content/gdrive/My Drive/fastai/Data/bears/grizzly')]

FYI, I have not created train/valid folders as these steps are not mentioned in the video. However, I passed valid_pct=0.2 to ImageDataBunch.

Please suggest.

Thanks

Try omitting the train argument like this:

data = ImageDataBunch.from_folder(path, valid_pct=0.2, ds_tfms=get_transforms(), size=224, num_workers=4).normalize(imagenet_stats)

Just to be sure, what version of fastai are you using?

And what is your setup? (OS, CPU or GPU)

Random thought: I remember getting errors unless I set num_workers=0 (I am on a Mac, only one CPU, no GPU). But maybe that’s completely unrelated.

Otherwise, I would try working through the data block API and constructing the ImageDataBunch step by step. That might show you exactly where the problem lies.

For example, suppose you have moved to the Data directory and set

from pathlib import Path
path = Path.cwd()

Looking at the source code for ImageDataBunch.from_folder(), start by creating an ImageList from folder:

from fastai.vision.data import ImageList
il = ImageList.from_folder(path)

Note that you can visualise one of the images with

il[0]

Since you are specifying a validation percentage, I would think that fastai will not look into train and valid subfolders but generate training and validation sets according to your valid_pct. At any rate, create the “source” (an ItemLists split using valid_pct) for your (eventual) data bunch:

valid_pct = 0.2
src = il.split_by_rand_pct(valid_pct)
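
In case it helps to see what a random percentage split amounts to, here is a toy version in plain Python. It is only a sketch of the idea, not fastai's code; the function name split_by_rand_pct_sketch and its seed argument are made up for illustration.

```python
import random

def split_by_rand_pct_sketch(items, valid_pct=0.2, seed=None):
    """Shuffle the indices, then send the first valid_pct share to valid and the rest to train."""
    rng = random.Random(seed)
    idxs = list(range(len(items)))
    rng.shuffle(idxs)
    cut = int(valid_pct * len(items))  # number of validation items
    valid = [items[i] for i in idxs[:cut]]
    train = [items[i] for i in idxs[cut:]]
    return train, valid

train, valid = split_by_rand_pct_sketch(list(range(10)), valid_pct=0.2, seed=42)
print(len(train), len(valid))
```

Note that an empty input gives two empty lists, which would be consistent with the training and validation "set is empty" warnings showing up together when no images are found.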

The ItemLists src (warning: not to be confused with an ItemList) has attributes train and valid:

type(src.train), len(src.train), type(src.valid), len(src.valid)

and you can again visualise an image in each, say with

src.train[0]

At this point the items are not labeled; their labels are taken from the names of the folders they sit in:

src = src.label_from_folder()
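
To make the idea of labelling from the folder concrete, here is a tiny sketch in plain pathlib: the label is simply the name of the directory that contains the file. The helper name and the example file name are hypothetical, and this is an illustration of the idea rather than fastai's implementation.

```python
from pathlib import Path

def label_from_parent_folder(file_path):
    """Use the name of the containing directory as the class label."""
    return Path(file_path).parent.name

# Hypothetical file name, used only to illustrate the idea.
print(label_from_parent_folder("bears/black/0001.jpg"))
```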

Note that label_from_folder() is a method of the class ItemList while src is an ItemLists, but I don’t see how ItemLists inherits this method from ItemList since it is not a subclass thereof.
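
One way a class can expose another class's methods without subclassing it is to forward unknown attribute lookups through __getattr__. From my reading of the fastai v1 source, ItemLists appears to do something along these lines, but treat that as an assumption. The toy classes below are made up for illustration and are not fastai's code.

```python
class ToyItemList:
    """Stand-in for a single list of items with a labelling method."""
    def __init__(self, items):
        self.items = list(items)

    def label_with(self, func):
        return [(item, func(item)) for item in self.items]


class ToyItemLists:
    """Holds a train and a valid list without subclassing ToyItemList."""
    def __init__(self, train, valid):
        self.train, self.valid = train, valid

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name in ("train", "valid"):
            # Guard against infinite recursion if these are not set yet.
            raise AttributeError(name)
        train_method = getattr(self.train, name)
        valid_method = getattr(self.valid, name)

        def forward(*args, **kwargs):
            # Apply the same method to both splits.
            return train_method(*args, **kwargs), valid_method(*args, **kwargs)

        return forward


lists = ToyItemLists(ToyItemList(["a", "b"]), ToyItemList(["c"]))
train_labeled, valid_labeled = lists.label_with(str.upper)
print(train_labeled, valid_labeled)
```

Here ToyItemLists has no label_with of its own, yet the call works on both splits because the lookup is forwarded.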

Create transformations for data augmentation and add to src:

from fastai.vision.transform import get_transforms
tfms = get_transforms()
src.transform(tfms, size=224);

Finally, create a DataBunch out of the labeled ItemLists src:

data = src.databunch()

Hi @dusan,

I tried omitting the “train” parameter but got the same result :(

Thanks

Hi @Antoine.C,

Thank you for your detailed response.
Let me digest and try it.

Btw, I have a Mac but I am using the Colab environment with a GPU.

Thanks.

By the way, I guess you are trying to reproduce “Creating your own dataset from Google images”. It might be useful to look at its history. I see something about Google Colab, but don’t know if that’s relevant to the issue you have been having.

Thanks to both @dusan and @Antoine.C.
I have resolved the issue, which was caused by a Google Drive mount problem.
Now it is working fine.


Google drive?

Just out of curiosity, and for future reference in case others have the same issue, could you give specifics on how you resolved it?

Hi @Antoine.C,

Yes, somehow my Google Drive was not properly mounted.
After I followed the steps on the page https://course.fast.ai/start_colab.html, it worked like a charm.

Thanks

Hi @kandkurte_ram, I am also facing the same issue.

[PosixPath('/content/mydrive/My Drive/kaggle/competitions/APTOS/input/train.csv'),
PosixPath('/content/mydrive/My Drive/kaggle/competitions/APTOS/input/test.csv'),
PosixPath('/content/mydrive/My Drive/kaggle/competitions/APTOS/input/sample_submission.csv'),
PosixPath('/content/mydrive/My Drive/kaggle/competitions/APTOS/input/test'),
PosixPath('/content/mydrive/My Drive/kaggle/competitions/APTOS/input/train'),
PosixPath('/content/mydrive/My Drive/kaggle/competitions/APTOS/input/cleaned.gsheet'),
PosixPath('/content/mydrive/My Drive/kaggle/competitions/APTOS/input/cleaned.csv')]

The input/train folder holds my training images and input/test holds my test images.

np.random.seed(42)
data = ImageDataBunch.from_folder(posixIP,train=".",valid_pct=2.0,ds_tfms=get_transforms(),size=224,num_workers=4,ignore_empty=True).normalize(imagenet_stats)

This is the command I executed, but I am getting an error. Kindly help me.

Thanks, you helped me discover that this exists:

src = src.label_from_re(pat=pat)