Lesson 2 - Issue with getting ImageDataBunch to work

kandkurte_ram · July 22, 2019, 10:17am

Hi,

I am trying to follow the steps from Lesson 2 but cannot get ImageDataBunch to work.
I followed the lesson exactly but get the error below. I have not created train, valid and test folders, but I am passing the valid_pct parameter. Can someone please point out what may be wrong? Thanks in advance.

ImageDataBunch Code:

np.random.seed(42)
data = ImageDataBunch.from_folder(path, train=".", valid_pct=0.2, ds_tfms=get_transforms(), size=224, num_workers=4).normalize(imagenet_stats)

Error:

/usr/local/lib/python3.6/dist-packages/fastai/data_block.py:451: UserWarning: Your training set is empty. If this is by design, pass ignore_empty=True to remove this warning.
  warn("Your training set is empty. If this is by design, pass ignore_empty=True to remove this warning.")
/usr/local/lib/python3.6/dist-packages/fastai/data_block.py:454: UserWarning: Your validation set is empty. If this is by design, use split_none() or pass ignore_empty=True when labelling to remove this warning.
  or pass ignore_empty=True when labelling to remove this warning.""")

---------------------------------------------------------------------------

IndexError Traceback (most recent call last)

<ipython-input-14-f59111772c82> in <module>()
      1 np.random.seed(42)
----> 2 data = ImageDataBunch.from_folder(path, train=".", valid_pct=0.2, ds_tfms=get_transforms(), size=224, num_workers=4).normalize(imagenet_stats)
      3

6 frames

/usr/local/lib/python3.6/dist-packages/fastai/core.py in index_row(a, idxs)
    274         if isinstance(res,(pd.DataFrame,pd.Series)): return res.copy()
    275         return res
--> 276     return a[idxs]
    277
    278 def func_args(func)->bool:

IndexError: index 0 is out of bounds for axis 0 with size 0

dusan · July 22, 2019, 11:05am

Hi @kandkurte_ram, what does the output of path.ls() look like for you? I assume the problem is with the path and train combination you provided not working together. I think that passing train="." doesn’t do what you wanted it to do.
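
One way to check whether the path really contains images is to count the files in each class folder. The following is a minimal sketch using only pathlib; the helper name count_images and the set of extensions are assumptions for illustration and are not part of fastai.

```python
from pathlib import Path

# Extensions treated as images here; this set is an assumption, adjust as needed.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}

def count_images(root):
    """Return {subfolder_name: number_of_image_files} for each subfolder of root."""
    root = Path(root)
    if not root.is_dir():
        # A missing directory often means the drive is not mounted.
        raise FileNotFoundError(f"{root} is not a directory")
    counts = {}
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[sub.name] = sum(
            1
            for f in sub.rglob("*")
            if f.is_file() and f.suffix.lower() in IMAGE_EXTS
        )
    return counts
```

If count_images(path) reports zero for every class folder, or raises because the directory does not exist, that would be consistent with the "training set is empty" warning.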

kandkurte_ram · July 23, 2019, 10:02am

Thanks for response @dusan.
path.ls() gives below output:
[PosixPath('/content/gdrive/My Drive/fastai/Data/bears/black'),
PosixPath('/content/gdrive/My Drive/fastai/Data/bears/teddies'),
PosixPath('/content/gdrive/My Drive/fastai/Data/bears/grizzly')]

FYI, I have not created train/valid folders as these steps are not mentioned in the video. However I provided valid_pct=0.2 parameter to ImageDataBunch.

Please suggest.

Thanks

dusan · July 23, 2019, 10:40am

Try omitting the train argument, like this:

data = ImageDataBunch.from_folder(path, valid_pct=0.2, ds_tfms=get_transforms(), size=224, num_workers=4).normalize(imagenet_stats)

Antoine.C · July 23, 2019, 11:03am

Just to be sure, what version of fastai are you using?

And what is your setup? (OS, CPU or GPU)

Random thought: I remember getting errors unless I set num_workers=0 (I am on mac, only one CPU, no GPU). But maybe that’s completely unrelated.

Otherwise, I would try working through the data block API and constructing the ImageDataBunch step by step. That might show you exactly where the problem lies.

For example, suppose you have moved to the Data directory and set

from pathlib import Path
path = Path.cwd()

Looking at the source code for ImageDataBunch.from_folder(), start by creating an ImageList from folder:

from fastai.vision.data import ImageList
il = ImageList.from_folder(path)

Note that you can visualise one of the images with

il[0]

Since you are specifying a validation percentage, I would think that fastai will not look into train and valid subfolders but will generate training and validation sets according to your valid_pct. At any rate, create the “source” (an ItemLists split using valid_pct) for your (eventual) data bunch:

valid_pct = 0.2
src = il.split_by_rand_pct(valid_pct)
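
To illustrate what a random percentage split does, and why the np.random.seed(42) call earlier in the thread makes the split repeatable, here is a minimal pure-Python sketch. It is not fastai's implementation; random_split and its seed parameter are hypothetical names, and the file names are made up.

```python
import random

def random_split(items, valid_pct=0.2, seed=None):
    """Shuffle the indices and send the first valid_pct fraction to validation."""
    rng = random.Random(seed)          # a fixed seed gives the same split every run
    idxs = list(range(len(items)))
    rng.shuffle(idxs)
    cut = int(valid_pct * len(items))  # number of validation items
    valid_idxs = set(idxs[:cut])
    train = [x for i, x in enumerate(items) if i not in valid_idxs]
    valid = [x for i, x in enumerate(items) if i in valid_idxs]
    return train, valid

# Hypothetical file names, for illustration only.
files = [f"img_{i}.jpg" for i in range(10)]
train, valid = random_split(files, valid_pct=0.2, seed=42)
```

With ten items and valid_pct=0.2, two items land in valid and eight in train, and calling the function again with the same seed returns the same two lists.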

The ItemLists (warning: not to be confused with an ItemList) src has attributes train and valid:

type(src.train), len(src.train), type(src.valid), len(src.valid)

and you can again visualise an image in each, say with

src.train[0]

These items are not yet labeled; their labels are taken from the folders in which the images are stored:

src = src.label_from_folder()
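
The idea behind labelling from folders can be sketched with pathlib alone: the label is simply the name of the directory containing the file. This is an illustration of the idea, not fastai's code; label_from_parent and the file names are hypothetical.

```python
from pathlib import PurePosixPath

def label_from_parent(path):
    """Use the name of the containing folder as the label."""
    return PurePosixPath(path).parent.name

# Hypothetical file names inside the class folders from the thread.
paths = [
    "bears/black/00000001.jpg",
    "bears/teddies/00000002.jpg",
    "bears/grizzly/00000003.jpg",
]
labels = [label_from_parent(p) for p in paths]
```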

Note that label_from_folder() is a method of the class ItemList while src is an ItemLists, but I don’t see how ItemLists inherits this method from ItemList since it is not a subclass thereof.
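
One likely explanation, assuming fastai v1's ItemLists forwards unknown attributes to its train and valid lists through __getattr__ (worth confirming in data_block.py), is plain delegation rather than inheritance. The toy classes below show the general Python mechanism; Items and ItemsPair are hypothetical stand-ins, not fastai classes.

```python
class Items:
    """Stand-in for a single item list that knows how to label itself."""
    def __init__(self, names):
        self.names = names
        self.labels = None

    def label_with(self, func):
        self.labels = [func(n) for n in self.names]
        return self


class ItemsPair:
    """Stand-in for a train/valid container that is NOT a subclass of Items."""
    def __init__(self, train, valid):
        self.train = train
        self.valid = valid

    def __getattr__(self, name):
        # __getattr__ runs only when normal attribute lookup fails.
        if name in ("train", "valid"):
            raise AttributeError(name)  # avoid recursion if they are unset
        train_method = getattr(self.train, name)
        valid_method = getattr(self.valid, name)

        def _apply_to_both(*args, **kwargs):
            # Forward the call to both lists and keep the results.
            self.train = train_method(*args, **kwargs)
            self.valid = valid_method(*args, **kwargs)
            return self

        return _apply_to_both


pair = ItemsPair(Items(["a_1", "b_2"]), Items(["a_3"]))
pair = pair.label_with(lambda n: n.split("_")[0])
```

Here pair.label_with works even though ItemsPair defines no such method, because the lookup falls through to __getattr__, which calls the method on both contained lists.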

Create transformations for data augmentation and add to src:

from fastai.vision.transform import get_transforms
tfms = get_transforms()
src.transform(tfms, size=224);

Finally, create a DataBunch out of the labeled ItemLists src:

data = src.databunch()

kandkurte_ram · July 24, 2019, 12:30am

Hi @dusan,

I tried omitting the “train” parameter but got the same result :(.

Thanks

kandkurte_ram · July 24, 2019, 12:31am

Hi @Antoine.C,

Thank you for your detailed response.
Let me digest and try it.

Btw, I have a Mac but I am using Colab environment and using GPU.

Thanks.

Antoine.C · July 24, 2019, 9:37am

By the way, I guess you are trying to reproduce “Creating your own dataset from Google images”. It might be useful to look at its history. I see something about Google Colab, but don’t know if that’s relevant to the issue you have been having.

kandkurte_ram · July 24, 2019, 10:56am

Thanks to both @dusan and @Antoine.C.
I have resolved the issue, which was caused by a Google Drive mount problem.
Now it is working fine.

Antoine.C · July 24, 2019, 12:14pm

Google drive?

Just out of curiosity, and for future reference in case others have the same issue, could you give specifics on how you resolved it?

kandkurte_ram · July 25, 2019, 9:51am

Hi @Antoine.C,

Yes, somehow my Google Drive was not properly mounted.
After following the steps on https://course.fast.ai/start_colab.html, it worked like a charm.
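
For anyone hitting the same problem, a quick sanity check before building the data bunch might look like the sketch below. It assumes the /content/gdrive mount point seen in the paths earlier in the thread; drive_looks_mounted is a hypothetical helper, and google.colab is only importable inside Colab, so the import is guarded.

```python
from pathlib import Path

MOUNT_POINT = "/content/gdrive"  # assumed from the paths shown in the thread

def drive_looks_mounted(mount_point=MOUNT_POINT):
    """True if the mount point exists and contains at least one entry."""
    root = Path(mount_point)
    return root.is_dir() and any(root.iterdir())

try:
    from google.colab import drive  # available only inside Colab
except ImportError:
    drive = None

if drive is not None and not drive_looks_mounted():
    drive.mount(MOUNT_POINT)  # asks for authorisation in the notebook
```

If the check still fails after mounting, path.ls() on the dataset folder is the next thing to inspect.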

Thanks

mohanraj.raja · August 31, 2019, 3:41pm

Hi @kandkurte_ram, I am also facing the same issue. This is my path.ls() output:

[PosixPath('/content/mydrive/My Drive/kaggle/competitions/APTOS/input/train.csv'),
PosixPath('/content/mydrive/My Drive/kaggle/competitions/APTOS/input/test.csv'),
PosixPath('/content/mydrive/My Drive/kaggle/competitions/APTOS/input/sample_submission.csv'),
PosixPath('/content/mydrive/My Drive/kaggle/competitions/APTOS/input/test'),
PosixPath('/content/mydrive/My Drive/kaggle/competitions/APTOS/input/train'),
PosixPath('/content/mydrive/My Drive/kaggle/competitions/APTOS/input/cleaned.gsheet'),
PosixPath('/content/mydrive/My Drive/kaggle/competitions/APTOS/input/cleaned.csv')]

The input/train folder holds my training images and the input/test folder holds my test images.

np.random.seed(42)
data = ImageDataBunch.from_folder(posixIP,train=".",valid_pct=2.0,ds_tfms=get_transforms(),size=224,num_workers=4,ignore_empty=True).normalize(imagenet_stats)

This is the command I executed, but I am getting an error. Kindly help me.
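
One thing that stands out in the command above is valid_pct=2.0. Assuming valid_pct is the fraction of the data held out for validation (as the 0.2 used earlier in the thread suggests), a value of 2.0 would put every item in the validation set and leave the training set empty. The sketch below shows the arithmetic; split_sizes and check_valid_pct are hypothetical helpers, not fastai functions.

```python
def split_sizes(n_items, valid_pct):
    """Return (n_train, n_valid) when the first valid_pct fraction goes to validation."""
    cut = min(int(valid_pct * n_items), n_items)  # cannot take more than exist
    return n_items - cut, cut

def check_valid_pct(valid_pct):
    """Reject values that are not a fraction in [0, 1)."""
    if not 0.0 <= valid_pct < 1.0:
        raise ValueError(f"valid_pct should be a fraction in [0, 1), got {valid_pct}")
    return valid_pct

sizes_ok = split_sizes(100, 0.2)   # 80 training items, 20 validation items
sizes_bad = split_sizes(100, 2.0)  # 0 training items, 100 validation items
```

Separately, the listing shows train.csv next to the image folders, so the labels may live in that CSV rather than in folder names. If so, ImageDataBunch.from_csv might be a better fit than from_folder, though that is a guess from the listing alone.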

GotAudio · May 5, 2020, 7:27am

Thanks, you helped me discover that this exists:

src = src.label_from_re(pat=pat)
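
Since pat is not shown above, here is a minimal sketch of the idea behind regex-based labelling, using only the standard re module. The pattern and file names are hypothetical, and regex_label is an illustrative helper rather than fastai's implementation.

```python
import re

# Hypothetical pattern: captures the class name in paths like ".../grizzly_12.jpg".
pat = r"/([^/]+)_\d+\.jpg$"

def regex_label(path, pat):
    """Return the first capture group of pat found in path."""
    match = re.search(pat, str(path))
    if match is None:
        raise ValueError(f"pattern {pat!r} did not match {path!r}")
    return match.group(1)

label = regex_label("/data/bears/grizzly_12.jpg", pat)
```

This style of labelling is useful when the class is encoded in the file name instead of the folder name.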