Not working data = ImageDataBunch.from_folder(path, ds_tfms=tfms, size=224)

#1

I'm on fastai v1. When I tried
data = ImageDataBunch.from_folder(path, ds_tfms=tfms, size=224)
it came back with an IndexError.

My directory structure
Data_folder
– Folder1
– Folder2
– Folder3
– Folder4

IndexError: index 0 is out of bounds for axis 0 with size 0

Full error report:

IndexError Traceback (most recent call last)
in ()
----> 1 data = ImageDataBunch.from_folder(path, ds_tfms=tfms)

/usr/local/lib/python3.6/dist-packages/fastai/vision/data.py in from_folder(cls, path, train, valid, valid_pct, classes, **kwargs)
114 if valid_pct is None: src = il.split_by_folder(train=train, valid=valid)
115 else: src = il.random_split_by_pct(valid_pct)
--> 116 src = src.label_from_folder(classes=classes)
117 return cls.create_from_ll(src, **kwargs)
118

/usr/local/lib/python3.6/dist-packages/fastai/data_block.py in _inner(*args, **kwargs)
256 assert isinstance(fv, Callable)
257 def _inner(*args, **kwargs):
--> 258 self.train = ft(*args, **kwargs)
259 assert isinstance(self.train, LabelList)
260 self.valid = fv(*args, template=self.train.y, **kwargs)

/usr/local/lib/python3.6/dist-packages/fastai/data_block.py in label_from_folder(self, **kwargs)
200 def label_from_folder(self, **kwargs)->'LabelList':
201 "Give a label to each filename depending on its folder."
--> 202 return self.label_from_func(func=lambda o: o.parent.name, **kwargs)
203
204 def label_from_re(self, pat:str, full_path:bool=False, **kwargs)->'LabelList':

/usr/local/lib/python3.6/dist-packages/fastai/data_block.py in label_from_func(self, func, **kwargs)
196 def label_from_func(self, func:Callable, **kwargs)->'LabelList':
197 "Apply func to every input to get its label."
--> 198 return self.label_from_list([func(o) for o in self.items], **kwargs)
199
200 def label_from_folder(self, **kwargs)->'LabelList':

/usr/local/lib/python3.6/dist-packages/fastai/data_block.py in label_from_list(self, labels, label_cls, template, **kwargs)
176 "Label self.items with labels using label_cls and optionally template."
177 labels = array(labels, dtype=object)
--> 178 label_cls = self.label_cls(labels, label_cls)
179 y_bld = label_cls if template is None else template.new
180 y = y_bld(labels, **kwargs)

/usr/local/lib/python3.6/dist-packages/fastai/data_block.py in label_cls(self, labels, lc)
168 if lc is not None: return lc
169 if self._label_cls is not None: return self._label_cls
--> 170 it = try_int(index_row(labels,0))
171 if isinstance(it, (str,int)): return CategoryList
172 if isinstance(it, Collection): return MultiCategoryList

/usr/local/lib/python3.6/dist-packages/fastai/core.py in index_row(a, idxs)
227 if isinstance(res,(pd.DataFrame,pd.Series)): return res.copy()
228 return res
--> 229 return a[idxs]
230
231 def func_args(func)->bool:

IndexError: index 0 is out of bounds for axis 0 with size 0
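
The last frame shows `index_row(labels, 0)` failing on an empty label array, which suggests that no image files were found for the training split. Below is a minimal sketch for checking that. It assumes `from_folder` looks for subfolders named `train` and `valid` by default (the traceback shows the parameter names but not their defaults, so this is an assumption). The helper `count_files` and the temporary folder layout are hypothetical and only for illustration:

```python
import tempfile
from pathlib import Path

def count_files(root, split):
    """Count regular files anywhere under root/split (0 if the folder is missing)."""
    folder = Path(root) / split
    if not folder.is_dir():
        return 0
    return sum(1 for p in folder.rglob("*") if p.is_file())

# Rebuild the layout from the post: class folders sit directly under the
# data folder, with no train/ or valid/ level in between.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "Data_folder"
    for name in ["Folder1", "Folder2", "Folder3", "Folder4"]:
        (root / name).mkdir(parents=True)
        (root / name / "img0.jpg").touch()

    counts = {split: count_files(root, split) for split in ("train", "valid")}
    print(counts)  # {'train': 0, 'valid': 0}
```

If both counts are zero for your own path, the images are there but not in the folders the split step is looking at.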


Lesson2(v3) IndexError: index 0 is out of bounds for axis 0 with size 0
(Christian Werner) #2

I have the same structure and this works:

data = (ImageItemList.from_folder(path)
        .random_split_by_pct()
        .label_from_folder()
        .transform(tfms, size=224)
        .databunch())

#3

Thanks. Will try it


#4

it worked


(U. Aditya Varma) #5

Do you know why this works and the code shown in the video does not?


(Christian Werner) #6

There were quite a few API changes in the data bunch area between the lectures and the current fast.ai version,
so some videos are outdated regarding certain API details.


#7

Hello, do you know if there is a way to set a parameter like 0.2 when splitting the images? Is there any method that can replace random_split_by_pct()?

Thanks
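
For what it's worth, `random_split_by_pct` in fastai v1 appears to take a `valid_pct` argument (0.2 by default, as far as I know), so `.random_split_by_pct(valid_pct=0.2)` should do it; please check the signature in your installed version. The idea behind such a split can be sketched in plain Python. `random_split` here is a hypothetical helper, not the fastai implementation:

```python
import random

def random_split(items, valid_pct=0.2, seed=None):
    """Shuffle items and hold out a valid_pct fraction as the validation set."""
    rng = random.Random(seed)
    shuffled = list(items)          # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_valid = int(valid_pct * len(shuffled))
    return shuffled[n_valid:], shuffled[:n_valid]

files = [f"img_{i}.jpg" for i in range(100)]
train, valid = random_split(files, valid_pct=0.2, seed=42)
print(len(train), len(valid))  # 80 20
```

Passing a seed makes the split reproducible between runs.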


(Praneeth Katuri) #8

Even after trying this I have another error:

/usr/local/lib/python3.6/dist-packages/fastai/data_block.py:399: UserWarning: Your training set is empty. Is this is by design, pass ignore_empty=True to remove this warning.
warn("Your training set is empty. Is this is by design, pass ignore_empty=True to remove this warning.")
/usr/local/lib/python3.6/dist-packages/fastai/data_block.py:402: UserWarning: Your validation set is empty. Is this is by design, use no_split()
or pass ignore_empty=True when labelling to remove this warning.
or pass ignore_empty=True when labelling to remove this warning.""")

It says my train and validation folders are empty, but they are not: I have 100 images of 2 classes in the train folder and 20 images of 2 classes in the valid folder.
Any help would be appreciated.


(Rushabh Vasani) #9

Hi @NotBad!
I had the same issue, and the cause was that I was passing a longer path than needed.
I think this is because the path is resolved relative to the current directory, so we have to specify only the part of the path that comes after it. Otherwise it generates this error instead of saying "no directory named…".
This solved it for me.


(Mario jorge lopes chagas de almeida) #10

It worked for me too. I was trying to use the MNIST dataset in Colab:

mnist = untar_data(URLs.MNIST)
tfms = get_transforms(do_flip=False)

data = (ImageItemList.from_folder(mnist) 
        .random_split_by_pct() 
        .label_from_folder()
        .transform(tfms, size=16)
        .databunch())

#11

Hi @RushabhVasani24,
Can you give an example of the change you made to the path?
Thanks!


(Rushabh Vasani) #12

Hi @JustMeSam!
I was doing this on Ubuntu. At first I specified the path as path='home/rushabh/Desktop/Folder_name'.
After changing it to path='Desktop/Folder_name' it worked, because 'home/user_name' is the current directory
on Ubuntu and the path has to be given relative to it.
If you are facing this problem in Google Colab, you are probably making the mistake of including 'content' in the path.
Specifying only 'Folder_name' will solve it for you.
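
One likely explanation for the behaviour described above: a path without a leading `/` is relative, so it is resolved against the current working directory. Here is a small sketch of that using only `pathlib`. The helper name `describe` is made up for illustration, the paths are the ones from the post, and the working directory is assumed to be `/home/rushabh`:

```python
from pathlib import PurePosixPath

def describe(path_str, cwd):
    """Return (is_absolute, location the path points to when used from cwd)."""
    p = PurePosixPath(path_str)
    resolved = p if p.is_absolute() else PurePosixPath(cwd) / p
    return p.is_absolute(), str(resolved)

cwd = "/home/rushabh"  # assumed working directory, as described in the post

print(describe("home/rushabh/Desktop/Folder_name", cwd))
# (False, '/home/rushabh/home/rushabh/Desktop/Folder_name')
print(describe("Desktop/Folder_name", cwd))
# (False, '/home/rushabh/Desktop/Folder_name')
print(describe("/home/rushabh/Desktop/Folder_name", cwd))
# (True, '/home/rushabh/Desktop/Folder_name')
```

Under that assumption, the first form points at a folder that does not exist, while the short relative path and the absolute path with the leading `/` both point at the real folder. Checking `Path(path).exists()` before building the data bunch catches this early.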


#13

Thanks a lot!
I solved it in Google Colab. If anyone is interested, I created a small notebook to demonstrate the problem I had.


(Arora) #14

data_cc_new = (ImageList.from_folder(cc)
        .random_split_by_pct()
        .label_from_folder()
        .transform(tfms, size=224)
        .databunch())

I tried the code, but the problem is that it splits the data into 4 classes [Type_1, Type_2, Type_3, test] instead of 3 classes [Type_1, Type_2, Type_3].

Why is it creating a separate test class?


#15

That's because your first call, ImageList.from_folder(cc), also picks up the test folder. You should use filter_by_folder right after it to keep only [Type_1, Type_2, Type_3].
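
In fastai v1 that would be something like `.filter_by_folder(include=['Type_1', 'Type_2', 'Type_3'])` placed right after `from_folder`; please verify the argument names against your installed version. The effect can be sketched in plain Python. `keep_folders` is a hypothetical helper, not the library code, and the file names are made up:

```python
from pathlib import PurePosixPath

def keep_folders(paths, root, include):
    """Keep only paths whose top-level folder under root is listed in include."""
    kept = []
    for p in paths:
        top = PurePosixPath(p).relative_to(root).parts[0]
        if top in include:
            kept.append(p)
    return kept

paths = [
    "cc/Type_1/a.jpg",
    "cc/Type_2/b.jpg",
    "cc/Type_3/c.jpg",
    "cc/test/d.jpg",
]
kept = keep_folders(paths, "cc", include={"Type_1", "Type_2", "Type_3"})
print(kept)  # ['cc/Type_1/a.jpg', 'cc/Type_2/b.jpg', 'cc/Type_3/c.jpg']
```

With the test folder filtered out before labelling, only the three class folders remain as labels.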


(CT) #16

Thank you for making this! I tried for hours to find a solution - this worked!


#17

When using data = ImageDataBunch.from_folder(path, ds_tfms=tfms, size=224), there has to be both a training set and a validation set. So if the images are just in separate class folders with no validation set, one has to be created with data = ImageDataBunch.from_folder(path, train='.', valid_pct=0.2, ds_tfms=tfms, size=224), as shown in lesson 2.


(Kai Lichtenberg) #19

What’s the file format of the images?
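
The format matters because, as far as I recall, fastai v1 picks up image files by extension, based on the image types known to Python's `mimetypes` module (treat that as an assumption and check your version). Here is a quick sketch that tallies the extensions in a folder and flags the ones `mimetypes` does not consider images. `suffix_report` is a hypothetical helper and the file names are made up:

```python
import mimetypes
import tempfile
from collections import Counter
from pathlib import Path

def suffix_report(root):
    """Map each file suffix under root to (count, looks_like_image)."""
    counts = Counter(p.suffix.lower() for p in Path(root).rglob("*") if p.is_file())
    report = {}
    for suffix, n in counts.items():
        # guess_type only looks at the name, so a dummy stem is enough
        mime, _ = mimetypes.guess_type("x" + suffix)
        report[suffix] = (n, bool(mime and mime.startswith("image/")))
    return report

# Demo on a throwaway folder; point suffix_report at your own data folder instead.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["a.jpg", "b.jpg", "c.png", "notes.txt"]:
        (Path(tmp) / name).touch()
    report = suffix_report(tmp)
    print(report)
```

If most of your files show up under a suffix flagged as not an image, or with no suffix at all, that would explain an "empty" training set even though the folders are full.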
