IndexError: index out of bounds ImageDataBunch.from_folder

ermonsi · February 5, 2019, 6:02pm

My dataframe looks like this:

website     num_rate1  num_rate2      mean    val
english_0        3468       1556  2.228792  False
english_1        6394       1546  4.135834  False
english_2        6075       1571  3.866964  False
english_3        7760       1566  4.955300  False

My folder looks like this:
/img
    /train
        /0
        /1
        …

When I run the following code, I get an index-out-of-bounds error. Does anyone know how I can load image data from a folder?

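from fastai.vision import *   # fastai v1 API (ImageDataBunch, ImageItemList, get_transforms)

path = Path('img')            # assumption: the folder that contains train/
tfms = get_transforms()       # tfms was not defined in the original snippet

data = ImageDataBunch.from_folder(path, ignore_empty=True)

# Alternative: the data block API
data = (ImageItemList.from_folder(path)
        .random_split_by_pct()
        .label_from_folder()
        .transform(tfms, size=224)
        .databunch())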

IndexError: index 0 is out of bounds for axis 0 with size 0
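"index 0 is out of bounds for axis 0 with size 0" means something tried to read the first element of an empty array, so the usual root cause is that no images were found for one of the splits. As a rough diagnostic (plain Python, not part of fastai), the sketch below counts image files per class folder under `train/` and `valid/`. The helper names `count_images` and `print_report` and the extension list are my own assumptions; adjust them to your data.

```python
from collections import Counter
from pathlib import Path

# Assumed set of image extensions; extend it if your files use others.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".tif", ".tiff"}

def count_images(root, splits=("train", "valid")):
    """Return {split: Counter({class_name: n_images})} for root/<split>/<class>/*."""
    root = Path(root)
    report = {}
    for split in splits:
        split_dir = root / split
        counts = Counter()
        if split_dir.is_dir():  # a missing split folder yields an empty Counter
            for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
                counts[class_dir.name] = sum(
                    1 for f in class_dir.iterdir()
                    if f.is_file() and f.suffix.lower() in IMAGE_EXTS
                )
        report[split] = counts
    return report

def print_report(root):
    """Print a per-split summary and flag class folders with no images."""
    for split, counts in count_images(root).items():
        print(f"{split}: {sum(counts.values())} images in {len(counts)} class folders")
        for name, n in counts.items():
            if n == 0:
                print(f"  warning: {split}/{name} contains no recognised image files")
```

Running `print_report('img')` on the layout above would show whether `valid` is missing entirely, which is worth checking before looking at the fastai code itself.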

Jarmos · March 17, 2019, 5:35pm

Any luck finding a way around this problem? I'm surprised there's no solution anywhere! Even the forums barely mention it.

Ashka · March 20, 2019, 7:19pm

Commenting to give it more visibility…

sgugger · March 20, 2019, 9:38pm

This has been mostly ignored because no one can help without seeing the full code, the full stack trace, and the structure of the directory used.

Nobody on the forum is a magician, so if you want help, you have to give all the info necessary to reproduce the bug you encountered.

howardroark_10 · March 21, 2019, 7:25pm

I got a similar error when I tried to split the Kaggle Titanic dataset as shown in the tabular lesson of v3, but it works if I use split_by_rand_pct().
I have run into this error many times with Kaggle datasets, and I've seen the same problem in many discussions.
How do we find out what we are doing wrong?
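One possible reason a random percentage split works where a fixed split fails: a random split derives its indices from the length of the data, while a hard-coded index range (for example one copied from a lesson notebook) can point past the end of a smaller dataset. This is a guess, not a confirmed diagnosis. The plain-Python sketch below illustrates both ideas; `split_by_rand_pct_sketch` and `out_of_range` are hypothetical helpers, not fastai's implementation.

```python
import random

def split_by_rand_pct_sketch(items, valid_pct=0.2, seed=None):
    """Return (train, valid), with about valid_pct of items in valid.

    Indices come from len(items), so they can never be out of range.
    """
    if not 0 <= valid_pct <= 1:
        raise ValueError("valid_pct must be between 0 and 1")
    idxs = list(range(len(items)))
    random.Random(seed).shuffle(idxs)
    valid_idx = set(idxs[: int(valid_pct * len(items))])
    train = [x for i, x in enumerate(items) if i not in valid_idx]
    valid = [x for i, x in enumerate(items) if i in valid_idx]
    return train, valid

def out_of_range(idxs, n):
    """List the indices in idxs that do not fit a dataset of n rows."""
    return [i for i in idxs if not 0 <= i < n]
```

For instance, `out_of_range(range(800, 1000), 891)` returns the 109 indices from 891 to 999, so a fixed validation range like that would not fit a dataset of 891 rows.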

Ashka · March 21, 2019, 8:20pm

As sgugger suggested, maybe post the code and the entire error message so we can help. One thing I realized is that, in my case, two things were wrong. First, it really expects train, test, and valid folders; I only had train and test because I assumed valid would be created from valid_pct. Second, I also had image format issues. Maybe this helps someone.
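If the dataset only ships with `train/`, one option is to create `valid/` yourself by moving a random fraction of each class's files. The sketch below does that with the standard library; `make_valid_folder` is a hypothetical helper, and it moves files on disk, so try it on a copy first. As far as I know, fastai v1's `ImageDataBunch.from_folder` also accepts a `valid_pct` argument that carves the validation set out of the training folder instead, which avoids touching the files.

```python
import random
import shutil
from pathlib import Path

def make_valid_folder(root, valid_pct=0.2, seed=42, train="train", valid="valid"):
    """Move about valid_pct of root/<train>/<class>/* into root/<valid>/<class>/.

    Returns the number of files moved. Files are moved, not copied.
    """
    root = Path(root)
    rng = random.Random(seed)
    moved = 0
    for class_dir in sorted(p for p in (root / train).iterdir() if p.is_dir()):
        files = sorted(f for f in class_dir.iterdir() if f.is_file())
        rng.shuffle(files)  # seeded shuffle, so the split is reproducible
        dest = root / valid / class_dir.name
        dest.mkdir(parents=True, exist_ok=True)
        for f in files[: int(valid_pct * len(files))]:
            shutil.move(str(f), str(dest / f.name))
            moved += 1
    return moved
```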

nanthakumar · October 30, 2019, 2:08am

train = 'C:/Users/nanth/PycharmProjects/coco/coco_dataset/train'
valid = 'C:/Users/nanth/PycharmProjects/coco/coco_dataset/valid'
These are my local train and valid paths.

C:\Users\nanth\Anaconda3\lib\site-packages\fastai\data_block.py:454: UserWarning: Your training set is empty. If this is by design, pass `ignore_empty=True` to remove this warning.
warn("Your training set is empty. If this is by design, pass `ignore_empty=True` to remove this warning.")
C:\Users\nanth\Anaconda3\lib\site-packages\fastai\data_block.py:457: UserWarning: Your validation set is empty. If this is by design, use `split_none()`
or pass `ignore_empty=True` when labelling to remove this warning.
or pass `ignore_empty=True` when labelling to remove this warning.""")

IndexError Traceback (most recent call last)
in
1 data = (PointsItemList.from_folder(path)
2 .split_by_folder(train = train, valid = valid)
----> 3 .label_from_func(convert_biwi)
4 .transform(get_transforms(), tfm_y=True, size=(120,160))
5 .databunch().normalize(imagenet_stats)

~\Anaconda3\lib\site-packages\fastai\data_block.py in _inner(*args, **kwargs)
478 self.valid = fv(*args, from_item_lists=True, **kwargs)
479 self.__class__ = LabelLists
--> 480 self.process()
481 return self
482 return _inner

~\Anaconda3\lib\site-packages\fastai\data_block.py in process(self)
531 def process(self):
532 "Process the inner datasets."
--> 533 xp,yp = self.get_processors()
534 for ds,n in zip(self.lists, ['train','valid','test']): ds.process(xp, yp, name=n)
535 #progress_bar clear the outputs so in some case warnings issued during processing disappear.

~\Anaconda3\lib\site-packages\fastai\data_block.py in get_processors(self)
526 procs_x,procs_y = listify(self.train.x._processor),listify(self.train.y._processor)
527 xp = ifnone(self.train.x.processor, [p(ds=self.train.x) for p in procs_x])
--> 528 yp = ifnone(self.train.y.processor, [p(ds=self.train.y) for p in procs_y])
529 return xp,yp
530

~\Anaconda3\lib\site-packages\fastai\data_block.py in <listcomp>(.0)
526 procs_x,procs_y = listify(self.train.x._processor),listify(self.train.y._processor)
527 xp = ifnone(self.train.x.processor, [p(ds=self.train.x) for p in procs_x])
--> 528 yp = ifnone(self.train.y.processor, [p(ds=self.train.y) for p in procs_y])
529 return xp,yp
530

~\Anaconda3\lib\site-packages\fastai\vision\data.py in __init__(self, ds)
391 class PointsProcessor(PreProcessor):
392 "PreProcessor that stores the number of targets for point regression."
--> 393 def __init__(self, ds:ItemList): self.c = len(ds.items[0].reshape(-1))
394 def process(self, ds:ItemList): ds.c = self.c
395

IndexError: index 0 is out of bounds for axis 0 with size 0

I got this error and I don't know where it is coming from. Can anyone please help me with this?
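Reading the traceback bottom-up: `PointsProcessor.__init__` calls `ds.items[0]` on the training set, and the warnings above it say both the training and validation sets are empty, so the IndexError is a symptom of the empty split. My understanding (worth checking against the fastai v1 source) is that `split_by_folder` compares the *folder name* directly under the ItemList's `path` with the `train`/`valid` arguments, so passing full absolute paths matches nothing. The stdlib sketch below mimics that comparison; `split_by_top_folder` is a hypothetical helper, not fastai code.

```python
from pathlib import PurePath

def split_by_top_folder(root, files, train="train", valid="valid"):
    """Hypothetical mimic: assign each file by the first folder below root."""
    root = PurePath(root)
    tr, va = [], []
    for f in files:
        top = PurePath(f).relative_to(root).parts[0]  # e.g. 'train'
        if top == train:
            tr.append(f)
        elif top == valid:
            va.append(f)
    # An empty tr here is exactly what later makes items[0] fail.
    return tr, va
```

With `files = ['/data/coco/train/a.jpg', '/data/coco/valid/b.jpg']` and `root = '/data/coco'`, passing `train='/data/coco/train'` returns two empty lists, while the default `train='train'` puts one file in each split. By analogy, setting `path` to the `coco_dataset` folder and calling `.split_by_folder(train='train', valid='valid')` may fix the original code.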

enigma6174 · March 19, 2020, 5:04am

Hi, try this solution (it works for me when the dataset is already split into train and validation sets):

from fastai.vision import *   # fastai v1: ImageDataBunch, get_transforms, imagenet_stats
from pathlib import Path

path = Path('C:/Users/nanth/PycharmProjects/coco/coco_dataset')

data = ImageDataBunch.from_folder(path, train='train', valid='valid', ds_tfms=get_transforms(), size=224, bs=64)
data = data.normalize(imagenet_stats)

data.show_batch(rows=3, figsize=(7,6))

A few points to remember:

If you have a test set and a train set, just pass the test folder to the valid parameter in the function call.
If the train and validation folders are named differently, use those names in the respective parameters.
For this approach to work, you have to make sure that there is not another folder called train or valid inside the original train or valid folders. There should be class folders directly under the train or valid folders; if that is not the case, restructure the folder(s) accordingly.
For example:

/home/user/Dev/data/fruit-classification/train/train/…
/home/user/Dev/data/fruit-classification/valid/valid/…

Here, there is a train folder inside another train folder, and the inner train folder is the one that actually contains the class folders. In such a case, move the inner train folder up to the same level as the outer train folder and then delete the outer train folder. The same applies to the test or validation folders.
A lot of the Kaggle datasets are structured like this so this is an important thing to take note of when you are trying ImageDataBunch.from_folder(…) on them.
If it still fails after all these steps, here is a debugging tip: check how path names are written on Windows. I use Ubuntu, so the path structure (slashes, etc.) is different.
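The nested-folder fix described above can be scripted. The sketch below, a hypothetical helper `flatten_nested` built on the standard library, moves everything inside `root/train/train/` up into `root/train/` and removes the empty inner folder. That ends in the same layout as moving the inner folder up and deleting the outer one. It refuses to overwrite anything that already exists, so clashes have to be resolved by hand.

```python
import shutil
from pathlib import Path

def flatten_nested(root, name):
    """If root/name/name/ exists, move its contents into root/name/ and remove it.

    Returns True if something was flattened, False if there was no nesting.
    """
    outer = Path(root) / name
    inner = outer / name
    if not inner.is_dir():
        return False
    for child in list(inner.iterdir()):
        target = outer / child.name
        if target.exists():
            raise FileExistsError(f"{target} already exists; resolve it manually")
        shutil.move(str(child), str(target))
    inner.rmdir()  # inner is empty now
    return True
```

Calling it once per split, e.g. `flatten_nested('/home/user/Dev/data/fruit-classification', 'train')` and again with `'valid'`, would cover the example layout above.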

Hope this helps! Cheers!!