ImageDataBunch.from_folder returns IndexError

I am trying to train a model for Devanagari character classification. Line 9 of my script raises this error:

Traceback (most recent call last):
  File "Model.py", line 9, in <module>
    data = ImageDataBunch.from_folder(path, train = 'train', valid = 'valid', ds_tfms=tfms, size=32)
  File "/usr/lib/python3.8/site-packages/fastai/vision/data.py", line 108, in from_folder
    if valid_pct is None: src = il.split_by_folder(train=train, valid=valid)
  File "/usr/lib/python3.8/site-packages/fastai/data_block.py", line 212, in split_by_folder
    return self.split_by_idxs(self._get_by_folder(train), self._get_by_folder(valid))
  File "/usr/lib/python3.8/site-packages/fastai/data_block.py", line 207, in _get_by_folder
    return [i for i in range_of(self) if (self.items[i].parts[self.num_parts] if isinstance(self.items[i], Path)
  File "/usr/lib/python3.8/site-packages/fastai/data_block.py", line 207, in <listcomp>
    return [i for i in range_of(self) if (self.items[i].parts[self.num_parts] if isinstance(self.items[i], Path)
IndexError: index 0 is out of bounds for axis 0 with size 0

My code is like this:

from fastai import *
from pathlib import Path
from fastai.vision import *
from fastai.metrics import error_rate

bs = 16
path = Path('/home/apostrophie/Documents/DevanagariChars/DevanagariHandwrittenCharacterDataset')
tfms = get_transforms(do_flip=False)
data = ImageDataBunch.from_folder(path, ds_tfms=tfms, size=32)

The dataset structure is like this:

This is my first time training from a folder, and I have not found a solution after several hours of trying.
Any help would be really appreciated.

Edit:
After some more research, I also tried this, but to no avail:
data = (ImageList.from_folder(path)
.split_by_folder()
.label_from_folder()
.transform(tfms, size=32)
.databunch())

data = ImageDataBunch.from_df(path, df, ds_tfms=tfms, size=32)
Try this if you have a validation folder named “valid”.

How does ImageDataBunch.from_folder aggregate the data? Does it join all the data (train, dev and test) together and then split it using the default 0.2 for train/dev?

ImageDataBunch.from_folder expects a standard ImageNet-style layout. For example, with your path set like this:
path = Path("your_computer/Number_folder")
Inside Number_folder:
Number_folder
    Number_1
        1(1).jpg  1(2).jpg
    Number_2
        2(1).jpg  2(2).jpg
    Number_3
        3(1).jpg  3(2).jpg

With this layout, you don’t need the valid folder if you use “split by random”.
This is covered in the lesson 2 notebook.
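
To illustrate what "ImageNet style" means here, below is a minimal stdlib-only sketch of taking each image's label from the name of its parent folder. This is not fastai's own code: the function name labels_from_parent_folders and the extension set are assumptions made for the example.

```python
from pathlib import Path

# Assumed set of image extensions for this sketch; fastai keeps its own list.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}

def labels_from_parent_folders(root):
    """Map each image file under `root` to the name of its parent folder."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): p.parent.name
        for p in sorted(root.rglob("*"))
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    }
```

Calling labels_from_parent_folders on Number_folder from the layout above would label 1(1).jpg as Number_1, 2(1).jpg as Number_2, and so on.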

@JonathanSum I have a validation folder with the name ‘valid’, yet I cannot get it to work.
The df must include all the image paths with their labels, right?

I am also wondering what is causing that IndexError in my code.

[two images of the folder structure]
Does your folder have this structure?
If you still get the error, I think your path variable is not pointing to the folder above the train folder.
Try the fastai part 1 lesson notebooks; one of them uses train and valid folders for training.
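
A quick way to test whether path really points at the folder above train is to check for the split folders directly. The helper below is a hypothetical stdlib-only sketch (the name missing_split_folders is made up for this example), not part of fastai.

```python
from pathlib import Path

def missing_split_folders(path, names=("train", "valid")):
    """Return the split folder names that are not directories directly under `path`."""
    path = Path(path)
    return [name for name in names if not (path / name).is_dir()]
```

An empty list means both folders sit directly under path. Folder names are case-sensitive on Linux, so "Train" would be reported as missing when "train" is expected.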

I would suggest first trying out this command

data = ImageDataBunch.from_folder(path, ds_tfms=tfms, size=32, valid_pct = x)

Set x to whatever fraction of the data you want held out for validation. Typically it’s 0.25.
Check whether this works, and then we can try using the valid folder. With valid_pct set, the command does not look for a valid folder; it creates a random validation split from the images it finds instead.
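
For intuition, a random percentage split can be sketched in a few lines of plain Python. This is only an illustration of the idea behind valid_pct, not fastai's implementation; random_split and its parameters are names chosen for this example.

```python
import random

def random_split(items, valid_pct=0.25, seed=None):
    """Shuffle `items` and return (train, valid), holding out about `valid_pct`."""
    items = list(items)
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    rng.shuffle(items)
    n_valid = int(len(items) * valid_pct)
    return items[n_valid:], items[:n_valid]
```

With 100 items and valid_pct=0.25, this gives 75 training items and 25 validation items, with no item in both sets.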

Also the error message makes it clearer:

  File "/usr/lib/python3.8/site-packages/fastai/vision/data.py", line 108, in from_folder
    if valid_pct is None: src = il.split_by_folder(train=train, valid=valid)

You have not specified a valid_pct and there is no valid folder in your data structure. That is causing the error.

It still gives the error with valid_pct = 0.25, so I created a ‘valid’ folder. My dataset looks like the above.

Okay and you are still getting an error with the above folder structure?

Yep, I am.
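
Since the error shows up both with valid_pct and with a valid folder, one thing worth ruling out is that no image files are being found under path at all. The message "index 0 is out of bounds for axis 0 with size 0" means something indexed into an empty array, which would be consistent with an empty list of items, though that reading is an assumption and has not been confirmed in this thread. Below is a minimal stdlib-only check; the function name count_images_by_top_folder and the extension list are hypothetical and may not match what fastai recognizes.

```python
from collections import Counter
from pathlib import Path

# Assumed set of image extensions for this sketch.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".tif", ".tiff"}

def count_images_by_top_folder(path):
    """Count image files under `path`, grouped by the first folder below it."""
    path = Path(path)
    if not path.is_dir():
        return {}  # a wrong or misspelled path yields no images
    counts = Counter()
    for p in path.rglob("*"):
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS:
            rel = p.relative_to(path)
            # Files sitting directly in `path` are grouped under "."
            top = rel.parts[0] if len(rel.parts) > 1 else "."
            counts[top] += 1
    return dict(counts)
```

If count_images_by_top_folder(path) comes back empty, the path or the file extensions are the first things to look at. If it shows counts under names other than train and valid (for example "Train"), the folder names passed to from_folder would need to match.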