Dataloaders() giving list index out of range

I am trying to do image classification with the full MNIST dataset.

I created a DataBlock:

mnist = DataBlock(
    blocks=(ImageBlock(cls=PILImageBW), CategoryBlock),
    get_items=get_image_files,
    splitter=GrandparentSplitter(),
    get_y=parent_label)

When I try to make the dataloaders:
dls = mnist.dataloaders(path)

I get an IndexError: list index out of range.

Then I ran mnist.summary(path) to check what went wrong. It finds all 70,000 items, but the sizes of the two datasets are shown as zero.

What am I doing wrong? It works for mnist_sample, which only contains 3s and 7s.

This is the complete error message:

Hi @Ark52, I haven't fully found a solution, but I was able to generate the DataLoaders object once I removed GrandparentSplitter(), so I assume a different splitter function is needed. Here's a Colab with my code:

https://colab.research.google.com/drive/1fqMZ6xo1JZDNYCnA10pYQfv-KUULSLo8?usp=sharing
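
If I'm reading the fastai source right, DataBlock falls back to a random 80/20 split when no splitter is passed, so removing GrandparentSplitter() should be roughly equivalent to this sketch:

mnist = DataBlock(
    blocks=(ImageBlock(cls=PILImageBW), CategoryBlock),
    get_items=get_image_files,
    # assumption: this mirrors the implicit default; the seed is just for reproducibility
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label)
dls = mnist.dataloaders(path)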

@Ark52 GrandparentSplitter() needs to receive the folder names if they are different from "train" and "valid". For example:

mnist = DataBlock(
    blocks=(ImageBlock(cls=PILImageBW), CategoryBlock),
    get_items=get_image_files,
    splitter=GrandparentSplitter(train_name="training", valid_name="testing"),
    get_y=parent_label)
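
For reference, with those folder names passed in, the full pipeline for the fastai MNIST download (whose top-level folders are "training" and "testing") would look roughly like this:

from fastai.vision.all import *

path = untar_data(URLs.MNIST)  # contains "training" and "testing" folders
mnist = DataBlock(
    blocks=(ImageBlock(cls=PILImageBW), CategoryBlock),
    get_items=get_image_files,
    # match the splitter to the actual folder names on disk
    splitter=GrandparentSplitter(train_name="training", valid_name="testing"),
    get_y=parent_label)
dls = mnist.dataloaders(path)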

Thanks, I will try that.

I just ran into this same issue.

Does this mean the MNIST dataset in the fastai bucket is not compatible with the default behavior of GrandparentSplitter, and that the code in train_mnist.ipynb doesn't work as-is?

The default value of train_name is "train", but the folder name in this dataset is "training"; the same goes for the validation set.
Pass the names explicitly: GrandparentSplitter(train_name="training", valid_name="testing").
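
You can double-check which names to pass by listing the top level of the download first:

path = untar_data(URLs.MNIST)
path.ls()  # for this dataset it should show the "training" and "testing" folders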

Having the same problem while following this example.

At the part where I create a DataBlock and the dataloaders, I get that same list index out of range error.

Like @vbakshi, getting rid of the splitter=GrandparentSplitter arg allows me to create the dataloaders, but I'm not sure what I should replace it with. Is it even needed? (I've put a quick check after the summary output below.)

In the summary I see that it has found my 2 datasets, 'train' (248 items) and 'valid' (62 items):

Setting-up type transforms pipelines
Collecting items from /home/paperspace/deid/data
Found 310 items
2 datasets of sizes 248,62
...
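
A quick way to check whether GrandparentSplitter applies at all seems to be looking at the grandparent folder name of each file, since that is what the splitter matches on:

files = get_image_files(path)
sorted(set(f.parent.parent.name for f in files))
# GrandparentSplitter() only helps if this includes "train" and "valid"
# (or the names passed as train_name/valid_name); if the images aren't
# organised into such folders, RandomSplitter(valid_pct=0.2) is the usual alternative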