Bug?: ImageDataBunch.from_folder creates a class called test

I am working on the dataset from the Kaggle Seedling playground competition. The data comes in a train folder and a test folder, with the train folder in ImageNet format (one subfolder per class). I want to use valid_pct = 0.2 to create a validation set, then compare with the result on the test set. The data is laid out as follows:

path\
  train\
    class1\
    class2\
    ...
    ...
  test\

np.random.seed(1)
data = ImageDataBunch.from_folder(path = p, 
                                  size = 224,
                                  ds_tfms = get_transforms(),
                                  bs = bs,
                                  valid_pct = 0.2,
                                  train = 'train',
                                  test = 'test')
data = data.normalize()
print(data.classes)
len(data.classes),data.c

gives me the following output:
['Black-grass', 'Charlock', 'Cleavers', 'Common Chickweed', 'Common wheat', 'Fat Hen', 'Loose Silky-bent', 'Maize', 'Scentless Mayweed', 'Shepherds Purse', 'Small-flowered Cranesbill', 'Sugar beet', 'test']
(13, 13)

whereas there should be only 12 classes. I have checked that there is no folder called 'test' inside the train folder :slight_smile:

I'm guessing the function expects either 'train', 'valid' and 'test', or just 'train', but not 'train' and 'test' on their own?

I believe this behavior is caused by the following function, which is ultimately called by ImageDataBunch.from_folder:

def label_from_folder(self, **kwargs)->'LabelList':
    "Give a label to each filename depending on its folder."
    return self.label_from_func(func=lambda o: o.parts[-2], **kwargs)

Training items have the form PosixPath('data/train/Maize/96d476790.png') while test items have the form PosixPath('data/test/1541bdb2e.png'). Because test items have no class folder in their path, the second-to-last part of the path is "test", so "test" is interpreted as their class label.
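
To make that concrete, here is a minimal sketch using only the standard library. The helper name folder_label is made up for illustration and is not part of fastai; it just applies the same o.parts[-2] rule to the two example paths from above.

```python
from pathlib import PurePosixPath

def folder_label(item):
    """Apply the same rule as the lambda above: the label is the parent folder name."""
    return PurePosixPath(item).parts[-2]

# The two example items quoted in this thread
train_item = "data/train/Maize/96d476790.png"
test_item = "data/test/1541bdb2e.png"

print(folder_label(train_item))  # parent folder is the class name
print(folder_label(test_item))   # parent folder is the test folder itself
```

The train item gets its class folder as the label, while the test item gets the name of the test folder, which is where the 13th class comes from.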

Hi Syed,

I am also working on the same dataset and am facing the same issue: it shows 13 classes.

Could you please share how you fixed this?

Thanks & Regards,
Ritika

I actually moved the test set to the directory above, then ran the command again.

Alternatively, I think you could use the datablock API, using individual commands in sequence to get the same result. But moving the test folder to the directory above is the quickest workaround :slight_smile:
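
For anyone who wants to script the move, here is a rough sketch on a throwaway folder tree. It assumes the items are collected from everything under path (which would be consistent with the 13 classes above); the helper names move_test_up and folder_labels and the file names are made up for illustration, and none of this is fastai code.

```python
import shutil
import tempfile
from pathlib import Path

def move_test_up(path):
    """Move path/test to path.parent/test and return the new location."""
    src = Path(path) / "test"
    dst = Path(path).parent / "test"
    shutil.move(str(src), str(dst))
    return dst

def folder_labels(path):
    """Collect the parent-folder name of every .png found under path."""
    return sorted({f.parts[-2] for f in Path(path).rglob("*.png")})

# Build a tiny stand-in for the layout described in this thread
root = Path(tempfile.mkdtemp())
path = root / "data"
for rel in ["train/Maize/a.png", "train/Charlock/b.png", "test/c.png"]:
    f = path / rel
    f.parent.mkdir(parents=True, exist_ok=True)
    f.touch()

before = folder_labels(path)   # includes 'test' as a label
new_test = move_test_up(path)  # test folder now sits next to path
after = folder_labels(path)    # only the real class folders remain

print(before)
print(after)
```

Once the test folder is no longer under path, nothing under path has "test" as its parent folder, so no extra class can be derived from it.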

Thanks @ghufran_syed for the quick response. I moved the test folder to the directory above…