Bug?: ImageDataBunch.from_folder creates a class called test

ghufran_syed · January 8, 2019, 1:18am

I am working on the dataset from the Kaggle Seedling playground competition. The data comes in a train folder and a test folder, in ImageNet format. I want to use valid_pct = 0.2 to carve a validation set out of train, and then compare that with the result on the test set. The data is laid out as follows:

path\
  train\
    class1\
    class2\
    ...
  test\        (images only, no class subfolders)

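from fastai.vision import *   # fastai v1

p = Path('data')   # assumed: the folder that contains train\ and test\
bs = 64            # assumed batch size

np.random.seed(1)  # make the random valid_pct split reproducible
data = ImageDataBunch.from_folder(path = p,
                                  size = 224,
                                  ds_tfms = get_transforms(),
                                  bs = bs,
                                  valid_pct = 0.2,
                                  train = 'train',
                                  test = 'test')
data = data.normalize()
print(data.classes)
print(len(data.classes), data.c)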

gives me the following output:
['Black-grass', 'Charlock', 'Cleavers', 'Common Chickweed', 'Common wheat', 'Fat Hen', 'Loose Silky-bent', 'Maize', 'Scentless Mayweed', 'Shepherds Purse', 'Small-flowered Cranesbill', 'Sugar beet', 'test']
(13, 13)

There should only be 12 classes, and I have checked that there is no folder called ‘test’ inside the train folder.
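One way to double-check that is to list the immediate subfolders of the train folder and compare them with data.classes. This is a minimal sketch using only `pathlib`; `list_class_folders` is a hypothetical helper, and it assumes one subfolder per class, as in the ImageNet-style layout above.

```python
from pathlib import Path


def list_class_folders(train_dir):
    """Return the sorted names of the immediate subfolders of train_dir.

    Hypothetical helper: assumes each subfolder holds one class
    (ImageNet-style layout); loose files are ignored.
    """
    return sorted(p.name for p in Path(train_dir).iterdir() if p.is_dir())


# Usage (the path is an assumption; point it at your own train folder):
# classes = list_class_folders("data/train")
# print(len(classes), "test" in classes)
```

For this dataset it should report 12 folders, none of them named ‘test’.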

My guess is that the function expects either ‘train’, ‘valid’ and ‘test’ folders, or just ‘train’, but not ‘train’ and ‘test’ together??

trevor · January 8, 2019, 4:21am

I believe this behavior is caused by the following function, which is ultimately called by ImageDataBunch.from_folder:

def label_from_folder(self, **kwargs)->'LabelList':
    "Give a label to each filename depending on its folder."
    # o is each item's Path; o.parts[-2] is the name of the folder that contains the file
    return self.label_from_func(func=lambda o: o.parts[-2], **kwargs)

Training items have the form PosixPath('data/train/Maize/96d476790.png'), while test items have the form PosixPath('data/test/1541bdb2e.png'). A test item has no class folder above it, so the second-to-last part of its path is “test”, and that is what gets used as its class label.
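The rule in label_from_folder can be reproduced on the two example paths with plain `pathlib`. In this sketch `label_from_parent` is a hypothetical stand-in for the lambda above, not fastai code; `PurePosixPath` is used so the forward-slash paths are parsed the same way on any OS.

```python
from pathlib import PurePosixPath


def label_from_parent(path):
    """Hypothetical stand-in for `lambda o: o.parts[-2]`: the parent folder name."""
    return PurePosixPath(path).parts[-2]


train_item = "data/train/Maize/96d476790.png"
test_item = "data/test/1541bdb2e.png"

print(label_from_parent(train_item))  # Maize
print(label_from_parent(test_item))   # test
```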

ritika26 · January 13, 2019, 3:56am

Hi Syed,

I am also working on the same dataset and facing the same issue: it shows 13 classes.

Could you tell me how you fixed it?

Thanks & Regards,
Ritika

ghufran_syed · January 13, 2019, 3:58am

I moved the test folder up one level, out of the path directory, and then ran the command again.
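A minimal sketch of that workaround in plain Python, assuming the layout from the first post (a path folder containing train and test); `move_test_out` is a hypothetical helper. It refuses to overwrite a folder of the same name that already exists in the parent directory.

```python
import shutil
from pathlib import Path


def move_test_out(path, test_name="test"):
    """Move path/test_name up one level, next to path. Hypothetical helper.

    Returns the new location; raises FileExistsError instead of merging
    into an existing folder of the same name.
    """
    path = Path(path)
    src = path / test_name
    dst = path.parent / test_name
    if dst.exists():
        raise FileExistsError(dst)
    shutil.move(str(src), str(dst))  # a plain rename when dst does not exist
    return dst
```

After the move, path/'test' no longer exists, so the test = 'test' argument in the original call no longer points at the test images.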

Alternatively, I think you could use the data block API and chain the individual steps yourself to get the same result. But moving the test folder up one level is the quickest workaround.
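The point of chaining the steps yourself is that the labelled items can be collected from train alone, so test is never labelled. In later fastai v1 releases the chain would use `ImageList.from_folder`, `split_by_rand_pct`, `label_from_folder` and `add_test`. The sketch below is not fastai code: it is a plain-Python illustration of the same four steps, assuming ImageNet-style folders and images ending in .png/.jpg/.jpeg, with `build_split` as a hypothetical helper name. Its random split will not match fastai's.

```python
import random
from pathlib import Path

# Assumed image extensions; adjust for your dataset.
IMG_EXTS = {".png", ".jpg", ".jpeg"}


def _images(folder):
    """Sorted image files anywhere under folder (empty list if it is missing)."""
    folder = Path(folder)
    if not folder.is_dir():
        return []
    return sorted(p for p in folder.rglob("*")
                  if p.is_file() and p.suffix.lower() in IMG_EXTS)


def build_split(path, valid_pct=0.2, seed=1):
    """Hypothetical plain-Python version of the data block steps."""
    path = Path(path)
    # 1. Collect labelled items from path/'train' only, so 'test' is never scanned here.
    items = _images(path / "train")
    # 2. Random train/valid split with a fixed seed.
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_pct)
    valid, train = shuffled[:n_valid], shuffled[n_valid:]
    # 3. Label each item by its parent folder name, like label_from_folder.
    classes = sorted({p.parent.name for p in items})
    # 4. Test images are collected separately and left unlabelled.
    test = _images(path / "test")
    return {
        "classes": classes,
        "train": [(p, p.parent.name) for p in train],
        "valid": [(p, p.parent.name) for p in valid],
        "test": test,
    }
```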

ritika26 · January 13, 2019, 4:02am

Thanks @ghufran_syed for the quick response. I moved the test folder to the directory above…