How to create labels when images are stored in OK/NOK folders but not split into train/val

Hi all,

I recently started my fastai journey, and so far I’m impressed by the usability of the library! This is the first time I’ve seen a DL library that doesn’t even require a labels file: it can build the labels automatically! Unfortunately, I have a dataset with a peculiar folder structure. The labels are the folder names:

path\
  OK\
  NOK\

but there are no train or val folders. In other words, I don’t have the folder structure that ImageDataBunch.from_folder expects by default:

path\
  train\
    OK\
    NOK\
  valid\
    OK\
    NOK\

I could build the structure myself, but then I would also have to make a sensible split between train and val, taking into account that the NOK class is about 20 times rarer than the OK class. I’d rather have fastai take care of these nuisances :stuck_out_tongue_winking_eye: Is there another ImageDataBunch factory method that might help me?


Ok, found the answer already. The valid_pct parameter handles folders with my structure: it’s the fraction of the data to hold out for validation (0.1 means 10%), meaning that the following instruction

data = ImageDataBunch.from_folder(data_dir, valid_pct=0.1, ds_tfms=get_transforms(), size=224, bs=bs).normalize(imagenet_stats)

will happily use my two folders to create a training set and a validation set with a 90/10 split. It might be useful to add a description of the valid_pct argument to the docs:

https://docs.fast.ai/vision.data.html#ImageDataBunch.from_folder
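
One caveat for anyone landing here with an imbalanced dataset like mine: as far as I can tell, valid_pct makes a purely random split, so it does not guarantee that OK and NOK keep their proportions in the validation set. If that matters, a per-class (stratified) split can be done by hand. The sketch below is not part of fastai; stratified_split, its exts default and the seed value are names I made up for illustration, and it only uses the standard library.

```python
import random
from pathlib import Path


def stratified_split(root, valid_pct=0.1, seed=42,
                     exts=(".jpg", ".jpeg", ".png")):
    """Return (path, label, is_valid) rows, splitting each class folder separately.

    Hypothetical helper, not a fastai function.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rows = []
    class_dirs = sorted(p for p in Path(root).iterdir() if p.is_dir())
    for class_dir in class_dirs:
        files = sorted(f for f in class_dir.iterdir()
                       if f.is_file() and f.suffix.lower() in exts)
        rng.shuffle(files)
        # Hold out the same fraction of every class, so the rare class
        # shows up in validation in proportion to its size.
        n_valid = round(len(files) * valid_pct)
        if len(files) > 1 and valid_pct > 0:
            n_valid = max(1, n_valid)  # at least one validation image per class
        for i, f in enumerate(files):
            rows.append((f, class_dir.name, i < n_valid))
    return rows
```

With 200 OK and 10 NOK images and valid_pct=0.1, this holds out 20 OK and 1 NOK image. The rows could then be written to a CSV or DataFrame with an is_valid column; I believe the fastai v1 data block API can split on such a column, but check the docs for your version.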
