ImageDataBunch.from_folder issue

mazen · May 5, 2019, 7:00pm

Hi there,

I am working with a car dataset, the “Vehicle Make and Model Recognition Dataset (VMMRdb)”, for make and model recognition. It contains 9171 classes. When I use ImageDataBunch to prepare the data, I get this problem:

data = ImageDataBunch.from_folder(
    path, valid_pct=0.3,
    ds_tfms=get_transforms(do_flip=False, flip_vert=True, max_rotate=5.0, max_zoom=1.1,
                           max_lighting=0.2, max_warp=0.2, p_affine=0.75, p_lighting=0.75),
    bs=32, size=224, num_workers=4).normalize(imagenet_stats)

====================================================================
/opt/anaconda3/lib/python3.7/site-packages/fastai/data_block.py:522: UserWarning: You are labelling your items with CategoryList.
Your valid set contained the following unknown labels, the corresponding items have been discarded.
gmc_jimmy_2010, porsche_panamera_2010, gmc_c6500_2000, ford_e350_1992, chrysler_voyager_1996…
  if getattr(ds, 'warn', False): warn(ds.warn)

So when I execute
len(data.classes)
I get a number of classes that changes every time I re-run the ImageDataBunch.from_folder command, and it is never the true number, 9171.

I don’t know the reason for this, and I would appreciate your help.

ste · May 5, 2019, 11:34pm

Probably due to your train/valid split (valid_pct=0.3): you’re randomly splitting your data, so it’s possible that some classes never appear in your “train” data.
From the docs:

CategoryList(items:Iterator[T_co], classes:Collection[T_co]=None, label_delim:str=None, **kwargs) :: CategoryListBase

You should pass the “classes” parameter, listing all available classes (9171 items).
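
To illustrate the random-split explanation, here is a small stdlib-only simulation. It is not fastai code: the helper name classes_seen_in_train and the toy label list are made up for this sketch, and it only assumes that the split shuffles all items and holds out the first valid_pct of them. When many classes have just a few images, some end up entirely in the validation part, and the count differs from seed to seed.

```python
import random

def classes_seen_in_train(labels, valid_pct, seed=None):
    """Randomly hold out valid_pct of the items; return the set of labels left for training."""
    rng = random.Random(seed)
    idx = list(range(len(labels)))
    rng.shuffle(idx)
    n_valid = int(valid_pct * len(labels))
    train_idx = idx[n_valid:]  # everything after the held-out part
    return {labels[i] for i in train_idx}

# Hypothetical toy data: 1000 classes with only 1 to 3 images each.
data_rng = random.Random(0)
labels = []
for c in range(1000):
    labels += [f"class_{c}"] * data_rng.randint(1, 3)

# Each seed gives a different split, so the number of classes
# that survive in the training part can differ between runs.
for seed in (1, 2, 3):
    seen = classes_seen_in_train(labels, valid_pct=0.3, seed=seed)
    print(f"seed={seed}: {len(seen)} of 1000 classes appear in train")
```

Classes inferred from the training items alone will therefore miss whatever landed only in validation, which is consistent with the warning about unknown labels in the valid set.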

mazen · May 6, 2019, 2:05pm

Thanks a lot!
I have solved the problem by passing the full list of classes via the classes parameter.
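
For anyone landing here later, a minimal sketch of how such a list could be built, assuming the dataset is laid out as one subfolder per class directly under path (as from_folder expects when labelling from folder names). The helper list_class_folders is a made-up name for illustration; the fastai v1 call is shown only as a comment and reuses the arguments from the first post.

```python
from pathlib import Path

def list_class_folders(path):
    """Return the sorted names of the immediate subdirectories of `path`."""
    return sorted(p.name for p in Path(path).iterdir() if p.is_dir())

# Assumed usage with fastai v1 (not executed in this sketch):
#
#   classes = list_class_folders(path)
#   data = ImageDataBunch.from_folder(path, valid_pct=0.3, classes=classes,
#                                     bs=32, size=224).normalize(imagenet_stats)
#   len(data.classes)  # should now match the number of class folders
```

Passing classes explicitly makes the label set independent of which images the random split happens to put in the training part.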