ImageDataBunch.from_df doesn't take classes

I’m using fastai for inference, so I have few labels for each image. The `from_df` method doesn’t accept a list of classes, so many of the images/labels get ignored when I train. Is there a way to pass in the set of classes?

This is the warning I get:

/usr/local/lib/python3.6/dist-packages/fastai/data_block.py:525: UserWarning: You are labelling your items with CategoryList.
Your valid set contained the following unknown labels, the corresponding items have been discarded.

Hey, you have to create an ImageItemList before creating the ImageDataBunch. Since yours is a multi-label image classification problem, your ImageItemList will probably look like this:

from fastai.vision import *  # fastai v1 import style

src = (ImageItemList.from_csv(path, 'train_v2.csv', folder='train-jpg', suffix='.jpg')
       .random_split_by_pct(0.2)
       .label_from_df(label_delim=' ')  # if your image labels are separated by spaces
)
Then create the DataBunch:

tfms = get_transforms()  # assumption: default augmentations; adjust as needed

data = (src.transform(tfms, size=128)
        .databunch(bs=64)
        .normalize(imagenet_stats)
)
Hope it fixes the problem. If not, please share your notebook. I will look into it.

Thank you. I think the issue is that I have few images for each label, yet thousands of labels. I’m training in a distributed fashion, so it would be helpful if I could pass in the set of labels instead of extracting them from a csv/dataframe, since the chunk that each machine gets will most likely contain only a small subset of the labels. I’m not sure whether this is fine, but I feel like it could affect the embeddings I extract.
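
To make the concern concrete, here is a minimal sketch in plain Python. It does not use fastai's API, and the label names, shards, and helper functions are all hypothetical. It shows why labels inferred per chunk do not line up across machines, and how encoding against one pre-agreed class list keeps every index consistent.

```python
from typing import Dict, List, Sequence


def build_class_index(classes: Sequence[str]) -> Dict[str, int]:
    """Map each label to a fixed index, based on the full, pre-agreed label set."""
    return {label: idx for idx, label in enumerate(sorted(set(classes)))}


def encode_multi_hot(labels: Sequence[str], class_index: Dict[str, int]) -> List[int]:
    """Encode one item's labels as a multi-hot vector over the shared class set."""
    vector = [0] * len(class_index)
    for label in labels:
        # A label missing from the shared set raises KeyError here,
        # instead of the item being silently discarded.
        vector[class_index[label]] = 1
    return vector


# Hypothetical full label set, known ahead of time and identical on every machine.
ALL_CLASSES = ["clear", "cloudy", "haze", "primary", "road", "water"]
class_index = build_class_index(ALL_CLASSES)

# Two hypothetical chunks, each containing only part of the label set.
shard_a = [["clear", "primary"], ["haze"]]
shard_b = [["water", "road"], ["clear", "water"]]

# Label sets inferred from each chunk alone differ between machines.
inferred_a = sorted({label for item in shard_a for label in item})
inferred_b = sorted({label for item in shard_b for label in item})

# With the shared index, both machines produce vectors of the same length,
# and a given position always refers to the same label.
encoded_a = [encode_multi_hot(item, class_index) for item in shard_a]
encoded_b = [encode_multi_hot(item, class_index) for item in shard_b]
```

With a shared list like this, a label that never appears in a given chunk still keeps its slot, so outputs from different machines stay comparable. Whether a particular fastai version lets you hand such a list to its labelling step is something to check against that version's documentation.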