ImageDataBunch.from_csv method always models a MultiCategory classification problem

Hello friends!

First of all thanks for this amazing course. Hats off to Jeremy and his team for this awesome project!!

Now the issue:

I’m trying to do single-label classification for skin lesion diagnosis.
I have a CSV with two columns. The first is the name of the image. The second is the name of the lesion.
For example:
image, label
ISIC_0000000, NV
ISIC_0000001, NV
ISIC_0000002, MEL

When I do

data = ImageDataBunch.from_csv(path = tesisPath,
                           folder=trainImagesFolderName, 
                           label_delim=',',
                           ds_tfms=get_transforms(), 
                           size=64,
                           bs=bs,
                           csv_labels=trainingGroundTruthFileName, 
                           suffix=".jpg", 
                           valid_pct=0.2, 
                           header=0,
                           fn_col=0,
                           label_col=1).normalize(imagenet_stats)

and then I list my ds with

data.valid_ds

I get

LabelList (5066 items)
x: ImageList
Image (3, 64, 64),Image (3, 64, 64),Image (3, 64, 64),Image (3, 64, 64),Image (3, 64, 64)
y: MultiCategoryList
NV,BKL,NV,NV,NV
Path: drive/My Drive/tesis/2019

But I want y to be a single category for each image!
Is there any way to change it? Or am I doing something wrong?

As it is now, I can’t use accuracy as a metric because it fails during validation.

Don’t pass label_delim. It is the separator between multiple labels for a single item, not a general CSV delimiter. Because you passed it in, fastai assumes a MultiCategoryList.
You can see the relevant code in ImageList._label_from_list. You may also want to switch to using the data block API directly rather than the ImageDataBunch.from_ methods, which are only meant for basic use, so in cases like this it’s not obvious which part of the process a given parameter affects. You can see pretty much the equivalent code you’d use in ImageDataBunch.from_df (after loading the CSV into a dataframe).
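To make the difference concrete, here is a small plain-Python sketch of what passing a label delimiter does to the labels. It is only an illustration and not fastai’s actual code: `parse_labels` and `target_kind` are made-up helper names, and the sample labels come from the CSV in the question.

```python
# Simplified illustration only; this is NOT fastai's implementation.
# parse_labels and target_kind are hypothetical helpers.

def parse_labels(raw_labels, label_delim=None):
    """Return one label per item, or a list of tags per item if a delimiter is given."""
    if label_delim is None:
        # No delimiter: every item keeps exactly one category (single-label).
        return [label.strip() for label in raw_labels]
    # With a delimiter: every item's label is split into a list of tags,
    # even when that list has only one element (multi-label).
    return [[tag.strip() for tag in label.split(label_delim)]
            for label in raw_labels]


def target_kind(labels):
    """Describe the targets: a list per item means multi-category."""
    return "multi-category" if isinstance(labels[0], list) else "single category"


raw = ["NV", "NV", "MEL"]  # label column from the example CSV

single = parse_labels(raw)                   # label_delim not passed
multi = parse_labels(raw, label_delim=",")   # label_delim="," as in the question

print(single, "->", target_kind(single))
print(multi, "->", target_kind(multi))
```

Each label still holds a single lesion name in both cases, but once a delimiter is involved every target is a list, which is why the dataset is treated as multi-category. Dropping label_delim from the from_csv call should leave each image with one category.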

Oh thanks!
Yes I was looking at the code and got lost with the indirections at _label_from_list! Now that you point to it I see it clearly :slight_smile:

So basically, what I thought was the delimiter passed to pandas to parse the CSV is really the separator for multiple labels.

Thanks for helping!