Incorrect Number of Classes

Hi,
My dataset labels have 5 distinct classes, but data.c gives only 1. What might be the issue?

Could you be a bit more specific? I presume that data here is a DataBunch. Could you show the steps you used to create it from your data? And in what form is your data? Also, what kind of problem are you working on?

src = (
    CustomImageList.from_df(image_boost_df, PATH, cols='id_code', folder='train_images', suffix='.png')
    .split_by_idx(val_idx)
    .label_from_df(cols='diagnosis')
)

The labels are 0, 1, 2, 3 and 4, all distinct.
The dataframe has two columns, id_code and diagnosis; diagnosis is the label column.

LabelLists;

Train: LabelList (4256 items)
x: CustomImageList
Image (3, 1024, 1024),Image (3, 1024, 1024),Image (3, 1024, 1024),Image (3, 1024, 1024),Image (3, 1024, 1024)
y: FloatList
4.0,2.0,3.0,2.0,0.0
Path: …/input;

Valid: LabelList (550 items)
x: CustomImageList
Image (3, 1024, 1024),Image (3, 1024, 1024),Image (3, 1024, 1024),Image (3, 1024, 1024),Image (3, 1024, 1024)
y: FloatList
0.0,2.0,2.0,0.0,4.0

I resolved it.
The diagnosis column was going in as float. I changed it to Long, and now the labels come out as a CategoryList.
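
For anyone hitting the same thing, here is a minimal sketch of that cast. It assumes the labels sit in a pandas DataFrame with a float diagnosis column, and that "Long" above means a 64-bit integer dtype. The small frame is made-up illustration data, not the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for image_boost_df; ids and values are made up.
df = pd.DataFrame({
    "id_code": ["a", "b", "c", "d", "e"],
    "diagnosis": [4.0, 2.0, 3.0, 1.0, 0.0],
})

# The labels start out as floats, which is what leads to a FloatList.
dtype_before = str(df["diagnosis"].dtype)

# Cast to a 64-bit integer dtype so the values are treated as class ids.
df["diagnosis"] = df["diagnosis"].astype("int64")
dtype_after = str(df["diagnosis"].dtype)

# Count the distinct labels as a sanity check before building the DataBunch.
n_classes = df["diagnosis"].nunique()
print(dtype_before, dtype_after, n_classes)
```

If the column contains missing values the cast will raise, so those rows would need handling first.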

Cool!

Not sure it was going to help, but one thought was that I didn’t see in your code how fastai knows that diagnosis is a categorical variable. If it is a float, then it would make sense to consider the variable as continuous, and “otherwise” categorical. From the documentation it looks like fastai figures out whether a variable is continuous or categorical.

In particular I was going to mention the following paragraph:

_label_cls is the first to be used in the data block API, in the labelling function. If this variable is set to None, the label class will be set to CategoryList, MultiCategoryList or FloatList depending on the type of the first item. The default can be overridden by passing a label_cls in the kwargs of the labelling function.

Perhaps more to the point, from the documentation:

Behind the scenes, ItemList.get_label_cls basically selects a label class according to the item type of labels, where labels can be any of Collection, pandas.core.frame.DataFrame, pandas.core.series.Series. If the list elements are of type string or integer, get_label_cls will output CategoryList; if they are of type float, it will output FloatList; if they are of type Collection, it will output MultiCategoryList.
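
To make that rule concrete, here is a rough plain-Python sketch of the dispatch as the quoted paragraph describes it. This is not fastai's actual code; guess_label_kind is a hypothetical helper that returns the class name as a string and only handles plain Python types.

```python
from collections.abc import Collection


def guess_label_kind(first_label):
    """Simplified stand-in for the rule quoted above (hypothetical helper)."""
    # Strings are Collections too, so check str/int before Collection.
    if isinstance(first_label, (str, int)):
        return "CategoryList"
    if isinstance(first_label, float):
        return "FloatList"
    if isinstance(first_label, Collection):
        return "MultiCategoryList"
    raise TypeError(f"unsupported label type: {type(first_label).__name__}")


# The same label value lands in a different class depending on its type.
print(guess_label_kind(4.0))
print(guess_label_kind(4))
print(guess_label_kind(["mild", "severe"]))
```

This matches what happened in the thread: float labels were read as a regression target, and integer labels as classes.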