Possible Bug in 1.0.52: Order of list elements in text_classifier_learner.data.classes

Hey,

Not sure if this is a bug, but there is something about the order of the class labels in text_classifier_learner.data.classes that doesn’t make sense to me. The numerical value of the target class (obtained from text_classifier_learner.get_preds(DatasetType.Valid)) does not map to the list index in text_classifier_learner.data.classes.

This list is also used in the ClassificationInterpretation class to label the axes of the confusion matrix, so the order is important here.

Some code to explain the issue:

# get predictions for validation set
preds = learn.get_preds(DatasetType.Valid)
# get numerical target labels
ytrue = to_np(preds[1])

for i in range(2):
    # string value of target label in data.valid_ds
    label_ds = learn.data.valid_ds[i][1].obj
    # numerical value of target class from learn.get_preds(DatasetType.Valid)
    label_int = ytrue[i]
    # try to get the label string by using the integer label as an index into learn.data.classes
    label_string = learn.data.classes[label_int]
    print(label_ds == label_string)

>>>  False
>>>  False

Shouldn’t label_ds and label_string match exactly?

It seems text_classifier_learner.data.classes lists the labels in alphabetical order, but I am still having trouble understanding how the mapping works. Is the order of the labels on both axes of the confusion matrix correct? Why?

I must be getting something wrong. @mchaykow maybe you can help me shed some light on this.

Thanks

The labels are in alphabetical order, yes, to ensure that you get the same mapping every time you create your DataBunch.
Note that with text, the predictions might come back in a different order than the dataset because we sort the samples by length. So you should use ordered=True when you call get_preds. Maybe that’s where your problem comes from.
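
To illustrate the two points above, here is a minimal pure-Python sketch. It is not fastai code: the toy texts, labels, and all variable names are made up for the example, and the "sampler" is just a sort by length standing in for what a length-based sampler does.

```python
# Hypothetical toy data; none of these names come from fastai.
texts = ["short", "a much longer piece of text", "medium text"]
labels = ["pos", "neg", "neutral"]

# Sorting the class names gives the same label -> index mapping on every run.
classes = sorted(set(labels))
class_to_idx = {c: i for i, c in enumerate(classes)}
y = [class_to_idx[label] for label in labels]

# A length-based sampler visits the samples longest-first, so the targets
# come back in sampler order, not in dataset order.
sampler_order = sorted(range(len(texts)), key=lambda i: -len(texts[i]))
y_sampled = [y[i] for i in sampler_order]

# Comparing sampler-ordered targets against dataset-ordered labels fails.
naive_match = [labels[i] == classes[y_sampled[i]] for i in range(len(labels))]

# Undo the permutation: position `pos` in the output belongs to sample `i`.
y_restored = [None] * len(y)
for pos, i in enumerate(sampler_order):
    y_restored[i] = y_sampled[pos]

restored_match = [labels[i] == classes[y_restored[i]] for i in range(len(labels))]
print(naive_match, restored_match)
```

Once the permutation is undone, every index maps back to the right entry of the class list. Passing ordered=True to get_preds is the way to ask for predictions in dataset order without doing this by hand.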

Thanks for the quick response @sgugger! That did the job. It was because of the SortishSampler.

I thought that ordered=True was the default behaviour. My bad! It would make sense as the default for the test set, though.