Not sure if this is a bug, but there is something about the order of the class labels in text_classifier_learner.data.classes that doesn't make sense to me. The numerical value of the target class (obtained from text_classifier_learner.get_preds(DatasetType.Valid)) does not map to the corresponding list index in data.classes. This list is also used in the ClassificationInterpretation class to label the axes of the confusion matrix, so the order matters here.
Some code to explain the issue:
```python
from fastai.text import *  # fastai v1: provides DatasetType and to_np

# `learn` is the text_classifier_learner from above.
# get predictions for validation set: returns [probabilities, targets]
preds = learn.get_preds(DatasetType.Valid)
# get numerical target labels (second element of the returned list)
ytrue = to_np(preds[1])
for i in range(2):
    # string value of target label in data.valid_ds
    label_ds = learn.data.valid_ds.y[i].obj
    # numerical value of target class from learn.get_preds(DatasetType.Valid)
    label_int = ytrue[i]
    # try to get label string value by mapping integer label to list index in learn.data.classes
    label_string = learn.data.classes[label_int]
    print(label_ds == label_string)
```
So how exactly do the integer labels map to label_string??
text_classifier_learner.data.classes lists the labels in alphabetical order, but I am still having trouble understanding how the mapping works. Is the order of the labels on both axes of the confusion matrix correct? Why?
I must be getting something wrong. @mchaykow maybe you can help me shed some light on this.
The labels are in alphabetical order, yes, to ensure that you get the same mapping every time you create your DataBunch.
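A minimal sketch of why sorting gives a stable mapping, in plain Python with made-up labels (this illustrates the idea and is not fastai's actual code; make_classes, encode and confusion_matrix are hypothetical helpers). The class list depends only on which labels occur, so two runs that read the same labels in a different order end up with the same label-to-index mapping. The confusion matrix here indexes both its rows (actual) and its columns (predicted) by that same list, so both axes share one order.

```python
def make_classes(labels):
    """Return the unique labels, sorted (alphabetical for strings)."""
    return sorted(set(labels))

def encode(labels, classes):
    """Map each string label to its index in `classes`."""
    index = {c: i for i, c in enumerate(classes)}
    return [index[label] for label in labels]

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are actual classes, columns are predicted classes,
    both indexed in the same `classes` order."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

run_a = ["spam", "ham", "spam", "eggs"]
run_b = ["eggs", "spam", "ham", "spam"]  # same labels, different read order

classes = make_classes(run_a)
same_mapping = classes == make_classes(run_b)  # True: order of reading doesn't matter

y_true = encode(run_a, classes)                               # [2, 1, 2, 0]
y_pred = encode(["spam", "spam", "spam", "eggs"], classes)    # [2, 2, 2, 0]
cm = confusion_matrix(y_true, y_pred, len(classes))
# cm[1][2] == 1: one actual "ham" sample was predicted as "spam"
```

Because the same sorted list labels both axes, the matrix is only meaningful if predictions and targets are aligned sample by sample.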
Note that with text the predictions might come out in the wrong order, because we sort the samples by length. So you should pass ordered=True when you call get_preds. Maybe that's where your problem comes from.
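To see why sorting by length breaks a naive comparison with the targets, here is a small plain-Python sketch (hypothetical data, a toy "model", and hypothetical helper names; it is meant to show the idea behind ordered=True, not fastai's implementation). Predictions are produced in the order a length-sorting sampler visits the samples, so they no longer line up with the targets until that permutation is undone.

```python
def run_in_sampler_order(samples, sampler_order, predict):
    """Return predictions in the order the sampler visited the samples."""
    return [predict(samples[i]) for i in sampler_order]

def restore_dataset_order(preds, sampler_order):
    """Undo the sampler permutation: preds[k] belongs to sample sampler_order[k]."""
    restored = [None] * len(preds)
    for k, idx in enumerate(sampler_order):
        restored[idx] = preds[k]
    return restored

texts = ["a b c d", "a", "a b c", "a b"]
targets = [0, 1, 0, 1]

# Visit samples longest-first, as a length-sorting sampler would.
order = sorted(range(len(texts)), key=lambda i: len(texts[i].split()), reverse=True)

def predict(text):
    """Toy 'model': class 0 for texts with >= 3 tokens, else class 1."""
    return 0 if len(text.split()) >= 3 else 1

raw = run_in_sampler_order(texts, order, predict)  # misaligned with targets
fixed = restore_dataset_order(raw, order)          # back in dataset order
```

Here the toy model is perfect, yet raw disagrees with targets at two positions purely because of ordering; after restoring the dataset order, fixed matches targets exactly.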
Thanks for the quick response @sgugger! That did the job. It was because of the SortishSampler.
I thought that ordered=True was the default behaviour. My bad! It would make sense as the default for the test set, though.
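For intuition about what a "sortish" sampler does to the order, here is a simplified, hypothetical sketch (sortish_order and its chunk_mult parameter are made up for illustration and are not fastai's exact algorithm): shuffle the indices, cut them into large chunks, sort each chunk longest-first, group into batches, and shuffle the batch order. Each batch then holds texts of similar length (less padding), but the overall order is a permutation of the dataset, which is why predictions collected this way need to be mapped back.

```python
import random

def sortish_order(lengths, bs, chunk_mult=50, seed=0):
    """Return a sample order where each batch of size `bs` is sorted
    longest-first, while the order of batches is shuffled."""
    rng = random.Random(seed)
    idxs = list(range(len(lengths)))
    rng.shuffle(idxs)
    chunk = bs * chunk_mult  # a multiple of bs, so batches never span two chunks
    sorted_idxs = []
    for start in range(0, len(idxs), chunk):
        part = idxs[start:start + chunk]
        sorted_idxs.extend(sorted(part, key=lambda i: lengths[i], reverse=True))
    batches = [sorted_idxs[i:i + bs] for i in range(0, len(sorted_idxs), bs)]
    rng.shuffle(batches)
    return [i for batch in batches for i in batch]

lengths = [5, 1, 9, 3, 7, 2, 8, 4, 6, 10]
order = sortish_order(lengths, bs=2, chunk_mult=2, seed=0)
```

Since order is just a permutation of range(len(lengths)), outputs gathered in this order can be put back into dataset order by inverting it.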