Accuracy in fastai does not match the one from scikit-learn


I’m using fastai for a text classification problem. I have 45 different classes, and at first glance fastai seems to be quite good, achieving 84.99% accuracy after only 10 epochs.

epoch train_loss valid_loss accuracy precision recall time
0 0.608150 0.563250 0.836928 nan 0.836928 00:51
1 0.537336 0.555584 0.842027 nan 0.842027 00:56
2 0.494744 0.542400 0.848295 nan 0.848295 00:53
3 0.438984 0.545793 0.849888 nan 0.849888 00:55

First question: I’m using the Precision() metric offered by the metrics module. I tried both micro and macro averaging, and both give nan. Any ideas why that is? It’s not too important, because for this specific task (single label, no threshold) micro precision will always equal accuracy, but I’m curious why this happens.
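One common cause of nan in macro precision (an assumption about your setup, not something visible from the post) is a class that is never predicted on the validation set: its precision is 0/0, and a naive macro average propagates the nan. A minimal sketch with invented data:

```python
import numpy as np

# Hypothetical 3-class example: class 2 is never predicted,
# so its precision is 0/0, and a naive macro average becomes nan.
preds  = np.array([0, 1, 0, 1])  # predicted classes
target = np.array([0, 1, 2, 1])  # true classes

per_class = []
for c in range(3):
    predicted_c = preds == c
    tp = np.sum(predicted_c & (target == c))
    denom = np.sum(predicted_c)          # predicted positives for class c
    per_class.append(tp / denom if denom else np.nan)  # 0/0 -> nan

macro = np.mean(per_class)  # the nan propagates through the mean
print(per_class, macro)
```

With 45 classes, a single rare class with no predictions in the batch is enough to turn the whole macro score into nan.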

My bigger question is that if I run the following :

preds, y = learn.get_preds(ds_type=DatasetType.Valid, ordered=True)
pred_classes = torch.argmax(preds, dim=1)

(note: using a separate variable so the ground-truth y isn’t overwritten) and I then use scikit-learn to compute the accuracy with:

accuracy_score(np.array(pred_classes), valid[['categoryid']].values.flatten())

I get an accuracy of 0.54127 which is very far from the fastai one.

This problem has been bugging me for a while now and I can’t seem to find an explanation.

Moreover, if I analyse the results given by fastai further, I get results that match the 84.9% accuracy.

interp = TextClassificationInterpretation.from_learner(learn)
x = interp.most_confused()
s = 0
for i in x:
    s += i[2]  # third element is the misclassification count for that (actual, predicted) pair
print("Total number of wrong classifications:", s)
acc = 1 - s / len(valid)
print("Accuracy based on this:", acc)

Prints out:

Total number of wrong classifications: 1413
Accuracy based on this: 0.8498884521406566

Any insights would be appreciated!

I experience a similar thing where the accuracy displayed during training is quite different from the accuracy measured afterwards. However, my accuracy during training is lower than afterwards (the opposite of your situation), which I suspect is due to dropout not being disabled for the validation set. I’m not sure that theory would explain your accuracy afterwards being lower, though.

I feel like in my case there is some kind of shuffling going on, but I forced ordered=True, so I don’t understand what is happening…

@sgugger I never got an answer for this issue, do you mind having a look?

You are not using the same function, so you don’t get the same results. I have no idea what accuracy_score does in the non-obvious cases, what shapes your tensors are or what your problem is since you did not share how you created your data.

The fastai accuracy doesn’t match the scikit-learn accuracy for multi-label classification problems. Here is a simple example:

preds = tensor([[0.4, 0.6],
                [0.8, 0.7]])
ys = tensor([[1, 1],
             [1, 1]])
accuracy_thresh(preds, ys, thresh=0.5, sigmoid=False)
>>> tensor(0.7500)

from sklearn.metrics import accuracy_score
accuracy_score(ys, preds > 0.5)
>>> 0.5

Scikit-learn treats each observation as wholly correct or incorrect, with no credit for partial correctness, as in the first observation above. I wasn’t able to replicate this for single-label classification problems. Perhaps you could provide a sample of your data or an executable example.
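To spell out the difference: on 2D label-indicator arrays, scikit-learn’s accuracy_score computes subset accuracy (a row counts only if every label matches), while fastai’s accuracy_thresh averages over every individual label. Flattening both arrays before calling accuracy_score makes the two agree. A sketch with plain numpy arrays standing in for the tensors above:

```python
import numpy as np
from sklearn.metrics import accuracy_score

preds = np.array([[0.4, 0.6],
                  [0.8, 0.7]])
ys = np.array([[1, 1],
               [1, 1]])
hard = (preds > 0.5).astype(int)  # threshold the probabilities

# sklearn on 2D arrays: subset accuracy, each row is all-or-nothing
subset_acc = accuracy_score(ys, hard)

# flattening first scores every individual label, like accuracy_thresh
per_label_acc = accuracy_score(ys.ravel(), hard.ravel())

print(subset_acc, per_label_acc)
```

The first row gets one of two labels right, so the per-label score is 3/4 while the subset score is 1/2.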

I was doing single-label multi-class classification, so that was not really the issue. The accuracy for these tasks is straightforward and computed the same way everywhere.
However, I finally figured out what I was doing wrong.

I had a pandas dataframe with three columns: text, category, and category_id, where category was a string representing the class of the input and category_id was an integer representing the same class (values between 0 and 44, since I have 45 classes).

The problem was that I was training on the textual category targets, but using the scikit-learn accuracy to compare the predictions with the category_id column (because I didn’t know how to map the integer predictions that get_preds() outputs back to category strings). I guess fastai’s numericalization of the labels was not the same as my original class mapping. Some mappings were the same but not all of them, which is why I got neither a random accuracy nor the fastai accuracy.
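To make that failure mode concrete: when two label encodings agree on some classes but not others, the measured accuracy lands somewhere between random and correct, exactly as described. A toy sketch (the class names and both mappings are invented for illustration):

```python
from sklearn.metrics import accuracy_score

classes = ["books", "games", "music", "tools"]
my_map = {c: i for i, c in enumerate(classes)}               # my own category_id mapping
# Hypothetical: the library encoded the labels in a different order
model_map = {"books": 0, "games": 2, "music": 1, "tools": 3}

true_labels = ["books", "games", "music", "tools"] * 25      # 100 samples
model_preds = [model_map[c] for c in true_labels]            # model is perfect...
my_ids = [my_map[c] for c in true_labels]                    # ...in its own encoding

# Only the classes whose ids happen to agree ("books", "tools") count as correct
acc = accuracy_score(my_ids, model_preds)
print(acc)
```

Here a model that is actually 100% accurate scores 0.5 against the mismatched ids, which is the same shape of discrepancy as the 84.99% vs 54.1% gap in the original post.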

Suppose I was only working with textual labels, and I hadn’t done my own mapping of the textual labels column: is there a method that can reverse the mapping (of string labels to integer labels) that I get when calling preds, y = learn.get_preds(ds_type=DatasetType.Valid)? Something similar to the inverse_transform() method of scikit’s LabelEncoder?

You can get the class-to-index mapping from the learner’s data object: in fastai v1, learn.data.classes is the ordered list of classes, so you can build {c: i for i, c in enumerate(learn.data.classes)}. You could use this dictionary to reverse the label mapping; I’m not aware of any helper methods.
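Assuming you have that ordered class list (stubbed below as a plain Python list in place of learn.data.classes), reversing the integer predictions is a single indexing step:

```python
# Stub standing in for learn.data.classes -- replace with your learner's list.
classes = ["books", "games", "music"]

# e.g. the output of torch.argmax(preds, dim=1).tolist()
int_preds = [2, 0, 1, 2]

# map integer ids back to string labels, like LabelEncoder.inverse_transform
str_preds = [classes[i] for i in int_preds]
print(str_preds)  # -> ['music', 'books', 'games', 'music']
```

The same list also gives the forward mapping (string to id) via enumerate, so you can compare against a dataframe column in either direction.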


Thanks, that’s what I was looking for!