Accuracy calculation problem

I am trying to reproduce the reported accuracy, in particular for the IMDB classification notebook:

log_preds = learn.predict()
probs = np.exp(log_preds)
y_true =
y_pred = np.argmax(probs, axis=1)
acc = (y_true == y_pred).mean()
print('Accuracy:', acc)

The accuracy calculated this way on the validation set is much lower than the accuracy reported by the learner, e.g.:

epoch      trn_loss   val_loss   accuracy
    0      0.218912   0.163022   0.9412
    1      0.20981    0.169191   0.94068
    2      0.17804    0.157355   0.94472

but the code above gives 0.5006.

Any idea what is wrong? I would like to calculate different performance metrics like recall, precision, confusion matrix, etc. for a multiclass problem.

If you mean the IMDB sentiment notebook, an accuracy of around 0.5 is chance level for a binary task. Most likely the order of your dataset is getting mixed up somewhere, so you are comparing each label to the wrong prediction.
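To illustrate the point about ordering, here is a minimal sketch with synthetic data (not the notebook's data or the fastai API). It assumes a hypothetical permutation `order` standing in for whatever order the loader visits the examples in, and shows that comparing reordered predictions against labels in the original order drops a good classifier to roughly chance level, while undoing the permutation recovers the real accuracy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: balanced binary labels and predictions that are
# correct about 94% of the time (numbers chosen only for illustration).
n = 10_000
y_true = rng.integers(0, 2, size=n)
flip = rng.random(n) < 0.06
y_pred = np.where(flip, 1 - y_true, y_true)

# Hypothetical loader order: output i belongs to dataset item order[i].
order = rng.permutation(n)
y_pred_loader = y_pred[order]

# Comparing loader-ordered predictions with dataset-ordered labels.
acc_misaligned = (y_pred_loader == y_true).mean()

# Undo the permutation so that prediction i lines up with label i again.
restored = np.empty_like(y_pred_loader)
restored[order] = y_pred_loader
acc_aligned = (restored == y_true).mean()

print('Misaligned accuracy:', acc_misaligned)
print('Aligned accuracy:', acc_aligned)
```

If the order used by the loader can be recovered as an index array, the same `restored[order] = ...` step should let you keep the loader as it is and still compare against the labels in dataset order.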

Yes, IMDB sentiment. I ran the notebook without any other changes, just adding the code for the accuracy evaluation at the bottom. Can someone else try this?

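Solved: it looks like the problem is in the sampling of the dataset: val_dl = DataLoader(val_ds, bs, transpose=True, num_workers=1, pad_idx=1, sampler=val_samp). With the sampler in place, the predictions presumably come back in the sampler's order rather than in the order of the labels.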

The following code, with the sampler removed, gives the expected results:

val_dl = DataLoader(val_ds, bs, transpose=True, num_workers=1, pad_idx=1)
log_preds = predict(learn.model, val_dl)
y_pred = np.argmax(log_preds, axis=1)
y_true = md.val_y
print('Accuracy:', (y_pred == y_true).mean())
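Once y_pred and y_true are aligned, the other metrics asked about above (confusion matrix, precision, recall) can be computed from the same two arrays. Below is a minimal NumPy-only sketch; the helper names `confusion_matrix` and `precision_recall` are my own for illustration, and the small arrays are made-up toy data, not results from the notebook.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Count matrix with true classes as rows and predicted classes as columns."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    # np.add.at accumulates correctly even when (true, pred) pairs repeat.
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

def precision_recall(cm):
    """Per-class precision and recall; classes with no samples get 0."""
    tp = np.diag(cm).astype(float)
    pred_totals = cm.sum(axis=0)  # how often each class was predicted
    true_totals = cm.sum(axis=1)  # how often each class actually occurs
    precision = np.divide(tp, pred_totals, out=np.zeros_like(tp), where=pred_totals > 0)
    recall = np.divide(tp, true_totals, out=np.zeros_like(tp), where=true_totals > 0)
    return precision, recall

# Toy three-class example (made-up labels, only to show the shapes involved).
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

cm = confusion_matrix(y_true, y_pred, n_classes=3)
precision, recall = precision_recall(cm)

print(cm)
print('Precision per class:', precision)
print('Recall per class:', recall)
```

If scikit-learn is installed, sklearn.metrics.confusion_matrix and sklearn.metrics.classification_report give the same numbers with less code.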

