I’d like to understand whether/how different averages for “precision” and “recall” are calculated.
I am classifying a single label with multiple classes. When I set averages to `'macro'`, the scores seem to make sense:
epoch | train_loss | valid_loss | accuracy | precision | recall | fbeta | time |
---|---|---|---|---|---|---|---|
19 | 0.655477 | 0.773050 | 0.761228 | 0.767700 | 0.672442 | 0.689554 | 03:15 |
However, my classes are imbalanced, and as far as I understand I should use `'micro'` averaging to account for this imbalance. If I switch to `'micro'`, though:
```python
learn.metrics = [accuracy,
                 Precision(average='micro'),
                 Recall(average='micro'),
                 FBeta(average='micro')]
```
all 4 scores (including accuracy) are the same:
epoch | train_loss | valid_loss | accuracy | precision | recall | fbeta | time |
---|---|---|---|---|---|---|---|
5 | 1.240163 | 0.991491 | 0.688006 | 0.688006 | 0.688006 | 0.688006 | 03:15 |
(The same problem occurs with `'weighted'` averaging: recall is always equal to accuracy.)
Is that expected?
Edit, after some reading:
according to this, it looks like precision, recall, and F score are indeed all the same for `'micro'` averaging.
So the remaining question is are all these numbers the same as accuracy?
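In the single-label multiclass setting, every wrong prediction is simultaneously a false positive for the predicted class and a false negative for the true class, so the micro totals satisfy TP + FP = TP + FN = N, and micro precision = micro recall = micro F1 = accuracy. A quick sanity check with scikit-learn (the labels below are made up for illustration):

```python
# Check that micro-averaged precision/recall/F1 coincide with accuracy
# for single-label multiclass predictions (toy data, not from the run above).
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 2, 2, 1, 1, 0, 2, 0]

acc  = accuracy_score(y_true, y_pred)
prec = precision_score(y_true, y_pred, average='micro')
rec  = recall_score(y_true, y_pred, average='micro')
f1   = f1_score(y_true, y_pred, average='micro')

print(acc, prec, rec, f1)  # all four are identical (0.625 here)
```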
Also, is recall really the same as accuracy in the case of `'weighted'` averaging?
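On the `'weighted'` case: weighted recall is Σ_c (n_c / N) · (TP_c / n_c) = Σ_c TP_c / N, so the class supports cancel and it reduces to accuracy, while weighted precision has no such cancellation and generally differs. A hedged sketch with made-up data:

```python
# Weighted recall collapses to accuracy, weighted precision does not
# (toy imbalanced data for illustration only).
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2]
y_pred = [0, 0, 1, 2, 1, 0, 2, 2, 2, 0, 1, 2]

acc    = accuracy_score(y_true, y_pred)
rec_w  = recall_score(y_true, y_pred, average='weighted')
prec_w = precision_score(y_true, y_pred, average='weighted')

print(acc, rec_w)   # equal (7/12 here)
print(prec_w)       # different from accuracy
```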