I’d like to understand whether/how different averages for “precision” and “recall” are calculated.

I am classifying a single label with multiple classes. When I set the average to `'macro'`, the scores seem to make sense:

epoch | train_loss | valid_loss | accuracy | precision | recall | fbeta | time
---|---|---|---|---|---|---|---
19 | 0.655477 | 0.773050 | 0.761228 | 0.767700 | 0.672442 | 0.689554 | 03:15

However, my classes are imbalanced, and as far as I understand I should use `'micro'` averaging to account for this imbalance. If I switch to `'micro'`, though:

```
learn.metrics = [accuracy,
                 Precision(average='micro'),
                 Recall(average='micro'),
                 FBeta(average='micro')]
```

all 4 scores (including accuracy) *are the same*:

epoch | train_loss | valid_loss | accuracy | precision | recall | fbeta | time
---|---|---|---|---|---|---|---
5 | 1.240163 | 0.991491 | 0.688006 | 0.688006 | 0.688006 | 0.688006 | 03:15

*(The same problem occurs with `'weighted'` averaging: recall is always equal to accuracy.)*

Is that expected?

*edit after some reading:*

According to this, it looks like precision, recall, and F score are indeed all the same for `'micro'` averaging.

So the remaining question is: are all these numbers also the same as *accuracy*?
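For what it's worth, here is a quick pure-Python sanity check on some made-up single-label predictions (not fastai's actual implementation, just the standard per-class TP/FP/FN definitions pooled the `'micro'` way):

```python
# Made-up single-label multiclass predictions, just to check the claim.
y_true = [0, 0, 0, 1, 1, 2, 2, 2, 2, 2]
y_pred = [0, 1, 0, 1, 2, 2, 2, 0, 2, 2]
classes = sorted(set(y_true))

# Per-class true positives, false positives, false negatives
tp = {c: sum(t == p == c for t, p in zip(y_true, y_pred)) for c in classes}
fp = {c: sum(p == c != t for t, p in zip(y_true, y_pred)) for c in classes}
fn = {c: sum(t == c != p for t, p in zip(y_true, y_pred)) for c in classes}

# 'micro' pools the counts across all classes before dividing
TP, FP, FN = sum(tp.values()), sum(fp.values()), sum(fn.values())
micro_precision = TP / (TP + FP)
micro_recall = TP / (TP + FN)

# With exactly one label per sample, every wrong prediction is at once
# a FP (for the predicted class) and a FN (for the true class), so
# TP + FP == TP + FN == number of samples, and both ratios collapse
# to plain accuracy.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(micro_precision, micro_recall, accuracy)  # all three are 0.7
```

So at least under these standard definitions, the answer appears to be yes: for single-label multiclass data, micro precision, micro recall, micro F, and accuracy are all the same number.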

Also, is "recall" really the same as *accuracy* in the case of `'weighted'` averaging?
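If I expand the definition, it looks like it must be: weighted recall is the support-weighted mean of per-class recalls, and the support weight `n_c/n` cancels the `1/n_c` inside each `recall_c = tp_c/n_c`, leaving `sum(tp_c)/n`, which is accuracy. A small check on the same made-up labels as above:

```python
# Made-up single-label predictions; check that 'weighted' recall equals accuracy.
y_true = [0, 0, 0, 1, 1, 2, 2, 2, 2, 2]
y_pred = [0, 1, 0, 1, 2, 2, 2, 0, 2, 2]
classes = sorted(set(y_true))
n = len(y_true)

weighted_recall = 0.0
for c in classes:
    support = sum(t == c for t in y_true)                     # n_c
    tp = sum(t == p == c for t, p in zip(y_true, y_pred))     # tp_c
    # (n_c/n) * (tp_c/n_c) == tp_c/n, so the sum over classes is accuracy
    weighted_recall += (support / n) * (tp / support)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
print(weighted_recall, accuracy)  # equal (up to float rounding)
```

Note that the cancellation only works for recall; weighted *precision* divides by the number of predictions per class rather than the support, so it generally differs from accuracy.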