I’d like to understand whether/how different averages for “precision” and “recall” are calculated.
I am classifying a single label with multiple classes. When I set averages to `'macro'`, the scores seem to make sense:
epoch | train_loss | valid_loss | accuracy | precision | recall | fbeta | time |
---|---|---|---|---|---|---|---|
19 | 0.655477 | 0.773050 | 0.761228 | 0.767700 | 0.672442 | 0.689554 | 03:15 |
However, my classes are imbalanced, and as far as I understand I should use `'micro'` averaging to account for this imbalance. If I switch to `'micro'`, though:
```python
learn.metrics = [accuracy,
                 Precision(average='micro'),
                 Recall(average='micro'),
                 FBeta(average='micro')]
```
all 4 scores (including accuracy) are the same:
epoch | train_loss | valid_loss | accuracy | precision | recall | fbeta | time |
---|---|---|---|---|---|---|---|
5 | 1.240163 | 0.991491 | 0.688006 | 0.688006 | 0.688006 | 0.688006 | 03:15 |
(The same problem occurs with `'weighted'` averaging: recall is always equal to accuracy.)
Is that expected?
Edit, after some reading:
according to this, it looks like precision, recall, and F score are indeed all the same for `'micro'` averaging.
So the remaining question is are all these numbers the same as accuracy?
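In the single-label multiclass setting, every wrong prediction is simultaneously a false positive for the predicted class and a false negative for the true class, so the micro totals satisfy TP + FP = TP + FN = N, and micro precision = micro recall = micro F1 = accuracy. A quick sanity check with scikit-learn (the labels below are made up for illustration):

```python
# Check that micro-averaged precision/recall/F1 coincide with accuracy
# for single-label multiclass predictions (toy data, not from the run above).
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 2, 2, 1, 1, 0, 2, 0]

acc  = accuracy_score(y_true, y_pred)
prec = precision_score(y_true, y_pred, average='micro')
rec  = recall_score(y_true, y_pred, average='micro')
f1   = f1_score(y_true, y_pred, average='micro')

print(acc, prec, rec, f1)  # all four are identical (0.625 here)
```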
Also, is recall really the same as accuracy in the case of `'weighted'` averaging?
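On the `'weighted'` case: weighted recall is Σ_c (n_c / N) · (TP_c / n_c) = Σ_c TP_c / N, so the class supports cancel and it reduces to accuracy, while weighted precision has no such cancellation and generally differs. A hedged sketch with made-up data:

```python
# Weighted recall collapses to accuracy, weighted precision does not
# (toy imbalanced data for illustration only).
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2]
y_pred = [0, 0, 1, 2, 1, 0, 2, 2, 2, 0, 1, 2]

acc    = accuracy_score(y_true, y_pred)
rec_w  = recall_score(y_true, y_pred, average='weighted')
prec_w = precision_score(y_true, y_pred, average='weighted')

print(acc, rec_w)   # equal (7/12 here)
print(prec_w)       # different from accuracy
```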