Difference between printed metrics and actual metrics


I’m running the following code:

f_score = FBeta(average='macro', beta=1)
learn_tsh = text_classifier_learner(data_task_specific_head, AWD_LSTM, metrics=[accuracy, f_score])
learn_tsh.fit_one_cycle(1, max_lr=1e-2, moms=(0.95,0.85))

Which produces this output:

Total time: 00:47

epoch train_loss valid_loss accuracy f_beta time
0 0.636780 0.601293 0.674651 0.485052 00:47

If I check on the validation data (note that validation loss and accuracy are correct):

results = learn_tsh.validate(learn_tsh.data.valid_dl)

[0.60129267, tensor(0.6747), tensor(0.5832)]

So why does the printed value (table output) for FBeta differ from the actual result, when it’s correct for “built-in” things like accuracy and validation loss? I got a bit lost in the code trying to follow what gets printed where :confused:
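One way to settle which of the two numbers is right is to compute the macro F-score independently with scikit-learn. The sketch below uses stand-in arrays; with the real learner you would presumably get the predictions via fastai v1’s `preds, targets = learn_tsh.get_preds()` and take `preds.argmax(dim=1)`:

```python
import numpy as np
from sklearn.metrics import fbeta_score

# Stand-in arrays for illustration; replace with the argmaxed predictions
# and targets from the validation set
y_true = np.array([0, 1, 1, 0, 2, 2, 1, 0])
y_pred = np.array([0, 1, 0, 0, 2, 1, 1, 0])

# Macro-averaged F1, computed independently of fastai's FBeta
print(fbeta_score(y_true, y_pred, beta=1, average='macro'))
```

Whichever of the two fastai numbers this agrees with is the trustworthy one.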

My fast.ai library version is 1.0.52-1 (which should be the latest as of this writing).

Upon further investigation, it seems that the values are incorrect whenever the average parameter is anything other than micro (passing no parameter defaults to micro):

f_score = FBeta(average='macro', beta=1) # wrong
f_blank = FBeta() # correct
f_blank2 = FBeta(beta=1) # correct
f_micro = FBeta(average='micro') # correct
f_macro = FBeta(average='macro') # wrong
f_weighted = FBeta(average='weighted') # wrong

epoch train_loss valid_loss accuracy f_beta f_beta f_beta f_beta f_beta f_beta time
0 0.629249 0.604950 0.668663 0.545243 0.847554 0.782152 0.668663 0.547606 0.645194 00:52
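Assuming the f_beta columns follow the order the metrics were passed, it’s telling that the micro one (0.668663) matches accuracy exactly: for single-label classification, micro-averaged F1 is accuracy. A quick sketch with made-up labels shows the two averaging modes side by side:

```python
import numpy as np

# Hypothetical 3-class labels, just to illustrate the averaging modes
y_true = np.array([0, 0, 0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 1, 2, 1, 1, 2, 0])
n_classes = 3

tp = np.array([np.sum((y_true == c) & (y_pred == c)) for c in range(n_classes)])
fp = np.array([np.sum((y_true != c) & (y_pred == c)) for c in range(n_classes)])
fn = np.array([np.sum((y_true == c) & (y_pred != c)) for c in range(n_classes)])

# Macro: compute F1 per class, then average the per-class scores
prec, rec = tp / (tp + fp), tp / (tp + fn)
f1_macro = (2 * prec * rec / (prec + rec)).mean()

# Micro: pool all TP/FP/FN first, then compute a single F1
P, R = tp.sum() / (tp + fp).sum(), tp.sum() / (tp + fn).sum()
f1_micro = 2 * P * R / (P + R)

accuracy = (y_true == y_pred).mean()
print(f1_macro, f1_micro, accuracy)  # micro F1 equals accuracy here
```

So the micro path coinciding with accuracy while the macro/weighted paths come out wrong is consistent with a bug in the per-class bookkeeping rather than in the overall counts.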


My guess would be that something is going wrong in the callback calls around validation, which screws up the metric somehow. I’ll look into it when I have a bit of time.