Understanding metrics and callbacks

I’ve been working through the metrics documentation and this post started out as me asking for help with errors and questions about the relationship between metrics classes, metrics functions, and callbacks. In writing the post and doing research for it, I ended up answering my questions for myself after a bit of trial and error. I thought rather than delete the post, I’d leave those answers here in case others have similar questions. (Or if I’ve gotten anything wrong, I’d welcome corrections instead.)

The docs read:

This is why in fastai, every metric is implemented as a callback.
If you pass a regular function, the library transforms it into a proper
callback called AverageCallback. The callback metrics are only called
during the validation phase, and only for the following events:

As an example, here is the exact implementation of the AverageMetric
callback that transforms a function like accuracy into a metric callback.

Discovery 1: AverageCallback is not an implemented Callback, as best I can tell. (Running a search for this string in the fast.ai GitHub repo turns up no results.) I think this is supposed to read AverageMetric instead, and this class is implemented here.

(Also, minor Discovery 2: the error_rate metric used for ConvLearner in Lesson 1 is not listed in the docs, but is implemented in the source code, here.)

I didn’t quite grok this immediately, so I thought maybe there’d be an example I could work off of in the source code. Here I ran into another point of confusion: there is both an fbeta() function, and an FBeta class defined in metrics.py. The FBeta class does inherit from Callback, but I thought it was somehow supposed to relate to fbeta(), and I couldn’t quite piece that together. After a bit of fiddling around, I am pretty sure that fbeta() is just defined such that it computes the F_beta average for each epoch - since it is a “regular function”, it is implemented as an instance of the AverageMetric class, here. By comparison, FBeta (the class) is implemented so that it can be tracked per-batch (I think?).

(Minor discovery 3: The metrics argument in various Learner initializations only takes functions or class instances, not class definitions. This seems kind of obvious in hindsight, but at first I was trying to pass in metrics=[error_rate, Precision] and getting a TypeError. This is resolved by making an instance first, e.g. precision = Precision(), and then passing precision instead of Precision into the metrics argument.)

It was really helpful for me to track through the source code to see how Learner implements this “let’s see if your metric is a regular function or a Callback instance.” There’s a line in the validate() method of the Learner class that starts this process:

cb_handler = CallbackHandler(self.callbacks + ifnone(callbacks, []), metrics)

Which, in turn, leads to the basic decision point in the CallbackHandler __post_init__:

self.metrics = [(met if isinstance(met, Callback) else AverageMetric(met)) for met in self.metrics]

In other words, if the metric you passed in inherits from Callback, CallbackHandler will just leave it be. If not, it’s treated as an AverageMetric.
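The dispatch above can be mimicked in plain Python. This is a toy sketch of the idea, not the real fastai classes (the stand-in `Callback`, `AverageMetric`, and `wrap_metrics` here are simplified assumptions):

```python
class Callback:  # stand-in for fastai's Callback base class
    pass

class AverageMetric(Callback):
    """Wraps a plain function and averages its value over batches."""
    def __init__(self, func):
        self.func, self.val, self.count = func, 0.0, 0

    def on_batch_end(self, value, n):
        self.val += value * n  # weight by batch size
        self.count += n

    @property
    def result(self):
        return self.val / self.count

def wrap_metrics(metrics):
    # mirrors: met if isinstance(met, Callback) else AverageMetric(met)
    return [m if isinstance(m, Callback) else AverageMetric(m) for m in metrics]

def accuracy(preds, targs):  # a "regular function" metric
    return sum(p == t for p, t in zip(preds, targs)) / len(targs)

class MyPrecision(Callback):  # a class-based metric, passed as an instance
    pass

wrapped = wrap_metrics([accuracy, MyPrecision()])
print([type(m).__name__ for m in wrapped])  # ['AverageMetric', 'MyPrecision']
```

The function gets wrapped; the Callback instance passes through untouched.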

So, in summary, if you make your own metrics function, make sure it makes sense to have it treated as an AverageMetric. If not, define your own class which inherits from Callback, going by the rules outlined in the docs, and then pass an instance of that class into the metrics argument for your Learner of choice.
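For a metric that can't just be averaged, the class-based pattern boils down to: reset counts at epoch start, accumulate raw counts per batch, compute the ratio at epoch end. Here's a minimal sketch of that pattern using fastai's hook names, but driven by hand rather than by the real training loop (the signatures are simplified assumptions, not the actual fastai API):

```python
class MyPrecision:
    def on_epoch_begin(self, **kwargs):
        self.tp, self.fp = 0, 0  # reset counts each epoch

    def on_batch_end(self, preds, targets, **kwargs):
        # accumulate raw counts per batch instead of averaging a ratio
        for p, t in zip(preds, targets):
            if p == 1:
                if t == 1: self.tp += 1
                else:      self.fp += 1

    def on_epoch_end(self, **kwargs):
        return self.tp / (self.tp + self.fp)

metric = MyPrecision()
metric.on_epoch_begin()
metric.on_batch_end([1, 1, 0], [1, 0, 0])  # batch 1: 1 tp, 1 fp
metric.on_batch_end([1, 0, 1], [1, 1, 1])  # batch 2: 2 tp
print(metric.on_epoch_end())  # 3 / (3 + 1) = 0.75
```

The key point is that the counts survive across batches, so the final ratio is computed over the whole epoch rather than averaged per batch.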

(Note: Nothing really new here above and beyond what’s already in the documentation - I just needed to go through all of the above myself first in order to be able to understand what they were saying.)


cc @sgugger

Confirming discovery 3:

Was trying to use FBeta on a multiclass dataset. Had been running into an error passing FBeta into the metrics param of ConvLearner. Resolved by instantiating first.

fbeta = FBeta()
learn = ConvLearner(data, models.resnet34, metrics=fbeta)

Maybe someone can help me here: I simply can’t get the f_scores to work with my own data, whether images or anything else. It works fine in the planets notebook, but even if I copy the exact same code over, I always get “mismatch” complaints that I don’t understand. Training works fine, lr_find works fine, and metrics like accuracy and/or Precision() display nicely, but as soon as I add any fbeta-based metric, it stops working:

this is the specific code right now:

f1_fai = partial(fbeta, thresh=0.2)
#f2_fai = partial(fbeta, beta=2)
learn = tabular_learner(data, layers=[512, 512, 512, 256, ], metrics=[accuracy, f1_fai])
learn.fit_one_cycle(2, 1e-2, wd=0.01)

Error that follows:

And I don’t understand the numbers: bs=8, classes=2, and the training examples are table rows with 512 columns. Where does the number 12 even come from??

I am running on fastai v1.0.28


fbeta is intended for one-hot-encoded targets (often in a multi-label classification problem). Maybe you used it for a binary classification problem?
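To make the shape difference concrete, here's a toy illustration with plain lists (just the layout, not fastai's actual tensors):

```python
# fbeta expects one-hot-encoded, possibly multi-label targets:
onehot_targets = [[1, 0, 1],
                  [0, 1, 0]]   # one row per sample, one column per class

preds = [[0.9, 0.1, 0.7],
         [0.2, 0.8, 0.3]]      # sigmoid outputs, thresholded per class

thresh = 0.5
binarized = [[int(p > thresh) for p in row] for row in preds]
print(binarized)  # [[1, 0, 1], [0, 1, 0]] -- same layout as the targets

# A binary classification dataset instead yields plain class indices,
# which don't match the one-hot layout fbeta assumes:
binary_targets = [0, 1, 1]
```

If your targets look like `binary_targets` rather than `onehot_targets`, the shapes won't line up, which is one way to get the "mismatch" errors above.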

I was trying to record the loss and accuracy measures of the training.

I followed the instructions here but got the following error. Can anyone share how to get back the training metrics or record them for viewing later?

learn.fit_one_cycle(1, slice(1e-3/(2.6**4),1e-3), moms=(0.8,0.7), callback_fns=[CSVLogger])

NameError Traceback (most recent call last)
----> 1 learn.fit_one_cycle(1, slice(1e-3/(2.6**4),1e-3), moms=(0.8,0.7), callback_fns=[CSVLogger])

NameError: name ‘CSVLogger’ is not defined

I have a related question:
fbeta is a function, which means it uses AverageMetric under the hood.

However, the documentation explains that a metric such as Precision can’t be averaged, and should therefore be implemented as a class. Yet, fbeta requires computing precision and recall (see fbeta implementation)

Can someone explain?

I think you need to do this first (haven’t tested it myself):

from fastai.callbacks.csv_logger import CSVLogger

I’m going to answer my own question, hopefully, this will be useful to someone.

As Sylvain mentions in this comment, the implementation of fbeta is indeed inaccurate and should be removed. In the meantime, I think there should be a big warning in the docs, or even a warning printed when calling it. Especially since Jeremy uses fbeta in lesson 3 of the 2019 course to compare against benchmarks/competitions, I guess many folks will try to use it and get surprising results.

If you want the F-beta metric on a multi-label problem, the current class-based implementation FBeta won’t work either (it’s designed for single-label); you need to implement your own class.
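To see concretely why a precision/recall-based metric can't be averaged per batch, here's a toy numeric example (pure Python, made-up counts, not the fastai implementation):

```python
def f1(tp, fp, fn):
    """F1 from raw true-positive / false-positive / false-negative counts."""
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)

# batch 1: tp=1, fp=0, fn=3   batch 2: tp=3, fp=3, fn=0
per_batch_avg = (f1(1, 0, 3) + f1(3, 3, 0)) / 2  # average of per-batch F1s
global_f1 = f1(1 + 3, 0 + 3, 3 + 0)              # F1 over pooled counts

print(per_batch_avg, global_f1)  # the two disagree
```

Because F1 is a nonlinear function of the counts, averaging per-batch values gives a different (and generally wrong) number from computing it over the whole validation set, which is exactly why these metrics need a class that accumulates counts.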


I am dealing with a binary classification problem. And I am trying to build a sklearn style classification_report containing precision, f1_score and recall. I am using the following code template for all of the three metrics:

fbeta = FBeta(beta=1)
learn.validate(learn.data.valid_dl, metrics=fbeta)

For all three I get the same error:

Hi Everyone. I was able to create an sklearn like classification_report. Here’s my code:

from sklearn.metrics import classification_report 

ground_truth = []
pred_labels = []

for i in range(len(learn.data.valid_ds)):
  temp_pred = str(learn.predict(learn.data.valid_ds[i][0])[0])
  temp_truth = str(learn.data.valid_ds[i]).split('), ', 1)[1].replace('Category ', '').replace(')', '')
  pred_labels.append(temp_pred)
  ground_truth.append(temp_truth)

assert len(pred_labels) == len(ground_truth)

print(classification_report(ground_truth, pred_labels, target_names=data.classes))

from fastai.callbacks import *

This works. Guaranteed :slight_smile:

wow! thank you so much for this one !!! love it!

now this

from fastai.callback import *

I am not sure where and how to add these metrics

should they go here?
learn = cnn_learner(dls, resnet34, metrics=Recall)

This works

learn = cnn_learner(dls, resnet50, metrics=[accuracy, error_rate, Precision(average='micro'),