Custom metric displaying the wrong numbers during training

I am trying to use a custom metric for my training, but the numbers displayed during training do not seem to be the ones my custom metric should compute.

For example, if I run the following:

learn.metrics = [fbeta_mod]
learn.validate(), fbeta_mod(*learn.get_preds(DatasetType.Valid))

I get different results from validate() and from applying the metric function directly to the predictions on the validation set.

Namely, this is the output I get:

([0.8209119, tensor(0.4247)], tensor(0.6222))

If it is of any help, I am using a Tabular learner and I am just trying to use a modified fbeta function on a single-label classification task (the one in fastai.metrics does not seem to work out of the box).
This is my modified function (note that I also changed the default value of the sigmoid parameter):

def fbeta_mod(y_pred:Tensor, y_true:Tensor, thresh:float=0.2, beta:float=2, eps:float=1e-9, sigmoid:bool=False)->Rank0Tensor:
    "Computes the f_beta between `y_pred` and `y_true`"
    beta2 = beta ** 2
    y_pred = y_pred[:, -1]                  # keep only the positive-class probability (last column)
    if sigmoid: y_pred = y_pred.sigmoid()   # off by default: get_preds already returns probabilities
    y_pred = (y_pred > thresh).float()      # hard predictions at the threshold
    y_true = y_true.float()
    TP = (y_pred * y_true).sum(dim=-1)
    prec = TP / (y_pred.sum(dim=-1) + eps)
    rec = TP / (y_true.sum(dim=-1) + eps)
    res = (prec * rec) / (prec * beta2 + rec + eps) * (1 + beta2)
    return res.mean()

The function itself gives me the same results as fbeta_score in sklearn, so I am pretty sure the correct result is the higher one, and the one displayed by validate() is wrong.
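
A minimal sketch of that comparison (assuming preds holds the two-column probabilities returned by get_preds):

from sklearn.metrics import fbeta_score

preds, targets = learn.get_preds(DatasetType.Valid)
hard = (preds[:, -1] > 0.2).numpy().astype(int)    # same 0.2 default threshold as fbeta_mod
print(fbeta_score(targets.numpy(), hard, beta=2))  # matches fbeta_mod(preds, targets)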

Any idea?

Edit: if it is of any importance, I am using a custom loss function as well, which is just a weighted cross entropy:

learn.loss_func = CrossEntropyFlat(weight=tensor([w0, 1-w0]))
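
For completeness, w0 is just a scalar weight in (0, 1). As a purely hypothetical illustration (not the actual value used here), one could derive it from the training label frequencies:

import numpy as np

# hypothetical: set w0 to the frequency of class 1, so the rarer class 1
# ends up with the larger weight 1 - w0; train_labels is a stand-in array
train_labels = np.array([0, 0, 0, 0, 1])
w0 = (train_labels == 1).mean()  # 0.2 -> weights [0.2, 0.8]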


When you do

fbeta_mod(*learn.get_preds(DatasetType.Valid))

your metric is properly computed on all the predictions. But if you use it like this in validate or during training, the result is an average of the metric over batches, which is wrong for F-beta: the score of the whole set is not the mean of per-batch scores. That is why metrics like fbeta for single-label classification are implemented as Callbacks (have a look at FBeta), which store all the results needed to compute the right value at the end of validation.
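
A toy example of the effect, reusing fbeta_mod from above (made-up numbers; two "batches" of two samples each):

import torch

y_pred = torch.tensor([[0.1, 0.9], [0.2, 0.8], [0.3, 0.7], [0.8, 0.2]])
y_true = torch.tensor([1, 1, 0, 1])

whole = fbeta_mod(y_pred, y_true)                   # computed on everything at once
per_batch = (fbeta_mod(y_pred[:2], y_true[:2]) +
             fbeta_mod(y_pred[2:], y_true[2:])) / 2  # what batch averaging does
print(whole.item(), per_batch.item())               # ~0.667 vs 0.5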

Gotcha, thanks. I suppose the same holds for accuracy as well. Weird that I had never noticed it.

Just for the record, I managed to get the same results with FBeta this way:

learn.metrics = [FBeta()]
learn.validate(), fbeta_mod(*learn.get_preds(DatasetType.Valid), thresh=.5)

Notice that FBeta seems to use a .5 threshold (and I am not sure it is possible to change it), while the default of fbeta is .2.

There is no threshold parameter, no: it takes the class that has the maximum probability as the prediction, since it's single-label.
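
For two classes this is equivalent to a 0.5 threshold on the positive-class probability, which is why FBeta() lines up with fbeta_mod(..., thresh=.5) above. A quick check with toy softmax outputs:

import torch

probs = torch.tensor([[0.6, 0.4], [0.3, 0.7], [0.49, 0.51]])
by_argmax = probs.argmax(dim=1)           # what FBeta does (single-label)
by_thresh = (probs[:, 1] > 0.5).long()    # thresholding the positive class at 0.5
print(torch.equal(by_argmax, by_thresh))  # True: the two rules agree for 2 classes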