Custom metric displaying the wrong numbers during training

I am trying to use a custom metric during training, but the numbers displayed while training do not seem to be the ones my custom metric should produce.

For example, if I run the following:

learn.metrics = [fbeta_mod]
learn.validate(), fbeta_mod(*learn.get_preds(DatasetType.Valid))

I get different results from validate() and from applying the metric function directly to the predictions on the validation set.

Namely, this is the output I get:

([0.8209119, tensor(0.4247)], tensor(0.6222))

If it is of any help, I am using a tabular learner and I am just trying to use a modified fbeta function on a single-label classification task (the one in fastai.metrics does not seem to work out of the box for this).
This is my modified function (notice that I also changed the default value of the sigmoid parameter):

def fbeta_mod(y_pred:Tensor, y_true:Tensor, thresh:float=0.2, beta:float=2, eps:float=1e-9, sigmoid:bool=False)->Rank0Tensor:
    "Computes the f_beta between `preds` and `targets`"
    beta2 = beta ** 2
    y_pred = y_pred[:, -1]  # my modification: keep only the positive-class probability
    if sigmoid: y_pred = y_pred.sigmoid()
    y_pred = (y_pred>thresh).float()
    y_true = y_true.float()
    TP = (y_pred*y_true).sum(dim=-1)
    prec = TP/(y_pred.sum(dim=-1)+eps)
    rec = TP/(y_true.sum(dim=-1)+eps)
    res = (prec*rec)/(prec*beta2+rec+eps)*(1+beta2)
    return res.mean()

The function itself gives me the same results as fbeta_score in sklearn, so I am fairly sure the higher number is the correct one and the value displayed by validate() is wrong.
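
This is roughly how I checked it against sklearn (a sketch, assuming binary targets and that the last column of the predictions is the positive-class probability):

from sklearn.metrics import fbeta_score

preds, targs = learn.get_preds(DatasetType.Valid)
manual = fbeta_mod(preds, targs)  # default thresh=0.2, beta=2
sk = fbeta_score(targs.numpy(), (preds[:, -1] > 0.2).numpy(), beta=2)
print(manual.item(), sk)  # the two numbers should match (up to eps)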

Any idea?

Edit: if it is of any importance, I am using a custom loss function as well, which is just a weighted cross entropy:

learn.loss_func = CrossEntropyFlat(weight=tensor([w0, 1-w0]))

where

w0 = learn.data.train_ds.y.items.mean(); w0

is

0.2619769418033163
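
So w0 is just the positive-class frequency in the training set, and the weighting is a simple inverse-frequency scheme: the rarer (positive) class gets the larger weight. As a quick sanity check of what that line builds (assuming fastai v1 imports):

from fastai.tabular import *  # provides tensor and CrossEntropyFlat in fastai v1

w0 = 0.2619769418033163                       # positive-class frequency from above
weights = tensor([w0, 1 - w0])                # class 0 -> ~0.26, class 1 -> ~0.74
loss_func = CrossEntropyFlat(weight=weights)  # minority (positive) class weighted more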

When you do

fbeta_mod(*learn.get_preds(DatasetType.Valid))

your metric is properly computed on all the predictions. But when it is used like this in validate() or during training, it is averaged over batches, which gives a wrong result. That is why metrics like fbeta for single-label classification are implemented as Callbacks (have a look at FBeta): they store all the results so the correct value can be computed at the end of validation.
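
To make the difference concrete, here is a minimal sketch (not fastai's actual FBeta implementation) of the callback idea: accumulate the counts over batches and compute the score once on the totals, instead of averaging per-batch scores.

# Sketch only: accumulate TP / predicted-positive / actual-positive counts per batch,
# then compute F-beta once at the end of validation.
class FBetaAccumulator:
    def __init__(self, thresh=0.2, beta=2, eps=1e-9):
        self.thresh, self.beta2, self.eps = thresh, beta**2, eps
        self.tp = self.pred_pos = self.true_pos = 0.0

    def on_batch(self, y_pred, y_true):
        y_hat = (y_pred[:, -1] > self.thresh).float()
        y_true = y_true.float()
        self.tp       += (y_hat * y_true).sum().item()
        self.pred_pos += y_hat.sum().item()
        self.true_pos += y_true.sum().item()

    def value(self):
        prec = self.tp / (self.pred_pos + self.eps)
        rec  = self.tp / (self.true_pos + self.eps)
        return (1 + self.beta2) * prec * rec / (self.beta2 * prec + rec + self.eps)

Averaging fbeta_mod over batches is not the same thing: each per-batch score depends on that batch's class balance, so the mean of the batch scores differs from the score computed on all predictions at once.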

Gotcha. Thanks. I suppose the same holds for accuracy as well. Weird that I had never noticed it.

Just for the record, I managed to get the same results with FBeta this way:

learn.metrics = [FBeta()]
learn.fit(0)
learn.validate(), fbeta_mod(*learn.get_preds(DatasetType.Valid), thresh=.5)

Notice that FBeta uses a .5 threshold (and I am not sure it is possible to change it), while the default of fbeta_mod is .2.

There is no threshold parameter, no: it takes the class that has the maximum probability as the prediction, since it’s single-label.
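
For a two-class softmax output that also explains why the .5 threshold matched: taking the argmax over the two columns is equivalent to asking whether the positive-class probability exceeds 0.5. A quick illustration with made-up numbers:

import torch

preds = torch.tensor([[0.7, 0.3], [0.4, 0.6], [0.55, 0.45]])
by_argmax = preds.argmax(dim=1)           # what FBeta effectively uses
by_thresh = (preds[:, -1] > 0.5).long()   # fbeta_mod with thresh=0.5
assert torch.equal(by_argmax, by_thresh)  # identical for binary softmax outputs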