F1 Score as metric

Yes, it worked with Fbeta_binary, but it failed using FBeta(beta=1). I have two classes: 0 and 1.
These are the relevant error lines:

 1 learnc0.fit_one_cycle(1, 2e-2, moms=(0.8,0.7))   [my code]

/usr/local/lib/python3.6/dist-packages/fastai/train.py in fit_one_cycle(learn, cyc_len, max_lr, moms, div_factor, pct_start, wd, callbacks, **kwargs)
     20     callbacks.append(OneCycleScheduler(learn, max_lr, moms=moms, div_factor=div_factor,
     21                                         pct_start=pct_start, **kwargs))
---> 22     learn.fit(cyc_len, max_lr, wd=wd, callbacks=callbacks)
....
/usr/local/lib/python3.6/dist-packages/fastai/basic_train.py in fit(self, epochs, lr, wd, callbacks)
    170         callbacks = [cb(self) for cb in self.callback_fns] + listify(callbacks)
    171         fit(epochs, self.model, self.loss_func, opt=self.opt, data=self.data, metrics=self.metrics,
--> 172             callbacks=self.callbacks+callbacks)
....
/usr/local/lib/python3.6/dist-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
     92     except Exception as e:
     93         exception = e
---> 94         raise e
...
/usr/local/lib/python3.6/dist-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
     87             if not data.empty_val:
     88                 val_loss = validate(model, data.valid_dl, loss_func=loss_func,
---> 89                                        cb_handler=cb_handler, pbar=pbar)
...
/usr/local/lib/python3.6/dist-packages/fastai/basic_train.py in validate(model, dl, loss_func, cb_handler, pbar, average, n_batch)
     52             if not is_listy(yb): yb = [yb]
     53             nums.append(yb[0].shape[0])
---> 54             if cb_handler and cb_handler.on_batch_end(val_losses[-1]): break
...
/usr/local/lib/python3.6/dist-packages/fastai/callback.py in on_batch_end(self, loss)
    237         "Handle end of processing one batch with `loss`."
    238         self.state_dict['last_loss'] = loss
--> 239         stop = np.any(self('batch_end', not self.state_dict['train']))
...
/usr/local/lib/python3.6/dist-packages/fastai/callback.py in __call__(self, cb_name, call_mets, **kwargs)
    185     def __call__(self, cb_name, call_mets=True, **kwargs)->None:
    186         "Call through to all of the `CallbakHandler` functions."
--> 187         if call_mets: [getattr(met, f'on_{cb_name}')(**self.state_dict, **kwargs) for met in self.metrics]
...
/usr/local/lib/python3.6/dist-packages/fastai/callback.py in <listcomp>(.0)
    185     def __call__(self, cb_name, call_mets=True, **kwargs)->None:
    186         "Call through to all of the `CallbakHandler` functions."
--> 187         if call_mets: [getattr(met, f'on_{cb_name}')(**self.state_dict, **kwargs) for met in self.metrics]
...
/usr/local/lib/python3.6/dist-packages/fastai/metrics.py in on_batch_end(self, last_output, last_target, **kwargs)
    116     def on_batch_end(self, last_output:Tensor, last_target:Tensor, **kwargs):
    117         preds = last_output.argmax(-1).view(-1)
--> 118         cm = ((preds==self.x[:, None]) & (last_target==self.x[:, None, None])).sum(dim=2, dtype=torch.float32)

RuntimeError: Expected object of backend CUDA but got backend CPU for argument #2 'other'

OK, fixed it. A pull request (#1416) has been submitted, @sgugger.
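For anyone hitting this before updating: the traceback shows the comparison in metrics.py mixing a CPU tensor of class indices with GPU predictions. I haven't checked what the merged fix actually does, but a minimal standalone sketch of resolving that kind of device mismatch (made-up example tensors, and it assumes a CUDA device is available) looks like this:

import torch

# made-up tensors, just to reproduce the device mismatch
preds   = torch.tensor([1, 0, 1, 1], device='cuda')   # predictions live on the GPU
targets = torch.tensor([1, 0, 0, 1], device='cuda')
x = torch.arange(2)                                    # class indices created on the CPU

# comparing them the way metrics.py does raises "Expected object of backend CUDA but got backend CPU":
# cm = ((preds == x[:, None]) & (targets == x[:, None, None])).sum(dim=2, dtype=torch.float32)

# moving the index tensor onto the same device makes the comparison work:
x = x.to(preds.device)
cm = ((preds == x[:, None]) & (targets == x[:, None, None])).sum(dim=2, dtype=torch.float32)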


I am also having the same problem. My dataset has 5 classes and works fine with accuracy, but it shows the same error with the fbeta metric. Have you fixed it?

Hmm… it seems like this issue still persists. Using fbeta as a metric for a multi-class classification problem gives me the following error:
The size of tensor a (24) must match the size of tensor b (16) at non-singleton dimension 1

Yes, I also get a similar error.

I also get "The size of tensor a (2) must match the size of tensor b (64) at non-singleton dimension 1" when using the fbeta metric (fastai 1.0.45) for binary classification.
Edit: I should have used FBeta() for binary.


Has anyone solved this issue? I am trying to use
learn = tabular_learner(data, layers=[50,10], ps=[0.1,0.1], emb_drop=0.1,
metrics=[accuracy, fbeta])

Try metrics=[FBeta(beta=1)]. Also, you may want to check out the docs.


So just to clarify, fbeta is a metric intended for multi-class + multi-label problems, so we'd need to adapt it for single-label multi-class, not just for binary classification, right?

fbeta is for multi-label, FBeta is for single-label and can handle multi-class (check the various possible modes).
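In code, the distinction looks roughly like this (a sketch only: learn_multilabel and learn_multiclass stand for learners you have already created, and the beta/average values are just examples):

from functools import partial
from fastai.metrics import fbeta, FBeta

# multi-label (multi-hot targets): the plain fbeta function, usually via partial to set beta
learn_multilabel.metrics = [partial(fbeta, beta=1)]

# single-label multi-class: the FBeta callback class, picking an averaging mode
learn_multiclass.metrics = [FBeta(average='macro', beta=1)]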


Ah perfect, so I can just use FBeta instead of creating my own. Thanks!

Is it possible to pass 'weighted' as an argument to the 'average' parameter of the fbeta function (just like in the sklearn library)?

See their documentation.

In my own project on image classification the following seemed to work:

my_fbeta = FBeta(average='macro')
learn.metrics = [accuracy, my_fbeta]

Passing 'weighted' to average didn't work; it resulted in nan. I'm interested in the average parameter because of an imbalanced dataset.

I'm sure this is not the best way to achieve this, but this worked for me.

import torch
import sklearn.metrics
from dataclasses import dataclass
from fastai.callback import Callback

@dataclass
class F1(Callback):
    "Weighted F1 over the whole validation set, computed with sklearn."
    def on_epoch_begin(self, **kwargs):
        # buffers for predictions and targets, accumulated batch by batch
        self.y_pred = torch.tensor([]).cuda()
        self.y_true = torch.tensor([]).cuda()

    def on_batch_end(self, last_output, last_target, **kwargs):
        self.y_pred = torch.cat((self.y_pred, last_output.argmax(dim=1).float()))
        self.y_true = torch.cat((self.y_true, last_target.float()))

    def on_epoch_end(self, **kwargs):
        # sklearn expects CPU arrays, so move the tensors off the GPU first
        self.metric = sklearn.metrics.f1_score(self.y_true.cpu(), self.y_pred.cpu(),
                                               average='weighted')

Ref: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html
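With that class defined, adding an instance to the metrics list should be enough to have the score reported each epoch (a usage sketch, assuming an existing learner called learn):

from fastai.metrics import accuracy

learn.metrics = [accuracy, F1()]
learn.fit_one_cycle(1)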


I am trying to use fastai's FBeta class in a sequence labeling problem:

learn = Learner(data, rnn, loss_func=seq2seq_loss, metrics=FBeta())

When I try to learn.fit_one_cycle(4, 1e-2), I get this error: RuntimeError: The size of tensor a (75200) must match the size of tensor b (1175) at non-singleton dimension 2.

Note: my batch size is 64; tensor a's size (75200) is 64 times tensor b's (1175).

I tried using FBeta(average="micro", beta=2), but got the same error.

When I try to use partial(fbeta, beta=2) in metrics instead, I get the following error when fitting: RuntimeError: The size of tensor a (24) must match the size of tensor b (1175) at non-singleton dimension 2

My model looks like this:

NERRNN(
  (enc): Embedding(9472, 400, padding_idx=1)
  (enc_drop): Dropout(p=0.15)
  (out): Linear(in_features=400, out_features=24, bias=True)
)

Any help would be much appreciated.
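I can't say for sure without the data, but those shapes suggest FBeta flattens the predictions (64 × 1175 = 75200) while the targets keep their (64, 1175) shape, so the comparison inside the metric no longer broadcasts. One workaround, in the spirit of the sklearn callback above, is a metric that flattens both sides itself; this is only a sketch, and pad_idx is a placeholder you would set to whatever index your targets use for padding:

import torch
import sklearn.metrics
from dataclasses import dataclass
from fastai.callback import Callback

@dataclass
class SeqF1(Callback):
    "Token-level macro F1 for sequence labelling (sketch; pad_idx is an assumption)."
    pad_idx:int = 1

    def on_epoch_begin(self, **kwargs):
        self.preds, self.targs = [], []

    def on_batch_end(self, last_output, last_target, **kwargs):
        preds = last_output.argmax(-1).view(-1).cpu()   # (bs*seq_len,)
        targs = last_target.view(-1).cpu()              # (bs*seq_len,)
        mask = targs != self.pad_idx                    # ignore padded positions
        self.preds.append(preds[mask]); self.targs.append(targs[mask])

    def on_epoch_end(self, **kwargs):
        self.metric = sklearn.metrics.f1_score(torch.cat(self.targs), torch.cat(self.preds),
                                               average='macro')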

I was confused a bit about F1 scores due to some historical code, but I thought I'd post what worked for me in case it wasn't clear to others.

I am trying to measure F1 score for the dataset/competition from https://www.kaggle.com/c/quora-insincere-questions-classification. It has a binary classification of 0 or 1.

I'm running 1.0.57, and this is what worked for me:

learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)
learn.metrics = [FBeta(beta=1)]

Using partial(fbeta, beta=1) didn't work, and neither did Fbeta_binary.


@elichen I'm running into the same issue on the same Kaggle competition; were you able to figure out how to implement the F1 on this?

Yes, it was by using

learn.metrics = [FBeta(beta=1)]

How do I get a classification report (F1 score, precision, and recall) after the confusion matrix?

Thank you very much for your advice and notebook!
I'm working with an imbalanced tabular dataset (80% zeros and 20% ones) and using tabular_learner. Accurate prediction of ones is more important to me than overall accuracy, so I tried the binary fbeta for the ones class, as @wyquek has written:

learn.metrics=[accuracy, Fbeta_binary(beta2=1,clas = 1)]

However, for the sake of accuracy, my learner tends to increase false negatives, and I don't get the result I need.
How can I reduce false negatives and improve recall?

(With SMOTE, multiple sklearn classifiers showed better recall, but for some reason recall decreased to 0 with the fastai tabular learner fitted on the SMOTE dataset.)

I'd highly appreciate your feedback.
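For what it's worth, one common lever for this kind of imbalance, separate from SMOTE, is to weight the loss so that mistakes on class 1 cost more than mistakes on class 0. A minimal sketch for a fastai v1 tabular learner; the weights here are made-up numbers you would need to tune, and learn is the tabular_learner from above:

import torch
from fastai.layers import CrossEntropyFlat

# illustrative weights only: errors on the rare class (1) are penalised more heavily
class_weights = torch.tensor([0.25, 0.75])
if torch.cuda.is_available(): class_weights = class_weights.cuda()

learn.loss_func = CrossEntropyFlat(weight=class_weights)   # 'learn' is your tabular_learner
learn.fit_one_cycle(3)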