F1 Score as metric

sgugger · December 7, 2018, 2:16am

Didn’t have time to check that issue yet, so I’m not sure.

ajan1019 · December 7, 2018, 6:21am

@wyquek I tried to use Fbeta_binary and got the error below.

NameError: name ‘clas’ is not defined

wyquek · December 7, 2018, 6:41am

try this

from dataclasses import dataclass
from fastai.callback import Callback  # assumes fastai v1

@dataclass
class Fbeta_binary(Callback):
    "Computes the fbeta between preds and targets for single-label classification"
    beta2: int = 2     # beta itself (it is squared in on_epoch_end); use 1 for F1
    eps: float = 1e-9  # guards against division by zero
    clas: int = 1      # label treated as the positive class

    def on_epoch_begin(self, **kwargs):
        # reset the running counts at the start of every epoch
        self.TP = 0
        self.total_y_pred = 0
        self.total_y_true = 0

    def on_batch_end(self, last_output, last_target, **kwargs):
        y_pred = last_output.argmax(dim=1)
        y_true = last_target.float()

        # true positives, predicted positives and actual positives for `clas`
        self.TP += ((y_pred==self.clas) * (y_true==self.clas)).float().sum()
        self.total_y_pred += (y_pred==self.clas).float().sum()
        self.total_y_true += (y_true==self.clas).float().sum()

    def on_epoch_end(self, **kwargs):
        beta2 = self.beta2**2
        prec = self.TP/(self.total_y_pred+self.eps)
        rec = self.TP/(self.total_y_true+self.eps)
        res = (prec*rec)/(prec*beta2+rec+self.eps)*(1+beta2)
        self.metric = res
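
To sanity-check the arithmetic in on_epoch_end without training anything, here is a minimal sketch in plain Python. The helper name `fbeta_from_counts` and the counts are made up for illustration; it only mirrors the formula above. Note that the callback's `beta2` field holds beta itself and is squared inside on_epoch_end, so `beta2=1` gives F1.

```python
def fbeta_from_counts(tp, total_pred, total_true, beta=1.0, eps=1e-9):
    """F-beta from raw counts, mirroring the formula in on_epoch_end."""
    beta_sq = beta ** 2
    prec = tp / (total_pred + eps)   # precision = TP / predicted positives
    rec = tp / (total_true + eps)    # recall    = TP / actual positives
    return (1 + beta_sq) * prec * rec / (beta_sq * prec + rec + eps)

# Hypothetical counts: 8 true positives, 10 predicted positives, 16 actual positives,
# i.e. precision 0.8 and recall 0.5.
f1 = fbeta_from_counts(8, 10, 16, beta=1)   # about 0.615
f2 = fbeta_from_counts(8, 10, 16, beta=2)   # about 0.541, recall counts for more
print(round(f1, 4), round(f2, 4))
```

With beta above 1 the score moves toward recall, which is why F2 is lower than F1 here (recall is the weaker of the two).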

If you want F1 for label 1

learn = text_classifier_learner(data_clas, drop_mult=0.5)
learn.load_encoder('fine_tuned_enc')
learn.metrics=[accuracy, Fbeta_binary(beta2=1,clas = 1)]

OR if you want F1 for label 0

learn = text_classifier_learner(data_clas, drop_mult=0.5)
learn.load_encoder('fine_tuned_enc')
learn.metrics=[accuracy, Fbeta_binary(beta2=1,clas = 0)]

OR if you want F1 for both label 1 and label 0

learn = text_classifier_learner(data_clas, drop_mult=0.5)
learn.load_encoder('fine_tuned_enc')
f1_label1 = Fbeta_binary(1, clas=1)
f1_label0 = Fbeta_binary(1, clas=0)
learn.metrics=[accuracy, f1_label1, f1_label0]

Here’s a notebook example.

I think there are lots of metrics, such as those mentioned in this PR, that forum members can help fastai build, but they will most probably have to be written as callbacks.

ajan1019 · December 7, 2018, 6:46am

It worked. Thank you so much

nikhil_no_1 · December 24, 2018, 11:05am

Basic question: how does loss minimization happen when there are multiple metrics?
I am working on a text classification problem that shows high accuracy but does not predict correctly at inference time.
I suspect this is because I am using accuracy (the default) as the metric. I have now changed to Fbeta_binary and am checking.

wyquek · December 24, 2018, 11:30am

My understanding is that minimizing the loss improves a metric only up to a point; beyond that, the metric starts to get worse as the NN overfits. So it’s possible that for F1 the overfitting starts at, say, epoch 25, while for another metric, accuracy, it starts at epoch 15. So you can’t train the NN such that both metrics are at their best, except by coincidence. Most likely you would have to choose one.
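
A note on the mechanics, as far as I understand fastai: the weight updates come from the loss function only, and the entries in learn.metrics are computed for reporting. The sketch below is plain Python, not fastai code; the toy data and the names `train`, `accuracy` and `f1` are made up for illustration. It fits the same tiny logistic regression twice while monitoring different metrics and ends up with identical weights.

```python
import math

def accuracy(preds, ys):
    return sum(int(p == y) for p, y in zip(preds, ys)) / len(ys)

def f1(preds, ys):
    tp = sum(1 for p, y in zip(preds, ys) if p == 1 and y == 1)
    if tp == 0:
        return 0.0
    prec, rec = tp / sum(preds), tp / sum(ys)
    return 2 * prec * rec / (prec + rec)

def train(metric_fn, epochs=5, lr=0.5):
    """Gradient descent on log loss; metric_fn is only called for reporting."""
    xs = [-2.0, -1.0, 0.5, 1.0, 2.0, 3.0]   # toy inputs (made up)
    ys = [0, 0, 1, 0, 1, 1]                 # toy labels (made up)
    w, b = 0.0, 0.0
    history = []
    for _ in range(epochs):
        probs = [1 / (1 + math.exp(-(w * x + b))) for x in xs]
        # record the metric for the current weights; it never enters the update
        preds = [int(p > 0.5) for p in probs]
        history.append(metric_fn(preds, ys))
        # gradient of the mean binary cross-entropy with respect to w and b
        gw = sum((p - y) * x for p, y, x in zip(probs, ys, xs)) / len(xs)
        gb = sum(p - y for p, y in zip(probs, ys)) / len(xs)
        w, b = w - lr * gw, b - lr * gb
    return (w, b), history

weights_acc, hist_acc = train(accuracy)
weights_f1, hist_f1 = train(f1)
print(weights_acc == weights_f1)   # same training path, different reported numbers
```

What the choice of metric does change is which epoch you would pick as "best", which is the point made above.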

nikhil_no_1 · December 24, 2018, 11:44am

Is it just the number of epochs and over-fitting? Because that would mean the training happens in exactly the same way regardless of which metric is used.
My understanding is that if a different metric function is used, then the calculated loss will be different, which would lead to a completely different path during training. Isn’t that the case?

BTW, I used your approach. It works, but I don’t see any output. I went back to only having accuracy, and even that didn’t display the valid_loss and accuracy values, which I do get when I don’t set learn.metrics.

learn = text_classifier_learner(data_clas, drop_mult=0.5)
learn.load_encoder('fine_tuned_enc')
f1_label1 = Fbeta_binary(1,clas = 0)
f1_label0 = Fbeta_binary(1,clas = 1)
learn.metrics=[accuracy]
#learn.metrics=[f1_label1,f1_label0]
learn.freeze()

learn.fit_one_cycle(1, 2e-2, moms=(0.8,0.7))
Total time: 09:27
epoch	train_loss	valid_loss	accuracy
1	0.145246

Any idea? Thanks.

wyquek · December 24, 2018, 1:09pm

Isn’t that the case?

Yes, that’s what I meant as well.

Any idea? Thanks.

Hmm… weird. Not sure what’s happening, to be honest.

AbuFadl · December 24, 2018, 1:30pm

Is this needed?

nikhil_no_1 · December 24, 2018, 1:34pm

I have used the IMDB notebook as a reference, which has the same sequence. It worked earlier, until I tried to set the metric to Fbeta_binary. For some reason it is not working even after reverting and restarting the kernel.

AbuFadl · December 24, 2018, 2:33pm

That notebook uses the same variable for the lm and clas learners. Be careful you are not mixing them up. Maybe clear the models and tmp directories (they persist after a kernel restart, but not after resetting all runtimes, on Colab at least).

nikhil_no_1 · December 24, 2018, 3:17pm

It’s not anymore. Perhaps that was an issue earlier.
This is my notebook.
This is my first attempt at a Kaggle competition, so I just want it to work first (I will adhere to the rules of the competition later).
I saw in another notebook that the threshold value is ~0.33. Perhaps everything has worked well and I just need to use a different threshold value when predicting?

IN: learn.predict("Why are men selective?")
OUT: (Category 0, tensor(0), tensor([0.6202, 0.3798]))
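
On the threshold idea: the predicted category above is the class with the higher probability, which for a binary problem amounts to a 0.5 cut-off. A minimal sketch of applying a custom cut-off to those probabilities follows; `predict_with_threshold` is a hypothetical helper, not a fastai function, and 0.33 is simply the value mentioned from the other notebook.

```python
def predict_with_threshold(probs, threshold=0.5, positive=1):
    """Binary decision: pick `positive` when its probability reaches the threshold."""
    return positive if probs[positive] >= threshold else 1 - positive

# Probabilities copied from the learn.predict output above: [P(class 0), P(class 1)]
probs = [0.6202, 0.3798]

default_label = predict_with_threshold(probs)                 # 0.5 cut-off
tuned_label = predict_with_threshold(probs, threshold=0.33)   # lower cut-off
print(default_label, tuned_label)
```

With the lower cut-off this example flips from class 0 to class 1, so the threshold alone can explain "wrong" predictions from a model that otherwise trained fine.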

GDB · December 27, 2018, 4:26am

I think I’m missing something significant in this thread. Wanted to surface an F1 metric for my NLP binary classifier. Fiddled with Fbeta_binary but it didn’t work on the first try. Was going to invest some real effort figuring out how to implement it in my notebook but, just for chuckles and grins, I decided to try this:

learn = RNN_Learner(md, TextModel(to_gpu(m)), opt_fn=opt_fn)
learn.reg_fn = partial(seq2seq_reg, alpha=2, beta=1)
learn.clip=.25
learn.metrics = [accuracy, f1]

Results:

epoch      trn_loss   val_loss   accuracy   f1
14         0.211136    0.232183   0.912444   0.857092

These results were from a binary classifier I built using the News Headlines Dataset For Sarcasm Detection Kaggle dataset and the fwd_wt103.h5 pre-trained model.

AbuFadl · December 27, 2018, 9:12am

I believe the current fastai (v1.0.x) does not have f1, but it existed in the old fastai.

GDB · December 27, 2018, 4:52pm

Thanks Abu. Seems odd that such a common metric was not carried over from the old version to the latest.

sgugger · December 28, 2018, 9:38am

FYI, SvenBecker added a lot of new metrics and, in passing, renamed Fbeta_binary to FBeta (there are more options than just binary). This will be in v1.0.39 and onward.

AbuFadl · December 28, 2018, 3:29pm

Great! So, in the next version (1.0.39+), the ‘normal’ F1 = FBeta()?

sgugger · December 28, 2018, 4:51pm

More like FBeta(beta=1) since the default for beta is 2.

AbuFadl · December 29, 2018, 10:26am

I get RuntimeError: Expected object of backend CUDA but got backend CPU for argument #2 'other' when using FBeta (fastai 1.0.39) on a Colab GPU. Is this a bug?

SBecker · December 30, 2018, 1:20am

Not sure about this. Did it work with the old Fbeta_binary? Can you link the notebook/code?

Currently, the calculation of FBeta, as well as of some other metrics, relies on computing the confusion matrix (which makes it easier to support different averaging approaches). To initialize the matrix, the number of classes has to be specified. The defaults are FBeta(beta=2, n_classes=2, average="binary"), where the average values are in line with the ones used by sklearn (except for ‘samples’).

Ref.: f1_score — scikit-learn 1.5.1 documentation
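
To make the confusion-matrix approach concrete, here is a rough sketch in plain Python of how the different averages can be read off one matrix. This is not fastai's implementation; `confusion_matrix`, `fbeta_from_cm` and the labels are made up for illustration, and only the default values (beta=2, average="binary") follow the description above.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """cm[t][p] counts samples with true class t and predicted class p."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def fbeta_from_cm(cm, beta=2.0, average="binary", pos_label=1):
    n = len(cm)
    tp = [cm[i][i] for i in range(n)]
    pred = [sum(cm[r][i] for r in range(n)) for i in range(n)]  # column sums
    true = [sum(cm[i]) for i in range(n)]                       # row sums

    def score(tp_, pred_, true_):
        if tp_ == 0:
            return 0.0
        prec, rec = tp_ / pred_, tp_ / true_
        b2 = beta ** 2
        return (1 + b2) * prec * rec / (b2 * prec + rec)

    if average == "binary":   # score of the positive class only
        return score(tp[pos_label], pred[pos_label], true[pos_label])
    if average == "micro":    # pool the counts of all classes first
        return score(sum(tp), sum(pred), sum(true))
    if average == "macro":    # unweighted mean of the per-class scores
        return sum(score(*c) for c in zip(tp, pred, true)) / n
    raise ValueError(f"unsupported average: {average}")

# Hypothetical labels for a 2-class problem
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]
cm = confusion_matrix(y_true, y_pred, n_classes=2)   # [[5, 1], [2, 2]]

binary_f1 = fbeta_from_cm(cm, beta=1, average="binary")  # 4/7, about 0.571
macro_f1 = fbeta_from_cm(cm, beta=1, average="macro")    # about 0.670
micro_f1 = fbeta_from_cm(cm, beta=1, average="micro")    # 0.7
print(cm, binary_f1, macro_f1, micro_f1)
```

The three numbers differ on the same predictions, so it is worth stating which average a reported F1 uses.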