F1 Score as metric


#21

Didn’t have time to check that issue yet, so I’m not sure.


(Azarudeen) #22

@wyquek I tried to use Fbeta_binary and got the error below:

NameError: name 'clas' is not defined


(魏璎珞) #23

Try this:

```python
from dataclasses import dataclass
from fastai.callback import Callback  # fastai v1

@dataclass
class Fbeta_binary(Callback):
    "Computes the fbeta between preds and targets for single-label classification"
    beta2: float = 2   # despite the name, this holds beta; it is squared in on_epoch_end
    eps: float = 1e-9
    clas: int = 1      # the label treated as "positive"

    def on_epoch_begin(self, **kwargs):
        # Reset the running counts at the start of every epoch
        self.TP = 0
        self.total_y_pred = 0
        self.total_y_true = 0

    def on_batch_end(self, last_output, last_target, **kwargs):
        y_pred = last_output.argmax(dim=1)
        y_true = last_target.float()

        # Accumulate counts over the whole epoch instead of scoring each batch
        self.TP += ((y_pred == self.clas) & (y_true == self.clas)).float().sum()
        self.total_y_pred += (y_pred == self.clas).float().sum()
        self.total_y_true += (y_true == self.clas).float().sum()

    def on_epoch_end(self, **kwargs):
        beta2 = self.beta2 ** 2
        prec = self.TP / (self.total_y_pred + self.eps)
        rec = self.TP / (self.total_y_true + self.eps)
        res = (prec * rec) / (prec * beta2 + rec + self.eps) * (1 + beta2)
        self.metric = res
```

If you want F1 for label 1:

```python
learn = text_classifier_learner(data_clas, drop_mult=0.5)
learn.load_encoder('fine_tuned_enc')
learn.metrics = [accuracy, Fbeta_binary(beta2=1, clas=1)]
```

Or, if you want F1 for label 0:

```python
learn = text_classifier_learner(data_clas, drop_mult=0.5)
learn.load_encoder('fine_tuned_enc')
learn.metrics = [accuracy, Fbeta_binary(beta2=1, clas=0)]
```

Or, if you want F1 for both label 1 and label 0:

```python
learn = text_classifier_learner(data_clas, drop_mult=0.5)
learn.load_encoder('fine_tuned_enc')
f1_label1 = Fbeta_binary(1, clas=1)  # beta=1, i.e. F1, with label 1 as positive
f1_label0 = Fbeta_binary(1, clas=0)  # F1 with label 0 as positive
learn.metrics = [accuracy, f1_label1, f1_label0]
```

Here’s a notebook example.

I think there are many metrics, such as those mentioned in this PR, that forum members could help fastai build, but they would most probably have to be written as callbacks.
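As far as I understand, the reason a callback is needed is that F1 does not decompose over batches: averaging per-batch F1 scores generally gives a different number from F1 computed on counts accumulated over the whole epoch, which is what Fbeta_binary does. Here is a plain-Python sketch of that difference; the class RunningF1 and helper run_epoch are hypothetical and simply mirror the callback's three hooks using lists instead of tensors.

```python
class RunningF1:
    """Plain-Python analogue of the Fbeta_binary callback (hypothetical)."""

    def __init__(self, clas=1, beta=1.0, eps=1e-9):
        self.clas, self.beta, self.eps = clas, beta, eps

    def on_epoch_begin(self):
        # Reset the running counts, like the callback does
        self.tp = self.n_pred = self.n_true = 0

    def on_batch_end(self, preds, targets):
        # Accumulate raw counts only; no per-batch F-beta is computed here
        c = self.clas
        self.tp += sum(p == c and t == c for p, t in zip(preds, targets))
        self.n_pred += sum(p == c for p in preds)
        self.n_true += sum(t == c for t in targets)

    def on_epoch_end(self):
        prec = self.tp / (self.n_pred + self.eps)
        rec = self.tp / (self.n_true + self.eps)
        b2 = self.beta ** 2
        return (1 + b2) * prec * rec / (b2 * prec + rec + self.eps)


def run_epoch(metric, batches):
    metric.on_epoch_begin()
    for preds, targets in batches:
        metric.on_batch_end(preds, targets)
    return metric.on_epoch_end()


# Two toy batches of (predicted labels, true labels)
batches = [([1, 1], [1, 0]), ([0, 0], [1, 1])]

epoch_f1 = run_epoch(RunningF1(clas=1), batches)
per_batch = [run_epoch(RunningF1(clas=1), [b]) for b in batches]
mean_batch_f1 = sum(per_batch) / len(per_batch)

print(round(epoch_f1, 4), round(mean_batch_f1, 4))  # 0.4 0.3333
```

The epoch-level value (0.4) is the F1 of the whole validation set; the batch average (0.3333) depends on how the data happens to be split into batches.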


(Azarudeen) #24

It worked. Thank you so much 🙂


(Nikhil Utane) #25

Basic question: how does loss minimization work when there are multiple metrics?
I am working on a text classification problem that shows high accuracy but does not predict correctly at inference time.
I suspect it is because I am using accuracy (the default) as the metric. I have now switched to Fbeta_binary and am checking.


(魏璎珞) #26

My understanding is that minimizing the loss will improve a metric, but only up to a point; beyond that, the particular metric starts to get worse as the NN overfits. So it’s possible that for F1 the overfitting starts at, say, epoch 25, while for accuracy it starts at epoch 15. You therefore can’t train the NN so that both metrics are at their best at the same time, except by coincidence; most likely you would have to choose one.
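As far as I know, in fastai v1 the functions in learn.metrics are only evaluated on the validation set for reporting; the gradients come from the learner's loss function (learn.loss_func). The toy sketch below makes that concrete with a 1-D logistic regression trained by gradient descent in plain Python (all names and data are hypothetical): the metric is called only to record a history, so swapping it leaves the learned weights unchanged.

```python
import math


def train(xs, ys, metric, epochs=5, lr=0.5):
    """Gradient descent on log loss; `metric` is only evaluated for reporting."""
    w, b = 0.0, 0.0
    history = []
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            p = 1 / (1 + math.exp(-(w * x + b)))  # sigmoid probability of class 1
            gw += (p - y) * x                     # d(log loss)/dw
            gb += p - y                           # d(log loss)/db
        w -= lr * gw / len(xs)
        b -= lr * gb / len(xs)
        preds = [int(w * x + b > 0) for x in xs]
        history.append(metric(preds, ys))         # monitoring only, no gradient
    return (w, b), history


def accuracy(preds, ys):
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)


def f1_pos(preds, ys, eps=1e-9):
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, ys))
    prec = tp / (sum(preds) + eps)
    rec = tp / (sum(ys) + eps)
    return 2 * prec * rec / (prec + rec + eps)


xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]
params_acc, acc_hist = train(xs, ys, accuracy)
params_f1, f1_hist = train(xs, ys, f1_pos)
print(params_acc == params_f1)  # True: same weights whichever metric is tracked
```

What the choice of metric does change is which epoch looks "best" when you read the history, which is the point made above.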


(Nikhil Utane) #27

Is it just the number of epochs and overfitting? Because that would mean training happens in exactly the same way regardless of which metric is used.
My understanding is that if a different metric function is used, the calculated loss will be different, which would lead to a completely different path during training. Isn’t that the case?

BTW, I used your approach. It runs, but I don’t see any metric output. I went back to having only accuracy, and even then the valid_loss and accuracy values that I get when I don’t set learn.metrics were not displayed:

```python
learn = text_classifier_learner(data_clas, drop_mult=0.5)
learn.load_encoder('fine_tuned_enc')
f1_label1 = Fbeta_binary(1, clas=1)
f1_label0 = Fbeta_binary(1, clas=0)
learn.metrics = [accuracy]
#learn.metrics = [f1_label1, f1_label0]
learn.freeze()

learn.fit_one_cycle(1, 2e-2, moms=(0.8, 0.7))
```

```
Total time: 09:27
epoch  train_loss  valid_loss  accuracy
1      0.145246
```

Any idea? Thanks.


(魏璎珞) #28

> Isn’t that the case?

Yes, that’s what I meant as well.

> Any idea? Thanks.

Hmm… weird. Not sure what’s happening, to be honest 🤔


(Abu Fadl) #29

Is this needed?


(Nikhil Utane) #31

I used the IMDB notebook as a reference, and it has the same sequence. It worked until I tried to set the metric to Fbeta_binary, and for some reason it still doesn’t work even after reverting and restarting the kernel. 🙁


(Abu Fadl) #32

That notebook uses the same variable for the LM learner and the classifier learner, so be careful not to mix them up. Maybe clear the models and tmp directories (they persist after a kernel restart, but not after "Reset all runtimes", on Colab at least).


(Nikhil Utane) #33

It’s not anymore; perhaps that was an issue earlier.
This is my notebook.
This is my first attempt at a Kaggle competition, so I just want it to work first (I will adhere to the competition rules later).
I saw in another notebook that the threshold value is ~0.33. Perhaps everything worked and I just need to use a different threshold when predicting?

```
IN:  learn.predict("Why are men selective?")
OUT: (Category 0, tensor(0), tensor([0.6202, 0.3798]))
```
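Using a custom threshold amounts to comparing the class-1 probability (the second entry of the probability tensor above) against your own cutoff instead of taking the argmax. A plain-Python sketch follows; the helper names are hypothetical, the validation probabilities are toy numbers, and the ~0.33 figure is simply the one quoted from the other notebook, so it is safer to choose the threshold on your own validation set.

```python
def predict_with_threshold(probs, threshold=0.5):
    """Return 1 if P(class 1) >= threshold, else 0 (probs = [p0, p1])."""
    return int(probs[1] >= threshold)


def f1_at(threshold, prob_pos, ys, eps=1e-9):
    # F1 for label 1 when every prob_pos >= threshold counts as a positive
    preds = [int(p >= threshold) for p in prob_pos]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, ys))
    prec = tp / (sum(preds) + eps)
    rec = tp / (sum(ys) + eps)
    return 2 * prec * rec / (prec + rec + eps)


def best_threshold(prob_pos, ys, candidates):
    # Pick the candidate threshold that maximizes F1 on held-out data
    return max(candidates, key=lambda t: f1_at(t, prob_pos, ys))


probs = [0.6202, 0.3798]                    # probabilities from learn.predict above
print(predict_with_threshold(probs, 0.5))   # 0 (same as the argmax)
print(predict_with_threshold(probs, 0.33))  # 1

# Toy validation data: P(class 1) per sample and the true labels
val_probs = [0.1, 0.35, 0.4, 0.6, 0.2, 0.45]
val_ys = [0, 1, 1, 1, 0, 0]
print(best_threshold(val_probs, val_ys, [0.3, 0.4, 0.5]))  # 0.3
```

Lowering the threshold trades precision for recall, which can raise F1 when the positive class is rare.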

(Gary Biggs) #34

I think I’m missing something significant in this thread. I wanted to surface an F1 metric for my NLP binary classifier. I fiddled with Fbeta_binary, but it didn’t work on the first try. I was going to invest some real effort in figuring out how to implement it in my notebook but, just for chuckles and grins, I decided to try this:

```python
learn = RNN_Learner(md, TextModel(to_gpu(m)), opt_fn=opt_fn)
learn.reg_fn = partial(seq2seq_reg, alpha=2, beta=1)
learn.clip = .25
learn.metrics = [accuracy, f1]
```

Results:

```
epoch      trn_loss    val_loss   accuracy   f1
14         0.211136    0.232183   0.912444   0.857092
```

These results were from a binary classifier I built using the News Headlines Dataset For Sarcasm Detection Kaggle dataset and the fwd_wt103.h5 pre-trained model.


(Abu Fadl) #35

I believe the current fastai (v1.0.x) does not have f1; it existed in the old fastai.


(Gary Biggs) #36

Thanks Abu. Seems odd that such a common metric was not carried over from the old version to the latest.


#37

FYI, SvenBecker added a lot of new metrics and, in passing, renamed Fbeta_binary to FBeta (there are more options than just binary). This will be in v1.0.39 onward.


(Abu Fadl) #38

Great! So in the next version (1.0.39+), the ‘normal’ F1 is FBeta()?


#39

More like FBeta(beta=1) since the default for beta is 2.
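For reference, this is the standard F-beta definition in terms of precision $P$ and recall $R$ (the same formula as in on_epoch_end of Fbeta_binary). With $\beta = 1$ it reduces to the harmonic mean of $P$ and $R$, while the default $\beta = 2$ weights recall more heavily than precision.

```latex
F_\beta = \frac{(1+\beta^2)\,P\,R}{\beta^2 P + R},
\qquad
F_1 = \frac{2PR}{P+R} = \left(\frac{1}{2}\left(\frac{1}{P} + \frac{1}{R}\right)\right)^{-1}
```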


(Abu Fadl) #40

I see RuntimeError: Expected object of backend CUDA but got backend CPU for argument #2 'other' when using FBeta (fastai 1.0.39) on colab gpu. Is this a bug?


(Sven Becker) #41

Not sure about this. Did it work with the old Fbeta_binary? Can you link the notebook/code?

Currently the calculation of FBeta, as well as some other metrics, relies on computing the confusion matrix (this makes it easier to support different averaging approaches). To initialize the matrix, the number of classes has to be specified. The defaults are FBeta(beta=2, n_classes=2, average="binary"), where the average values are in line with the ones used by sklearn (except for ‘samples’).

Ref.: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html#sklearn.metrics.f1_score
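To illustrate what the average options mean, here is a plain-Python sketch that computes F-beta from a confusion matrix with sklearn-style "binary", "macro", "weighted", and "micro" averaging. It is not fastai's FBeta implementation; the function name fbeta_from_confusion and the cm[true][pred] layout are assumptions made for the example.

```python
def fbeta_from_confusion(cm, beta=2.0, average="binary", pos_label=1, eps=1e-9):
    """F-beta from a confusion matrix where cm[i][j] = count of true i predicted as j."""
    n = len(cm)
    b2 = beta ** 2
    tp = [cm[i][i] for i in range(n)]
    pred_tot = [sum(cm[i][j] for i in range(n)) for j in range(n)]  # column sums
    true_tot = [sum(cm[i]) for i in range(n)]                        # row sums (support)

    def fb(t, p, s):
        prec = t / (p + eps)
        rec = t / (s + eps)
        return (1 + b2) * prec * rec / (b2 * prec + rec + eps)

    per_class = [fb(tp[c], pred_tot[c], true_tot[c]) for c in range(n)]
    if average == "binary":    # score of the positive class only
        return per_class[pos_label]
    if average == "macro":     # unweighted mean over classes
        return sum(per_class) / n
    if average == "weighted":  # mean over classes weighted by support
        return sum(f * s for f, s in zip(per_class, true_tot)) / sum(true_tot)
    if average == "micro":     # pool the counts of all classes first
        return fb(sum(tp), sum(pred_tot), sum(true_tot))
    raise ValueError(f"unsupported average: {average}")


cm = [[3, 1], [2, 4]]  # toy counts: rows = true class, columns = predicted class
for avg in ("binary", "macro", "weighted", "micro"):
    print(avg, round(fbeta_from_confusion(cm, beta=1, average=avg), 4))
```

With this toy matrix and beta=1, "binary" gives 8/11 ≈ 0.727 (the F1 of class 1), while "micro" gives 0.7, which for single-label classification equals the accuracy (7 correct out of 10).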