F1 Score as metric


(Prateek) #1

Hi, I want to use F1 score instead of accuracy as the metric in this example: https://docs.fast.ai/text.html.

Could someone guide me on this?


(魏璎珞) #2

maybe try f_score = partial(fbeta, thresh=0.2, beta = 1)

so for lesson3-planet notebook it would be

acc_02 = partial(accuracy_thresh, thresh=0.2)
f_score = partial(fbeta, thresh=0.2, beta = 1)
learn = create_cnn(data, arch, metrics=[acc_02, f_score])
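
For anyone unfamiliar with partial: it comes from Python's functools and returns a new callable with some keyword arguments already filled in, so a function with extra parameters can be passed wherever a plain (preds, targets) function is expected. A minimal sketch of that mechanism, using a made-up stand-in (toy_metric is hypothetical and is not fastai's fbeta):

```python
from functools import partial

def toy_metric(preds, targets, thresh=0.5, beta=2):
    "Hypothetical stand-in for a metric that takes extra keyword arguments."
    # Count predictions above the threshold that coincide with a positive target.
    hits = sum(1 for p, t in zip(preds, targets) if p > thresh and t == 1)
    return hits, thresh, beta

# Pre-fill thresh and beta; the resulting callable only needs (preds, targets).
f_score = partial(toy_metric, thresh=0.2, beta=1)

print(f_score([0.1, 0.3, 0.9], [0, 1, 1]))  # (2, 0.2, 1)
```

The values bound by partial can be inspected afterwards through f_score.keywords.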

(Prateek) #3

Thanks @wyquek for your response.

How can I use this f_score with the language_model_learner?


(魏璎珞) #4

hmm… seems like that doesn’t work too well for language_model_learner. I could pass the two metrics into language_model_learner, but I get an error telling me my preds and targets are of different sizes:

The size of tensor a (6131) must match the size of tensor b (6080) at non-singleton dimension 1

%reload_ext autoreload
%autoreload 2
%matplotlib inline

from fastai import *
from fastai.text import *

path = untar_data(URLs.IMDB_SAMPLE)
path.ls()

data_lm = TextDataBunch.from_csv(path, 'texts.csv')
data_lm.save()

data_lm = TextLMDataBunch.load(path)
data_lm.show_batch()

acc_02 = partial(accuracy_thresh, thresh=0.2)
f_score = partial(fbeta, thresh=0.2, beta = 1)
learn = language_model_learner(data_lm, pretrained_model=URLs.WT103, drop_mult=0.3)

learn.metrics = listify([acc_02,f_score])

learn.lr_find()
learn.recorder.plot(skip_end=15)

learn.fit_one_cycle(1, 1e-2, moms=(0.8,0.7))

Edit: while debugging, a question surfaced to my consciousness and made me wonder, why would you want F1 stats on a LM?

Sorry Prateek can’t resolve this. Hopefully more skillful users can help you.


(Prateek) #5

@wyquek thank you so much!

I want to do text classification on an imbalanced dataset. Accuracy is not an ideal metric in that case.
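
To make the point about imbalance concrete, here is a small plain-Python sketch with made-up numbers: a classifier that always predicts the majority class scores high on accuracy but gets an F1 of zero on the minority class. The helpers toy_accuracy and toy_f1 are hypothetical names written for this illustration, not library functions.

```python
# Toy imbalanced validation set: 95 negatives, 5 positives (made-up numbers).
y_true = [0] * 95 + [1] * 5
# A useless classifier that always predicts the majority class.
y_pred = [0] * 100

def toy_accuracy(y_true, y_pred):
    "Fraction of items where the prediction equals the target."
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def toy_f1(y_true, y_pred, eps=1e-9):
    "F1 for the positive class (label 1); eps avoids division by zero."
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    pred_pos = sum(1 for p in y_pred if p == 1)
    true_pos = sum(1 for t in y_true if t == 1)
    prec = tp / (pred_pos + eps)
    rec = tp / (true_pos + eps)
    return 2 * prec * rec / (prec + rec + eps)

print(toy_accuracy(y_true, y_pred))  # 0.95
print(toy_f1(y_true, y_pred))        # 0.0
```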


(Ethan Sutin) #6

@prateek_joshi @wyquek did you manage to get f score working with text learner?

This should be a built in metric in my opinion.


(魏璎珞) #7

Nay, it turns out passing in fbeta as a metric into language_model_learner is not as easy as it was with create_cnn

seems like accuracy is hardcoded as a metric in RNNLearner

class RNNLearner(Learner):
    "Basic class for a Learner in RNN."
    def __init__(self, data:DataBunch, model:nn.Module, bptt:int=70, split_func:OptSplitFunc=None, clip:float=None,
                 adjust:bool=False, alpha:float=2., beta:float=1., **kwargs):
        super().__init__(data, model, **kwargs)
        self.callbacks.append(RNNTrainer(self, bptt, alpha=alpha, beta=beta, adjust=adjust))
        if clip: self.callback_fns.append(partial(GradientClipping, clip=clip))
        if split_func: self.split(split_func)
        self.metrics = [accuracy]  # <=== accuracy hardcoded as the metric

I suspect a callback has to be used to hook it in. Below is some hacky code that can give you a text classifier quickly if you want to try to create an fbeta callback:

%reload_ext autoreload
%autoreload 2
%matplotlib inline

from fastai import *
from fastai.text import *

path = untar_data(URLs.IMDB_SAMPLE)
path.ls()

data_lm = TextDataBunch.from_csv(path, 'texts.csv')

data_lm.save()

data_lm = TextLMDataBunch.load(path)

data_lm.show_batch()

learn = language_model_learner(data_lm, pretrained_model=URLs.WT103, drop_mult=0.3)

learn.lr_find()

learn.recorder.plot(skip_end=15)

learn.fit_one_cycle(1, 1e-2, moms=(0.8,0.7))

learn.save_encoder('fine_tuned_enc')

data_clas = (TextList.from_csv(path, 'texts.csv', col='text',vocab=data_lm.vocab)
                .random_split_by_pct(0.1) 
                .label_from_df(cols=0)
                .databunch())
data_clas.save('tmp_clas')

data_clas = TextClasDataBunch.load(path, 'tmp_clas', bs=50)
data_clas.show_batch()

acc_02 = partial(accuracy_thresh, thresh=0.2)
f_score = partial(fbeta, thresh=0.2, beta = 1)
learn = text_classifier_learner(data_clas, drop_mult=0.5) # callback has to be used to pass in f_score here
learn.load_encoder('fine_tuned_enc')

learn.fit_one_cycle(1, 1e-2, moms=(0.8,0.7))

(Ethan Sutin) #8

Thanks! I will give it a shot shortly.

Also, I’d like to look into fixing it properly so that F score is available without hacking through a callback. It seems like it should be, since many NLP tasks are measured by F score.


#9

You can change the metrics at any given time by just typing learn.metrics = new_metrics.


(魏璎珞) #10

I tried that this morning (last evening for you), but it gave an error message about preds and targets being of different sizes for fbeta. Other metrics, even accuracy_thresh, gave error messages too (I forget what they were, or maybe they were the same; my memory fails me). One metric that works is error rate, but error rate is 1 - accuracy, so it works because it leans on the accuracy metric.
I suspect the on_epoch_end thing could be messing with it, but I’m not yet familiar with v1, even remotely.


#11

Yes, fbeta and accuracy_thresh are intended for multiclassification (multi-label) problems, so they expect targets that are one-hot encoded. You will have to adapt their implementation to your problem.
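
To illustrate the difference in target layout: a multi-label metric compares predictions of shape (batch, n_classes) against targets of the same shape, whereas a single-label classifier gets one class index per item. A minimal plain-Python sketch of converting one layout into the other (one_hot is a hypothetical helper written for this example, not a fastai function):

```python
def one_hot(labels, n_classes):
    "Turn a list of class indices into one row of 0/1 flags per item."
    return [[1 if c == label else 0 for c in range(n_classes)] for label in labels]

targets = [2, 0, 1]            # single-label layout: one class index per item
encoded = one_hot(targets, 3)  # multi-label layout: one column per class

print(encoded)  # [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
```

This only shows the shapes involved. Whether a thresholded multi-label metric gives a meaningful score on one-hot versions of single-label targets is a separate question.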


(Ethan Sutin) #12

Looking for F1 score;

Meant for binary classification and widely used in NLP task evaluation.

For example, the latest Kaggle Quora Insincere question classification problem is scored with F score.

Would this be something that could be PRed if I could implement it?

Thanks for all your work and any advice and guidance!


#13

I meant that the current implementations in the library are aimed at multiclassification problems (such as planet). Of course you can use F1 in single classification problems, sorry if I was unclear.

Yes, a PR with an implementation for single classification would be more than welcome.


(Rishabh Agarwal Jain) #14

I faced the same problem when using fbeta as a metric for single classification. I tried to see if I could find the mistake in metrics.py, but it seems I am still too new to fastai v1.

Could someone guide me towards the solution?


(Ethan Sutin) #15

That implementation won’t work because it’s for multi class. I am working on implementing a single class version and will share it once I can get it working.


(魏璎珞) #16

Here’s a somewhat hacky F1 for text binary classification. It ran ok on this test script, but there’s a warning label that reads Not tested rigorously. Use at your own risk.

%reload_ext autoreload
%autoreload 2
%matplotlib inline

from fastai import *
from fastai.text import *

path = untar_data(URLs.IMDB_SAMPLE)
path.ls()

data_lm = TextDataBunch.from_csv(path, 'texts.csv')
data_lm.save()
data_lm = TextLMDataBunch.load(path)
data_lm.show_batch()

learn = language_model_learner(data_lm, pretrained_model=URLs.WT103, drop_mult=0.3)
learn.lr_find()
learn.recorder.plot(skip_end=15)
learn.fit_one_cycle(1, 1e-2, moms=(0.8,0.7))
learn.save_encoder('fine_tuned_enc')

data_clas = (TextList.from_csv(path, 'texts.csv', col='text',vocab=data_lm.vocab)
                .random_split_by_pct(0.1) 
                .label_from_df(cols=0)
                .databunch())
data_clas.save('tmp_clas')

data_clas = TextClasDataBunch.load(path, 'tmp_clas', bs=50)
data_clas.show_batch()


class fbeta_binary(Callback):
    "Computes the f_beta between preds and targets for binary text classification"

    def __init__(self, beta2 = 1, eps=1e-9,sigmoid = True):      
        self.beta2=beta2**2
        self.eps = eps
        self.sigmoid = sigmoid
    
    def on_epoch_begin(self, **kwargs):
        self.TP = 0
        self.total_y_pred = 0   
        self.total_y_true = 0
    
    def on_batch_end(self, last_output, last_target, **kwargs):
        y_pred = last_output
        y_pred = y_pred.softmax(dim = 1)        
        y_pred = y_pred.argmax(dim=1)
        y_true = last_target.float()
        
        self.TP += ((y_pred==1) * (y_true==1)).float().sum()
        self.total_y_pred += (y_pred==1).float().sum()
        self.total_y_true += (y_true==1).float().sum()
    
    def on_epoch_end(self, **kwargs):
        prec = self.TP/(self.total_y_pred+self.eps)
        rec = self.TP/(self.total_y_true+self.eps)
        res = (prec*rec)/(prec*self.beta2+rec+self.eps)*(1+self.beta2)        
        #self.metric = res.mean()
        self.metric = res     


fbeta_binary = fbeta_binary()  # default is F1

learn = text_classifier_learner(data_clas, drop_mult=0.5)
learn.load_encoder('fine_tuned_enc')
learn.metrics = [accuracy,fbeta_binary]

learn.fit_one_cycle(2, 1e-2, moms=(0.8,0.7))

#17

Thanks for sharing! Note that you don’t need the line y_pred = y_pred.softmax(dim = 1) since the order of the predictions is going to be the same before and after softmax (so the argmax is the same before). It should make things a little bit faster since a softmax in NLP is usually the slowest layer (depending on your vocab size).
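
A quick way to convince yourself of the argmax point, as a plain-Python sketch with made-up numbers (softmax and argmax here are small local helpers, not the PyTorch methods):

```python
import math

def softmax(row):
    "Numerically stable softmax over a list of floats."
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(row):
    "Index of the largest value in the list."
    return max(range(len(row)), key=row.__getitem__)

logits = [0.3, -1.2, 2.5]  # made-up raw model outputs for one item
probs = softmax(logits)

# exp is strictly increasing and every entry is divided by the same total,
# so the largest logit is still the largest probability.
print(argmax(logits), argmax(probs))  # 2 2
```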

Second little remark: you should name your class with a capital, and you could use a @dataclass to replace the init since you’re only passing the arguments to properties of your object (the **2 in beta2 can be done later in on_epoch_end).

With these little corrections, don’t hesitate to propose a PR to add this to the library in metrics :slight_smile:
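
As a rough, library-free sketch of what those suggestions could look like: a capitalized class name, @dataclass instead of a hand-written __init__, and beta squared only in on_epoch_end. This version deliberately does not subclass fastai's Callback and works on plain Python lists of 0/1 labels, so the class name FBetaBinary and the hook signatures are assumptions made for illustration, not the library's API.

```python
from dataclasses import dataclass

@dataclass
class FBetaBinary:
    "Accumulates counts over batches, then computes F-beta for the positive class."
    beta: float = 1.0
    eps: float = 1e-9

    def on_epoch_begin(self):
        # Reset the running counts at the start of every epoch.
        self.tp = 0
        self.total_pred = 0
        self.total_true = 0

    def on_batch_end(self, y_pred, y_true):
        # y_pred and y_true are lists of 0/1 class indices for one batch.
        self.tp += sum(1 for p, t in zip(y_pred, y_true) if p == 1 and t == 1)
        self.total_pred += sum(1 for p in y_pred if p == 1)
        self.total_true += sum(1 for t in y_true if t == 1)

    def on_epoch_end(self):
        beta2 = self.beta ** 2  # squared here rather than in __init__
        prec = self.tp / (self.total_pred + self.eps)
        rec = self.tp / (self.total_true + self.eps)
        self.metric = (1 + beta2) * prec * rec / (beta2 * prec + rec + self.eps)
        return self.metric

f1 = FBetaBinary()  # default beta=1 gives F1
f1.on_epoch_begin()
f1.on_batch_end([1, 0, 1], [1, 0, 0])
f1.on_batch_end([0, 1], [1, 1])
print(round(f1.on_epoch_end(), 3))  # 0.667
```

Accumulating the counts across batches and dividing once at the end of the epoch gives the F score over the whole validation set, which is generally not the same as averaging per-batch F scores.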


(Azarudeen) #18

@sgugger. I tried to use fbeta as the metric for a text classification problem.

Below is my code

learn.metrics = [fbeta]

When I run my classification problem using this line

learn.fit_one_cycle(4, moms=moms)

I got this error

The size of tensor a (3) must match the size of tensor b (64) at non-singleton dimension 1

I have 3 classes. My batch size is 64.
Somehow both are related in this. But when I use accuracy it works without any flaws.
Help is appreciated.
Thanks :slight_smile:


#19

You’re using a metric aimed at multi-classification problems on a single-classification problem, so it doesn’t work. You can check out the class Fbeta_binary.
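
The numbers in the error above are consistent with an elementwise operation between predictions of shape (64, 3), i.e. batch size by number of classes, and targets of shape (64,), i.e. one class index per item. That is a reading of the message rather than something confirmed in this thread. The sketch below applies the usual trailing-dimension broadcasting rule in plain Python to show where such a mismatch would be reported (broadcast_shape is a hypothetical helper, not a PyTorch function):

```python
def broadcast_shape(a, b):
    "Apply the usual trailing-dimension broadcasting rule to two shapes."
    ndim = max(len(a), len(b))
    # Left-pad the shorter shape with 1s so both have the same rank.
    a = (1,) * (ndim - len(a)) + tuple(a)
    b = (1,) * (ndim - len(b)) + tuple(b)
    out = []
    for dim, (x, y) in enumerate(zip(a, b)):
        if x != y and 1 not in (x, y):
            raise ValueError(f"size {x} must match size {y} at dimension {dim}")
        # A size-1 dimension stretches to match the other one.
        out.append(y if x == 1 else x)
    return tuple(out)

print(broadcast_shape((64, 3), (64, 3)))  # matching layouts: (64, 3)
try:
    broadcast_shape((64, 3), (64,))       # per-class scores vs class indices
except ValueError as err:
    print(err)  # size 3 must match size 64 at dimension 1
```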


(魏璎珞) #20

not super sure about this but should i add a self.clas as described here?