Hi, I want to use the F1 score instead of accuracy as the metric in this example: https://docs.fast.ai/text.html.
Could someone guide me on this?
maybe try f_score = partial(fbeta, thresh=0.2, beta = 1)
so for lesson3-planet notebook it would be
acc_02 = partial(accuracy_thresh, thresh=0.2)
f_score = partial(fbeta, thresh=0.2, beta = 1)
learn = create_cnn(data, arch, metrics=[acc_02, f_score])
Hmm… that doesn’t seem to work too well for language_model_learner. I could pass the two metrics into language_model_learner, but I get an error telling me my preds and targets are of different sizes:
The size of tensor a (6131) must match the size of tensor b (6080) at non-singleton dimension 1
%reload_ext autoreload
%autoreload 2
%matplotlib inline
from fastai import *
from fastai.text import *
path = untar_data(URLs.IMDB_SAMPLE)
path.ls()
data_lm = TextDataBunch.from_csv(path, 'texts.csv')
data_lm.save()
data_lm = TextLMDataBunch.load(path)
data_lm.show_batch()
acc_02 = partial(accuracy_thresh, thresh=0.2)
f_score = partial(fbeta, thresh=0.2, beta = 1)
learn = language_model_learner(data_lm, pretrained_model=URLs.WT103, drop_mult=0.3)
learn.metrics = listify([acc_02,f_score])
learn.lr_find()
learn.recorder.plot(skip_end=15)
learn.fit_one_cycle(1, 1e-2, moms=(0.8,0.7))
Edit: while debugging, it occurred to me to ask: why would you want F1 stats on an LM?
Sorry Prateek, I can’t resolve this. Hopefully more skillful users can help you.
@wyquek thank you so much!
I want to do text classification on an imbalanced dataset. Accuracy is not an ideal metric in that case.
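To illustrate the point about imbalanced data, here is a minimal sketch in plain Python (no fastai involved). The data is made up, and `plain_accuracy` and `precision_recall_f1` are hypothetical helper names used only for this illustration. A classifier that always predicts the majority class gets high accuracy but an F1 of zero on the minority class:

```python
def plain_accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the given positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    n_pred_pos = sum(1 for p in y_pred if p == positive)
    n_true_pos = sum(1 for t in y_true if t == positive)
    precision = tp / n_pred_pos if n_pred_pos else 0.0
    recall = tp / n_true_pos if n_true_pos else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up imbalanced labels: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100

acc = plain_accuracy(y_true, y_pred)
prec, rec, f1 = precision_recall_f1(y_true, y_pred)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} F1={f1:.2f}")
# accuracy=0.95 precision=0.00 recall=0.00 F1=0.00
```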
@prateek_joshi @wyquek did you manage to get f score working with text learner?
This should be a built in metric in my opinion.
Nay, it turns out passing fbeta as a metric into language_model_learner is not as easy as it was with create_cnn. It seems accuracy is hardcoded as a metric in RNNLearner:
class RNNLearner(Learner):
    "Basic class for a Learner in RNN."
    def __init__(self, data:DataBunch, model:nn.Module, bptt:int=70, split_func:OptSplitFunc=None, clip:float=None,
                 adjust:bool=False, alpha:float=2., beta:float=1., **kwargs):
        super().__init__(data, model, **kwargs)
        self.callbacks.append(RNNTrainer(self, bptt, alpha=alpha, beta=beta, adjust=adjust))
        if clip: self.callback_fns.append(partial(GradientClipping, clip=clip))
        if split_func: self.split(split_func)
        self.metrics = [accuracy]  # <=== accuracy hardcoded as metrics
I suspect a callback has to be used to hook it in. Below is some hacky code that gives you a text classifier quickly, in case you want to try creating an fbeta callback:
%reload_ext autoreload
%autoreload 2
%matplotlib inline
from fastai import *
from fastai.text import *
path = untar_data(URLs.IMDB_SAMPLE)
path.ls()
data_lm = TextDataBunch.from_csv(path, 'texts.csv')
data_lm.save()
data_lm = TextLMDataBunch.load(path)
data_lm.show_batch()
learn = language_model_learner(data_lm, pretrained_model=URLs.WT103, drop_mult=0.3)
learn.lr_find()
learn.recorder.plot(skip_end=15)
learn.fit_one_cycle(1, 1e-2, moms=(0.8,0.7))
learn.save_encoder('fine_tuned_enc')
data_clas = (TextList.from_csv(path, 'texts.csv', col='text',vocab=data_lm.vocab)
             .random_split_by_pct(0.1)
             .label_from_df(cols=0)
             .databunch())
data_clas.save('tmp_clas')
data_clas = TextClasDataBunch.load(path, 'tmp_clas', bs=50)
data_clas.show_batch()
acc_02 = partial(accuracy_thresh, thresh=0.2)
f_score = partial(fbeta, thresh=0.2, beta = 1)
learn = text_classifier_learner(data_clas, drop_mult=0.5) # callback has to be used to pass in f_score here
learn.load_encoder('fine_tuned_enc')
learn.fit_one_cycle(1, 1e-2, moms=(0.8,0.7))
Thanks! I will give it a shot shortly.
Also, I’d like to look into fixing it properly so that F score is available without hacking through a callback. It seems like it should be, since many NLP tasks are measured by F score.
You can change the metrics at any given time by just typing learn.metrics = new_metrics.
I tried that earlier this morning (last evening for you), but it gave an error message about preds and targets being of different sizes for fbeta. Other metrics, even accuracy_thresh, gave error messages too (I forget what they were; they may have been the same). One metric that does work is error_rate, but error_rate is 1 - accuracy, so it works because it leans on the accuracy metric.
I suspect the on_epoch_end handling could be interfering, but I’m still not even remotely familiar with v1.
Yes, fbeta and accuracy_thresh are intended for multiclassification problems, that is, targets that are one-hot encoded. You will have to adapt their implementation to your problem.
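To make the shape difference concrete, here is a rough sketch in plain Python. A single-label target is one class index per example, while a one-hot target has one row of 0/1 flags per example, the same shape as the per-class predictions. `one_hot` is a hypothetical helper written for this illustration, not a fastai function:

```python
def one_hot(targets, n_classes):
    """Turn a list of class indices into one row of 0/1 flags per example."""
    rows = []
    for t in targets:
        if not 0 <= t < n_classes:
            raise ValueError(f"class index {t} is out of range for {n_classes} classes")
        row = [0] * n_classes
        row[t] = 1
        rows.append(row)
    return rows


# Single-label targets: one class index per example (shape: batch).
targets = [2, 0, 1, 2]
# One-hot targets: one row per example (shape: batch x n_classes).
encoded = one_hot(targets, n_classes=3)
print(encoded)
# [[0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

Adapting a threshold-based metric to a single-label problem means either encoding the targets like this or, going the other way, taking the argmax of the predictions before comparing.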
I’m looking for the F1 score, which is meant for binary classification and widely used in NLP task evaluation.
For example, the latest Kaggle Quora Insincere question classification problem is scored with F score.
Would this be something that could be PRed if I could implement it?
Thanks for all your work and any advice and guidance!
I meant that the current implementations in the library are aimed at multiclassification problems (such as planet). Of course you can use F1 in single classification problems, sorry if I was unclear.
Yes, a PR with an implementation for single classification would be more than welcome.
I faced the same problem when using fbeta as a metric for single classification. I tried to find the mistake in metrics.py, but I am still new to fastai v1.
Could someone guide me towards the solution?
That implementation won’t work because it’s for multi class. I am working on implementing a single class version and will share it once I can get it working.
Here’s a somewhat hacky F1 for binary text classification. It ran OK on this test script, but it comes with a warning label: not tested rigorously, use at your own risk.
%reload_ext autoreload
%autoreload 2
%matplotlib inline
from fastai import *
from fastai.text import *
path = untar_data(URLs.IMDB_SAMPLE)
path.ls()
data_lm = TextDataBunch.from_csv(path, 'texts.csv')
data_lm.save()
data_lm = TextLMDataBunch.load(path)
data_lm.show_batch()
learn = language_model_learner(data_lm, pretrained_model=URLs.WT103, drop_mult=0.3)
learn.lr_find()
learn.recorder.plot(skip_end=15)
learn.fit_one_cycle(1, 1e-2, moms=(0.8,0.7))
learn.save_encoder('fine_tuned_enc')
data_clas = (TextList.from_csv(path, 'texts.csv', col='text',vocab=data_lm.vocab)
             .random_split_by_pct(0.1)
             .label_from_df(cols=0)
             .databunch())
data_clas.save('tmp_clas')
data_clas = TextClasDataBunch.load(path, 'tmp_clas', bs=50)
data_clas.show_batch()
class fbeta_binary(Callback):
    "Computes the f_beta between preds and targets for binary text classification"
    def __init__(self, beta2 = 1, eps=1e-9,sigmoid = True):
        self.beta2=beta2**2
        self.eps = eps
        self.sigmoid = sigmoid
    def on_epoch_begin(self, **kwargs):
        self.TP = 0
        self.total_y_pred = 0
        self.total_y_true = 0
    def on_batch_end(self, last_output, last_target, **kwargs):
        y_pred = last_output
        y_pred = y_pred.softmax(dim = 1)
        y_pred = y_pred.argmax(dim=1)
        y_true = last_target.float()
        self.TP += ((y_pred==1) * (y_true==1)).float().sum()
        self.total_y_pred += (y_pred==1).float().sum()
        self.total_y_true += (y_true==1).float().sum()
    def on_epoch_end(self, **kwargs):
        prec = self.TP/(self.total_y_pred+self.eps)
        rec = self.TP/(self.total_y_true+self.eps)
        res = (prec*rec)/(prec*self.beta2+rec+self.eps)*(1+self.beta2)
        #self.metric = res.mean()
        self.metric = res
fbeta_binary = fbeta_binary() # default is F1
learn = text_classifier_learner(data_clas, drop_mult=0.5)
learn.load_encoder('fine_tuned_enc')
learn.metrics = [accuracy,fbeta_binary]
learn.fit_one_cycle(2, 1e-2, moms=(0.8,0.7))
Thanks for sharing! Note that you don’t need the line y_pred = y_pred.softmax(dim = 1), since the order of the predictions is the same before and after softmax (so the argmax is the same either way). Dropping it should make things a little bit faster, since a softmax in NLP is usually the slowest layer (depending on your vocab size).
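The claim that argmax is unchanged by softmax can be checked with a small plain-Python sketch. The logits are made up, and `softmax` and `argmax` here are hand-written stand-ins rather than the PyTorch tensor methods:

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


# Made-up raw model outputs, one row per example.
batch_logits = [[2.0, -1.0], [-0.5, 0.3], [0.1, 4.2, -3.0]]

# softmax is strictly increasing in each score, so the winning index is unchanged.
same = all(argmax(row) == argmax(softmax(row)) for row in batch_logits)
print(same)
# True
```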
A second little remark is that you should name your class with a capital, and you could use a @dataclass to replace the init, since you’re only passing the arguments to properties of your object (the **2 in beta2 can be done later, in on_epoch_end).
With these little corrections, don’t hesitate to propose a PR to add this to the library in metrics.
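For what it’s worth, here is a rough sketch of what those suggestions might look like, written in plain Python so it runs without fastai. It is only an illustration under stated assumptions: `FBetaBinarySketch` is a hypothetical name, it does not subclass the fastai Callback, and lists of scores and labels stand in for tensors. The counting logic follows the callback posted above.

```python
from dataclasses import dataclass


@dataclass
class FBetaBinarySketch:
    "Accumulates counts per batch and computes F-beta at the end of the epoch."
    beta: float = 1.0   # beta itself; it is squared later, in on_epoch_end
    eps: float = 1e-9

    def on_epoch_begin(self):
        # Reset the running counts at the start of every epoch.
        self.tp = 0
        self.total_y_pred = 0
        self.total_y_true = 0

    def on_batch_end(self, last_output, last_target):
        # argmax over the raw scores of each example; no softmax needed.
        y_pred = [max(range(len(row)), key=row.__getitem__) for row in last_output]
        for p, t in zip(y_pred, last_target):
            self.tp += int(p == 1 and t == 1)
            self.total_y_pred += int(p == 1)
            self.total_y_true += int(t == 1)

    def on_epoch_end(self):
        beta2 = self.beta ** 2
        prec = self.tp / (self.total_y_pred + self.eps)
        rec = self.tp / (self.total_y_true + self.eps)
        self.metric = (1 + beta2) * prec * rec / (prec * beta2 + rec + self.eps)
        return self.metric


metric = FBetaBinarySketch()  # default is F1
metric.on_epoch_begin()
# Made-up batches: rows of [score_class_0, score_class_1] and the true labels.
metric.on_batch_end([[0.2, 1.5], [2.0, -1.0]], [1, 0])  # predictions 1, 0
metric.on_batch_end([[0.1, 0.9], [1.2, 0.3]], [0, 1])   # predictions 1, 0
f1 = metric.on_epoch_end()
print(round(f1, 3))
# 0.5
```

Accumulating the counts over the whole epoch and dividing once at the end gives the F score of the full validation set, rather than an average of per-batch scores.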
@sgugger, I tried to use fbeta as the metric for a text classification problem.
Below is my code:
learn.metrics = [fbeta]
When I run my classification problem using this line
learn.fit_one_cycle(4, moms=moms)
I got this error
The size of tensor a (3) must match the size of tensor b (64) at non-singleton dimension 1
I have 3 classes and my batch size is 64, so the two numbers in the error seem to be related to those.
But when I use accuracy, it works without any flaws.
Help is appreciated.
Thanks
You’re using a metric aimed at multi-classification problems on a single-classification problem, so it doesn’t work. You can check out the class Fbeta_binary.
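The numbers in that error message line up with the usual broadcasting rule, which compares shapes from the last dimension backwards. With 3 classes and a batch size of 64, the predictions would presumably have shape (64, 3) and the class-index targets shape (64,), so an elementwise operation ends up comparing 3 against 64. A small sketch of the rule in plain Python, where `broadcast_shape` is a hypothetical helper and not a PyTorch function:

```python
def broadcast_shape(a, b):
    """Return the broadcast shape of two shapes, or raise ValueError if they clash."""
    n = max(len(a), len(b))
    result = []
    # Walk the dimensions from last to first, padding the shorter shape with 1s.
    for i in range(1, n + 1):
        da = a[-i] if i <= len(a) else 1
        db = b[-i] if i <= len(b) else 1
        if da != db and da != 1 and db != 1:
            raise ValueError(f"size {da} must match size {db} at dimension {n - i}")
        result.append(max(da, db))
    return tuple(reversed(result))


preds_shape = (64, 3)   # assumed: batch of 64, scores for 3 classes
targets_shape = (64,)   # assumed: one class index per example

try:
    broadcast_shape(preds_shape, targets_shape)
except ValueError as err:
    print("mismatch:", err)
# mismatch: size 3 must match size 64 at dimension 1

# One-hot targets would have the same shape as the predictions, so they line up.
print(broadcast_shape(preds_shape, (64, 3)))
# (64, 3)
```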