How is the threshold set for the FBeta, Precision and Recall classes?

mkbldn · August 26, 2019, 7:37pm

I’ve been building a tabular model for binary classification and wanted to track recall as one of the metrics. However, after looking through the source code, I can’t work out how these classes set a probability threshold. They all inherit from the ConfusionMatrix() class, so I assume I’m missing something there. I was hoping someone could provide a little insight into the inner workings of these classes.

Thanks!

nojeffrey · October 16, 2019, 6:35am

A bit late to the party, but I’m having the exact same issue: I want to adjust the threshold for a binary classifier on tabular data, with precision as the metric.

sgugger · October 16, 2019, 1:24pm

Precision and all metrics inheriting from ConfusionMatrix are for single-label classification problems, so they don’t take a threshold: the predicted class is the one with the highest probability.
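
To illustrate the point above: for a two-class problem, picking the class with the highest probability gives the same predictions as thresholding the positive-class probability at 0.5. The sketch below is plain Python, not the library's code; `predict_argmax`, `predict_with_threshold` and the probabilities are hypothetical names and made-up values used only for illustration.

```python
def predict_argmax(probs):
    """Return, for each row, the index of the class with the highest probability."""
    return [max(range(len(row)), key=row.__getitem__) for row in probs]


def predict_with_threshold(probs, threshold):
    """Predict class 1 when its probability is strictly above `threshold`."""
    return [1 if row[1] > threshold else 0 for row in probs]


# Made-up [P(class 0), P(class 1)] pairs for four samples.
probs = [[0.9, 0.1], [0.7, 0.3], [0.45, 0.55], [0.2, 0.8]]

argmax_preds = predict_argmax(probs)              # highest-probability class
half_preds = predict_with_threshold(probs, 0.5)   # same result as argmax
low_preds = predict_with_threshold(probs, 0.2)    # more samples flagged positive
```

Lowering the threshold only changes how the probabilities are turned into labels; the model and its outputs are untouched.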

nojeffrey · October 17, 2019, 3:51am

Hmm, my dataset is imbalanced (90% = 0, 10% = 1) to start with, so it’s predicting the negative class every time:

[Screenshot: 2019-10-17-144447_562x417_escrotum]

So accuracy is around 90% because it’s predicting the negative class every time, but precision is NaN because there are no positive predictions. That is why I want to change the threshold from 50/50 to, say, 80/20.
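
One way to get the behaviour described above is to compute precision and recall outside the built-in metric classes, directly from the predicted positive-class probabilities. The following is a minimal sketch in plain Python, not the library's API; `precision_recall`, the sample data and the 0.2 cut-off are all assumptions chosen to mimic a 90/10 imbalance.

```python
import math


def precision_recall(pos_probs, targets, threshold):
    """Compute (precision, recall) for the positive class at a given threshold.

    Returns NaN for a metric whose denominator is zero, e.g. precision
    when no sample is predicted positive.
    """
    preds = [1 if p > threshold else 0 for p in pos_probs]
    tp = sum(1 for p, t in zip(preds, targets) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(preds, targets) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(preds, targets) if p == 0 and t == 1)
    precision = tp / (tp + fp) if (tp + fp) else math.nan
    recall = tp / (tp + fn) if (tp + fn) else math.nan
    return precision, recall


# Made-up data: nine negatives, one positive, all probabilities below 0.5.
targets = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
pos_probs = [0.05, 0.10, 0.08, 0.12, 0.25, 0.07, 0.15, 0.09, 0.11, 0.40]

# At 0.5 nothing is predicted positive, so precision is undefined (NaN).
default_precision, default_recall = precision_recall(pos_probs, targets, 0.5)

# At 0.2 two samples are predicted positive, one of them correctly.
low_precision, low_recall = precision_recall(pos_probs, targets, 0.2)
```

With this made-up data, the lower threshold trades one false positive for catching the only positive sample; on real data the threshold would normally be chosen on a validation set.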

4722794 · March 19, 2020, 10:46am

Hi @nojeffrey,

How did you end up solving your problem?

I have the same problem as you (an unbalanced dataset).

nojeffrey · March 24, 2020, 10:01pm

Sorry @4722794, I didn’t figure it out. I still have the dataset and will figure it out eventually.

4722794 · March 25, 2020, 7:06am

Hey @nojeffrey, I managed to handle the imbalance using FocalLoss() as a loss function. The new model significantly improved the precision, recall and F1 score on a comparable test set.
This might help:
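
The post does not say which FocalLoss() implementation was used. For readers unfamiliar with the idea, here is a minimal sketch of the standard binary focal loss formula, FL = -alpha_t * (1 - p_t)^gamma * log(p_t), in plain Python. The function names are hypothetical, and the defaults `gamma=2.0` and `alpha=0.25` are the commonly cited values, assumed here rather than taken from the thread.

```python
import math


def cross_entropy(pos_prob, target):
    """Binary cross-entropy for one sample, given P(class 1) and a 0/1 target."""
    p_t = pos_prob if target == 1 else 1.0 - pos_prob
    return -math.log(p_t)


def binary_focal_loss(pos_prob, target, gamma=2.0, alpha=0.25):
    """Binary focal loss for one sample.

    p_t is the probability assigned to the true class. The factor
    (1 - p_t) ** gamma shrinks the loss of well-classified samples,
    so hard (often minority-class) samples dominate the total loss.
    """
    p_t = pos_prob if target == 1 else 1.0 - pos_prob
    alpha_t = alpha if target == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Ratio of focal loss to cross-entropy for an easy and a hard positive sample.
easy_ratio = binary_focal_loss(0.95, 1) / cross_entropy(0.95, 1)
hard_ratio = binary_focal_loss(0.10, 1) / cross_entropy(0.10, 1)
```

The easy sample (probability 0.95 for the true class) is down-weighted far more than the hard one (probability 0.10), which is the mechanism that helps on imbalanced data. A real training loop would use a tensor-based version operating on logits.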