Discrepancy with proba-based metrics between fastai2 and sklearn

I don’t know if it’s relevant, but there was a working multi-class version: Fastai v2 vision

Hi @sgugger,

@FraPochetti and I have been working together this morning to review the proba-based metrics issue in fastai2 (RocAuc and APScore), and have jointly come up with a proposal we’d like to submit to you.
It manages all possibilities sklearn allows while keeping the API consistent with the rest of fastai2 metrics.
We have tested our proposal vs sklearn’s API using this gist and everything works well.

In sklearn there are 3 scenarios for roc_auc_score (each of them calculated slightly differently):

  • Binary:

    • targets: shape = (n_samples, )
    • preds: pass through softmax and then [:, -1], shape = (n_samples,)
  • Multiclass:

    • targets: shape = (n_samples, )
    • preds: pass through softmax, shape = (n_samples, n_classes)
    • multi_class = ‘ovr’ or ‘ovo’ (1)
  • Multilabel:

    • targets: shape = (n_samples, n_classes)
    • preds: pass through sigmoid, shape = (n_samples, n_classes)

(1) ‘ovr’: average AUC of each class against the rest. ‘ovo’: average AUC of all possible pairwise combinations of classes.
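
To make footnote (1) concrete, here is a minimal pure-Python sketch of the two averaging schemes. The helper names (binary_auc, ovr_auc, ovo_auc) and the toy data are hypothetical, and this is not sklearn's implementation. It assumes macro averaging, the pairwise-ranking definition of AUC, and that 'ovo' averages both directions of each class pair.

```python
from itertools import combinations

def binary_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def ovr_auc(targets, probs, n_classes):
    """Macro 'ovr': treat each class as positive against all the others, then average."""
    aucs = []
    for c in range(n_classes):
        labels = [1 if t == c else 0 for t in targets]
        aucs.append(binary_auc(labels, [p[c] for p in probs]))
    return sum(aucs) / n_classes

def ovo_auc(targets, probs, n_classes):
    """Macro 'ovo': for each class pair keep only their samples, score both directions, average."""
    pair_aucs = []
    for a, b in combinations(range(n_classes), 2):
        rows = [(t, p) for t, p in zip(targets, probs) if t in (a, b)]
        auc_a = binary_auc([1 if t == a else 0 for t, _ in rows], [p[a] for _, p in rows])
        auc_b = binary_auc([1 if t == b else 0 for t, _ in rows], [p[b] for _, p in rows])
        pair_aucs.append((auc_a + auc_b) / 2)
    return sum(pair_aucs) / len(pair_aucs)

# Toy data: 6 samples, 3 classes, each row of probs sums to 1 (as after softmax).
targets = [0, 0, 1, 1, 2, 2]
probs = [[0.7, 0.2, 0.1], [0.4, 0.5, 0.1], [0.2, 0.6, 0.2],
         [0.3, 0.3, 0.4], [0.1, 0.2, 0.7], [0.2, 0.5, 0.3]]
ovr_value = ovr_auc(targets, probs, 3)
ovo_value = ovo_auc(targets, probs, 3)
print(ovr_value, ovo_value)
```

The two numbers generally differ because 'ovr' ranks each class against all remaining samples, while 'ovo' only compares samples from the two classes in each pair.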

sklearn’s average_precision_score implementation is restricted to binary or multilabel classification tasks. So it cannot be used in multiclass cases.

Here’s our proposal:

class AccumMetric(Metric):
    "Stores predictions and targets on CPU in accumulate to perform final calculations with `func`."
    def __init__(self, func, dim_argmax=None, sigmoid=False, softmax=False, proba=False, thresh=None, to_np=False, invert_arg=False,
                 flatten=True, **kwargs):
        store_attr(self,'func,dim_argmax,sigmoid,softmax,proba,thresh,flatten')
        self.to_np,self.invert_args,self.kwargs = to_np,invert_arg,kwargs

    def reset(self): self.targs,self.preds = [],[]

    def accumulate(self, learn):
        pred = learn.pred.argmax(dim=self.dim_argmax) if (self.dim_argmax and not self.proba) else learn.pred
        if self.sigmoid: pred = torch.sigmoid(pred)
        if self.thresh:  pred = (pred >= self.thresh)
        if self.softmax: 
            pred = F.softmax(pred, dim=-1)
            if learn.dls.c == 2: pred = pred[:, -1]
        targ = learn.y
        pred,targ = to_detach(pred),to_detach(targ)
        if self.flatten: pred,targ = flatten_check(pred,targ)
        self.preds.append(pred)
        self.targs.append(targ)

    @property
    def value(self):
        if len(self.preds) == 0: return
        preds,targs = torch.cat(self.preds),torch.cat(self.targs)
        if self.to_np: preds,targs = preds.numpy(),targs.numpy()
        return self.func(targs, preds, **self.kwargs) if self.invert_args else self.func(preds, targs, **self.kwargs)

    @property
    def name(self):  return self.func.func.__name__ if hasattr(self.func, 'func') else  self.func.__name__

def skm_to_fastai(func, is_class=True, thresh=None, axis=-1, sigmoid=None, softmax=False, proba=False, **kwargs):
    "Convert `func` from sklearn.metrics to a fastai metric"
    dim_argmax = axis if is_class and thresh is None else None
    sigmoid = sigmoid if sigmoid is not None else (is_class and thresh is not None)
    return AccumMetric(func, dim_argmax=dim_argmax, sigmoid=sigmoid, softmax=softmax, proba=proba, thresh=thresh,
                       to_np=True, invert_arg=True, **kwargs)

def APScore(axis=-1, average='macro', pos_label=1, sample_weight=None):
    "Average Precision for binary single-label classification problems"
    return skm_to_fastai(skm.average_precision_score, axis=axis, flatten=False, softmax=True, proba=True,
                         average=average, pos_label=pos_label, sample_weight=sample_weight)
    
def APScoreMulti(axis=-1, average='macro', pos_label=1, sample_weight=None):
    "Average Precision for multi-label classification problems"
    return skm_to_fastai(skm.average_precision_score, axis=axis, flatten=False, sigmoid=True, proba=True,
                         average=average, pos_label=pos_label, sample_weight=sample_weight)
    
def RocAuc(axis=-1, average='macro', sample_weight=None, max_fpr=None, multi_class='raise', labels=None):
    """Area Under the Receiver Operating Characteristic Curve for single-label classification problems.
    Use the default multi_class ('raise') for binary-class tasks, and 'ovr' (average AUC of each class against the rest)
    or 'ovo' (average AUC of all possible pairwise combinations of classes) for multi-class tasks."""
    return skm_to_fastai(skm.roc_auc_score, axis=axis, flatten=False, softmax=True, proba=True,
                         average=average, sample_weight=sample_weight, max_fpr=max_fpr, multi_class=multi_class, labels=labels)
    
def RocAucMulti(axis=-1, average='macro', sample_weight=None, max_fpr=None):
    "Area Under the Receiver Operating Characteristic Curve for multi-label classification problems"
    return skm_to_fastai(skm.roc_auc_score, axis=axis, flatten=False, sigmoid=True, proba=True,
                         average=average, sample_weight=sample_weight, max_fpr=max_fpr)
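
The prediction handling in accumulate above can be illustrated without torch. This is a minimal pure-Python sketch, assuming the raw model outputs are logits; softmax, sigmoid and prepare_preds are hypothetical stand-ins written for this example, not fastai2 functions.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def prepare_preds(logits, task):
    """Return scores in the layout each of the three scenarios expects."""
    if task == "binary":       # softmax, keep only the last (positive) class -> (n_samples,)
        return [softmax(row)[-1] for row in logits]
    if task == "multiclass":   # softmax, one column per class -> (n_samples, n_classes)
        return [softmax(row) for row in logits]
    if task == "multilabel":   # independent sigmoid per label -> (n_samples, n_classes)
        return [[sigmoid(x) for x in row] for row in logits]
    raise ValueError(f"unknown task: {task}")

binary = prepare_preds([[2.0, -1.0], [0.0, 0.0]], "binary")
multiclass = prepare_preds([[1.0, 2.0, 3.0]], "multiclass")
multilabel = prepare_preds([[0.0, 5.0, -5.0]], "multilabel")
```

Note that the multiclass rows sum to 1 while the multilabel rows do not, which is why the two cases need different activations.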

Please, let us know if we can help you in any way with this.

This introduces a bit too much magic. I think there should be two names: BinaryRocAuc and RocAuc for the two separate metrics (that handle things differently).

Hi @sgugger,

Yes, @FraPochetti and I also discussed how the different cases should be grouped and named.

If we understand you correctly, you are proposing to split RocAuc into 2 to avoid the multi_class kwarg. That makes sense.

This would be our proposal for the 3 scenarios (gist with full code):

def RocAuc(axis=-1, average='macro', sample_weight=None, max_fpr=None):
    "Area Under the Receiver Operating Characteristic Curve for single-label binary classification problems"
    return skm_to_fastai(skm.roc_auc_score, axis=axis, flatten=False, softmax=True, proba=True,
                         average=average, sample_weight=sample_weight, max_fpr=max_fpr)

def RocAucMultiClass(axis=-1, average='macro', sample_weight=None, max_fpr=None, multi_class='ovr', labels=None):
    "Area Under the Receiver Operating Characteristic Curve for single-label multi-class classification problems"
    return skm_to_fastai(skm.roc_auc_score, axis=axis, flatten=False, softmax=True, proba=True,
                         average=average, sample_weight=sample_weight, max_fpr=max_fpr, multi_class=multi_class, labels=labels)
    
def RocAucMulti(axis=-1, average='macro', sample_weight=None, max_fpr=None):
    "Area Under the Receiver Operating Characteristic Curve for multi-label classification problems"
    return skm_to_fastai(skm.roc_auc_score, axis=axis, flatten=False, sigmoid=True, proba=True,
                         average=average, sample_weight=sample_weight, max_fpr=max_fpr)

As for the names, we have several options:

  • binary case: RocAuc or RocAucBinary and APScore
  • multi-class case: RocAucMultiClass (avg precision not available in sklearn)
  • multi-label case: RocAucMulti or RocAucMultiLabel, and APScoreMulti

We believe RocAuc and RocAucMulti are consistent with all other fastai2 metrics. The new one would be RocAucMultiClass, since the multi-class case of roc_auc_score requires different behavior.

I disagree with the multi-class terminology. All metrics for single-label work with any number of labels, so the base RocAuc/APScore should work for the multi-label case. Since the binary case requires special behavior, it should be BinaryRocAuc and BinaryAPScore.

I think you meant:
"All metrics for single-label work with any number of classes, so the base RocAuc / APScore should work for the multi-class case.”
Right?

If so, it makes sense.
May I suggest just one thing? Can we use Binary as a suffix instead of a prefix? That makes it easier to find the different RocAuc types with code completion when you start typing.

This way it’d be:

  • RocAuc: for single-label multi-class
  • RocAucBinary or BinaryRocAuc / APScoreBinary or BinaryAPScore: for single-label binary
  • RocAucMulti / APScoreMulti: for multi-label

But it’s your call.

Yes I wanted to say multi-class, sorry.
No problem with having Binary as a suffix (since Multi is also a suffix).

Ok, good. So we agreed :sweat_smile:.

Here’s a gist with the code and the tests we used.

Here’s the code with agreed naming:

class AccumMetric(Metric):
    "Stores predictions and targets on CPU in accumulate to perform final calculations with `func`."
    def __init__(self, func, dim_argmax=None, sigmoid=False, softmax=False, proba=False, thresh=None, to_np=False, invert_arg=False,
                 flatten=True, **kwargs):
        store_attr(self,'func,dim_argmax,sigmoid,softmax,proba,thresh,flatten')
        self.to_np,self.invert_args,self.kwargs = to_np,invert_arg,kwargs

    def reset(self): self.targs,self.preds = [],[]

    def accumulate(self, learn):
        pred = learn.pred.argmax(dim=self.dim_argmax) if (self.dim_argmax and not self.proba) else learn.pred
        if self.sigmoid: pred = torch.sigmoid(pred)
        if self.thresh:  pred = (pred >= self.thresh)
        if self.softmax: 
            pred = F.softmax(pred, dim=-1)
            if learn.dls.c == 2: pred = pred[:, -1]
        targ = learn.y
        pred,targ = to_detach(pred),to_detach(targ)
        if self.flatten: pred,targ = flatten_check(pred,targ)
        self.preds.append(pred)
        self.targs.append(targ)

    @property
    def value(self):
        if len(self.preds) == 0: return
        preds,targs = torch.cat(self.preds),torch.cat(self.targs)
        if self.to_np: preds,targs = preds.numpy(),targs.numpy()
        return self.func(targs, preds, **self.kwargs) if self.invert_args else self.func(preds, targs, **self.kwargs)

    @property
    def name(self):  return self.func.func.__name__ if hasattr(self.func, 'func') else  self.func.__name__

def skm_to_fastai(func, is_class=True, thresh=None, axis=-1, sigmoid=None, softmax=False, proba=False, **kwargs):
    "Convert `func` from sklearn.metrics to a fastai metric"
    dim_argmax = axis if is_class and thresh is None else None
    sigmoid = sigmoid if sigmoid is not None else (is_class and thresh is not None)
    return AccumMetric(func, dim_argmax=dim_argmax, sigmoid=sigmoid, softmax=softmax, proba=proba, thresh=thresh,
                       to_np=True, invert_arg=True, **kwargs)

def APScore(axis=-1, average='macro', pos_label=1, sample_weight=None):
    "Average Precision for binary single-label classification problems"
    return skm_to_fastai(skm.average_precision_score, axis=axis, flatten=False, softmax=True, proba=True,
                         average=average, pos_label=pos_label, sample_weight=sample_weight)
    
def APScoreMulti(axis=-1, average='macro', pos_label=1, sample_weight=None):
    "Average Precision for multi-label classification problems"
    return skm_to_fastai(skm.average_precision_score, axis=axis, flatten=False, sigmoid=True, proba=True,
                         average=average, pos_label=pos_label, sample_weight=sample_weight)
    
def RocAucBinary(axis=-1, average='macro', sample_weight=None, max_fpr=None):
    "Area Under the Receiver Operating Characteristic Curve for single-label binary classification problems"
    return skm_to_fastai(skm.roc_auc_score, axis=axis, flatten=False, softmax=True, proba=True,
                         average=average, sample_weight=sample_weight, max_fpr=max_fpr)

def RocAuc(axis=-1, average='macro', sample_weight=None, max_fpr=None, multi_class='ovr', labels=None):
    "Area Under the Receiver Operating Characteristic Curve for single-label multi-class classification problems"
    return skm_to_fastai(skm.roc_auc_score, axis=axis, flatten=False, softmax=True, proba=True,
                         average=average, sample_weight=sample_weight, max_fpr=max_fpr, multi_class=multi_class, labels=labels)
    
def RocAucMulti(axis=-1, average='macro', sample_weight=None, max_fpr=None):
    "Area Under the Receiver Operating Characteristic Curve for multi-label classification problems"
    return skm_to_fastai(skm.roc_auc_score, axis=axis, flatten=False, sigmoid=True, proba=True,
                         average=average, sample_weight=sample_weight, max_fpr=max_fpr)

Will you update this in fastai2 then? Is there anything else you need from @FraPochetti or me?

I’ve made a tentative update. Let me know if you run into any problems with it.

Great!
I’ll test it right away, and will get back to you.

Ok, I’ve just finished testing and have found a few (easy to solve) issues.

  • Binary: APScoreBinary and RocAucBinary both work as expected.

  • Multi-class: RocAuc works well too. But:
    • labels=None is missing as a kwarg
    • there’s a typo in the description. It says:
      "Area Under the Receiver Operating Characteristic Curve for single-label multi-label classification problems"
      when it should be:
      "Area Under the Receiver Operating Characteristic Curve for single-label multi-class classification problems"

  • Multi-label is not working well because a thresh=0.5 has been added. But these are proba-based metrics that don’t require a thresh.

I’ve removed thresh and now they work well.

So they should be:

def RocAuc(axis=-1, average='macro', sample_weight=None, max_fpr=None, multi_class='ovr', labels=None):
    "Area Under the Receiver Operating Characteristic Curve for single-label multi-class classification problems"
    assert multi_class in ['ovr', 'ovo']
    return skm_to_fastai(skm.roc_auc_score, axis=axis, activation=ActivationType.Softmax, flatten=False, average=average, sample_weight=sample_weight, max_fpr=max_fpr, multi_class=multi_class, labels=labels)


def APScoreMulti(sigmoid=True, average='macro', pos_label=1, sample_weight=None):
    "Average Precision for multi-label classification problems"
    activation = ActivationType.Sigmoid if sigmoid else ActivationType.No
    return skm_to_fastai(skm.average_precision_score, activation=activation, flatten=False,
                         average=average, pos_label=pos_label, sample_weight=sample_weight)


def RocAucMulti(sigmoid=True, average='macro', sample_weight=None, max_fpr=None):
    "Area Under the Receiver Operating Characteristic Curve for multi-label binary classification problems"
    activation = ActivationType.Sigmoid if sigmoid else ActivationType.No
    return skm_to_fastai(skm.roc_auc_score, activation=activation, flatten=False,
                         average=average, sample_weight=sample_weight, max_fpr=max_fpr)
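
To illustrate why a thresh hurts a ranking-based metric, here is a small pure-Python sketch. The pairwise_auc helper and the toy numbers are hypothetical and are not taken from the tests mentioned above. Thresholding at 0.5 collapses the scores to 0/1, so the ordering on each side of the threshold is lost.

```python
def pairwise_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0]
probs = [0.9, 0.6, 0.55, 0.2]             # every positive outranks every negative
hard = [float(p >= 0.5) for p in probs]   # what a thresh=0.5 would feed to the metric

auc_probs = pairwise_auc(labels, probs)
auc_hard = pairwise_auc(labels, hard)
print(auc_probs, auc_hard)
```

With these toy numbers auc_probs is 1.0, while auc_hard drops to 0.75 because the negative scored 0.55 becomes tied with both positives after thresholding.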

Thanks for investigating all of this. I removed the thresh and fixed the typo.

Great!
I’ve retested and everything works smoothly now :ok_hand:
So from my side we can close this.
THANKS a lot @FraPochetti and @sgugger for your work to fix this issue. It’s been a pleasure working with you!

If you have two classes {0, 1} that are complementary (like cats and dogs) and you want to use RocAuc:

learn = cnn_learner(dls, resnet34, metrics=[accuracy])
learn.fine_tune(1)

What’s the best way to invoke it?
I don’t see examples here

https://dev.fast.ai/metrics#RocAuc

Hi Gerardo,
Sorry for the late reply, but I was out last week.

  1. You should select the appropriate metric:
    • RocAucBinary: for single-label binary
    • RocAuc: for single-label multi-class
    • RocAucMulti / APScoreMulti: for multi-label
  2. In your case (binary classification):
    learn = cnn_learner(dls, resnet34, metrics=[accuracy, RocAucBinary()])

@oguiza You are always super helpful :100:

What is the purpose of the axis?

RocAucBinary(axis=-1, average='macro', sample_weight=None, max_fpr=None, multi_class='raise')

Thanks @gerardo.
axis is just a value used by skm_to_fastai, but AFAIK it doesn’t need to be changed for any of the RocAuc variants.
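
Going by the skm_to_fastai and AccumMetric code posted earlier in this thread (later fastai2 versions may differ), axis only selects the dimension for argmax, and that step is skipped when proba=True. Here is a minimal pure-Python sketch of that branch. resolve_pred is a hypothetical stand-in, and it only handles the last axis of a list of rows.

```python
def resolve_pred(pred_rows, axis=-1, is_class=True, thresh=None, proba=False):
    """Mirror the argmax branch: class-based metrics get class indices, proba-based ones keep scores."""
    dim_argmax = axis if is_class and thresh is None else None
    if dim_argmax and not proba:
        # argmax over the last axis of a list of rows (the only axis this sketch handles)
        return [max(range(len(row)), key=row.__getitem__) for row in pred_rows]
    return pred_rows

scores = [[0.1, 0.9], [0.8, 0.2]]
as_classes = resolve_pred(scores)              # what a class-based metric would receive
as_probas = resolve_pred(scores, proba=True)   # proba-based metrics: axis is never used
print(as_classes, as_probas)
```

In this sketch as_classes is [1, 0] and as_probas is the unchanged list of scores, which is consistent with axis not needing to be changed for the RocAuc variants.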

I’m using RocAuc() for the US Income dataset, something like this:

to = TabularPandas(df, procs=[Categorify, FillMissing, Normalize],
                   cat_names=['workclass', 'education', 'marital.status', 'occupation', 'relationship', 'race', 'sex', 'native.country'],
                   cont_names=["fnlwgt", "capital.gain", "capital.loss", "hours.per.week", "age"],
                   y_names='income',
                   splits=splits)
dls = to.dataloaders(bs=64)
learn = tabular_learner(dls, metrics=[RocAuc()])

It’s giving me an error like this

AttributeError                            Traceback (most recent call last)
<ipython-input-59-8fd4c17131d7> in <module>
----> 1 learn = tabular_learner(dls, metrics=[RocAuc()])

<ipython-input-36-803d668dbfd6> in RocAuc(axis, average, sample_weight, max_fpr, multi_class, labels)
     56     "Area Under the Receiver Operating Characteristic Curve for single-label multi-class classification problems"
     57     return skm_to_fastai(skm.roc_auc_score, axis=axis, flatten=False, softmax=True, proba=True,
---> 58                          average=average, sample_weight=sample_weight, max_fpr=max_fpr, multi_class=multi_class, labels=labels)
     59 
     60 def RocAucMulti(axis=-1, average='macro', sample_weight=None, max_fpr=None):

<ipython-input-36-803d668dbfd6> in skm_to_fastai(func, is_class, thresh, axis, sigmoid, softmax, proba, **kwargs)
     36     sigmoid = sigmoid if sigmoid is not None else (is_class and thresh is not None)
     37     return AccumMetric(func, dim_argmax=dim_argmax, sigmoid=sigmoid, softmax=softmax, proba=proba, thresh=thresh,
---> 38                        to_np=True, invert_arg=True, **kwargs)
     39 
     40 def APScore(axis=-1, average='macro', pos_label=1, sample_weight=None):

<ipython-input-36-803d668dbfd6> in __init__(self, func, dim_argmax, sigmoid, softmax, proba, thresh, to_np, invert_arg, flatten, **kwargs)
      3     def __init__(self, func, dim_argmax=None, sigmoid=False, softmax=False, proba=False, thresh=None, to_np=False, invert_arg=False,
      4                  flatten=True, **kwargs):
----> 5         store_attr(self,'func,dim_argmax,sigmoid,softmax,proba,thresh,flatten')
      6         self.to_np,self.invert_args,self.kwargs = to_np,invert_arg,kwargs
      7 

/opt/conda/lib/python3.7/site-packages/fastcore/basics.py in store_attr(names, self, but, cast, **attrs)
    275     if self: args = ('self', *args)
    276     else: self = fr.f_locals[args[0]]
--> 277     if not hasattr(self, '__stored_args__'): self.__stored_args__ = {}
    278     anno = annotations(self) if cast else {}
    279     if not attrs:

AttributeError: 'str' object has no attribute '__stored_args__'

Can someone please guide me on where I’m going wrong?

The answer is explained above.

  • RocAuc: for single-label multi-class
  • RocAucBinary or BinaryRocAuc/ APScoreBinary or BinaryAPScore: for single-label binary
  • RocAucMulti / APScoreMulti: for multi-label

You may try the appropriate one, such as RocAucMulti, and see the docs.
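
Separately from the choice of metric, the traceback above shows store_attr(names, self, but, cast, **attrs) in the installed fastcore, while the pasted AccumMetric calls store_attr(self, '...'). With that argument order the string ends up bound to self, which would explain the 'str' object message. The sketch below reproduces the mismatch with a hypothetical stand-in: store_attr_standin is not fastcore's implementation, it only copies the argument order and the failing line visible in the traceback. Using the metrics that ship with the installed fastai instead of the pasted definitions (for a binary target such as income, presumably RocAucBinary()) is a suggestion I have not verified on your setup.

```python
def store_attr_standin(names=None, self=None):
    """Toy version: same argument order and same failing line as in the traceback."""
    if not hasattr(self, '__stored_args__'):
        self.__stored_args__ = {}
    for name in names.split(','):
        self.__stored_args__[name] = None

class Dummy:
    pass

obj = Dummy()

# Old calling convention (object first, names second): the string lands in `self`,
# and setting an attribute on a str raises AttributeError.
try:
    store_attr_standin(obj, 'func,flatten')
    old_style_error = None
except AttributeError as e:
    old_style_error = e

# Passing the arguments by keyword avoids the mix-up in this toy version.
store_attr_standin(names='func,flatten', self=obj)
print(type(old_style_error).__name__, obj.__stored_args__)
```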

Thank you so much dear oguiza
When I use:
def RocAuc(axis=-1, average='macro', sample_weight=None, max_fpr=None, multi_class='ovr', labels=None):
    "Area Under the Receiver Operating Characteristic Curve for single-label multi-class classification problems"
    assert multi_class in ['ovr', 'ovo']
    return skm_to_fastai(skm.roc_auc_score, axis=axis, activation=ActivationType.Softmax, flatten=False, average=average, sample_weight=sample_weight, max_fpr=max_fpr, multi_class=multi_class, labels=labels)


The error "NameError: name 'ActivationType' is not defined" appeared.
What should I add to my model?
Thanks