[PR discuss] Add Pearson and Spearman corrcoef metrics

Richard-Wang · May 23, 2020, 5:05am

I would like to ask for your opinions on naming before I make a PR.

Because scipy.stats.spearmanr and scipy.stats.pearsonr return (correlation, p_value), I need AccumMetric to be able to keep only the first element of the return value.

class MyAccumMetric(AccumMetric):
    def __init__(self, func, get=None, **kwargs):
        super().__init__(func, **kwargs)
        self.get = None if get is None else partial(get_item, i=get)

    @property
    def value(self):
        ....
        out = self.func(targs, preds, **self.kwargs) if self.invert_args else self.func(preds, targs, **self.kwargs)
        return self.get(out) if self.get else out

Is there a better name or a better way to do this?
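For reference, the `get` idea can be tried outside fastai. The sketch below is only an illustration under assumptions: `TupleAwareMetric` and `fake_corr` are hypothetical names, `fake_corr` merely mimics the `(correlation, p_value)` return shape with a placeholder second element, and `operator.itemgetter` stands in for `get_item`.

```python
from math import sqrt
from operator import itemgetter


def fake_corr(x, y):
    """Stand-in for a scipy-style function: returns (correlation, placeholder)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy), None  # None is a placeholder, not a real p-value


class TupleAwareMetric:
    """Hypothetical wrapper: call `func`, then optionally keep one item of its result."""

    def __init__(self, func, get=None, invert_args=False, **kwargs):
        self.func, self.invert_args, self.kwargs = func, invert_args, kwargs
        # itemgetter(i) builds a callable equivalent to `lambda out: out[i]`
        self.get = None if get is None else itemgetter(get)

    def __call__(self, preds, targs):
        args = (targs, preds) if self.invert_args else (preds, targs)
        out = self.func(*args, **self.kwargs)
        return self.get(out) if self.get else out


preds, targs = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
whole = TupleAwareMetric(fake_corr)(preds, targs)         # full tuple
first = TupleAwareMetric(fake_corr, get=0)(preds, targs)  # correlation only
```

With `get=None` the wrapper behaves like a plain pass-through, so functions that already return a single number would not need any special handling.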

Rename skm_to_fastai to scim_to_fastai, because we will also use scipy, and add the import: from scipy import stats as spm
Because we need to pass axis to scipy.stats.spearmanr via scim_to_fastai (formerly skm_to_fastai), the existing axis argument of scim_to_fastai should be renamed to something else, such as dim_argmax. (If we adopt this change, we will also need to update the other metrics that specify axis when calling skm_to_fastai.)
Add MatthewsCorrCoef, which passes dim_argmax instead of thresh and sigmoid.
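The naming clash in the second point can be illustrated without fastai. In this sketch, which assumes nothing about the real `skm_to_fastai` internals, the hypothetical `to_metric` wrapper uses `dim_argmax` for its own argmax step, so an `axis` keyword is free to travel through `**kwargs` to the wrapped function. `toy_metric` is a made-up function that simply reports the `axis` it received.

```python
def argmax_last(rows):
    """Index of the largest value in each row (argmax over the last dimension)."""
    return [max(range(len(r)), key=r.__getitem__) for r in rows]


def toy_metric(preds, targs, axis=0):
    """Made-up metric: fraction of matches, plus the `axis` it was given."""
    acc = sum(p == t for p, t in zip(preds, targs)) / len(targs)
    return acc, axis


def to_metric(func, dim_argmax=-1, **kwargs):
    """Hypothetical wrapper: argmax the predictions, then forward **kwargs to func."""
    def metric(preds, targs):
        if dim_argmax is not None:
            # Only the last dimension is supported in this toy version
            preds = argmax_last(preds)
        return func(preds, targs, **kwargs)
    return metric


scores = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
targs = [1, 0, 0]
# `axis` is not consumed by the wrapper, so it reaches toy_metric unchanged
result = to_metric(toy_metric, dim_argmax=-1, axis=1)(scores, targs)
```

If the wrapper kept its own parameter named `axis`, the same call would bind `axis=1` to the wrapper and the wrapped function would silently fall back to its default.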

The three metrics would look like this.

def MatthewsCorrCoef(dim_argmax=-1, sample_weight=None):
    "Matthews correlation coefficient for single-label classification problems"
    return scim_to_fastai(skm.matthews_corrcoef, invert_args=True, dim_argmax=dim_argmax, flatten=True, sample_weight=sample_weight)

def PearsonCorrCoef(dim_argmax=-1):
    "Pearson correlation coefficient for single-label classification problems"
    return scim_to_fastai(spm.pearsonr, invert_args=False, get=0, dim_argmax=dim_argmax, flatten=True)

def SpearmanCorrCoef(dim_argmax=-1, axis=0, nan_policy='propagate'):
    "Spearman correlation coefficient for single-label classification problems"
    return scim_to_fastai(spm.spearmanr, invert_args=False, get=0, dim_argmax=dim_argmax, flatten=True, axis=axis, nan_policy=nan_policy)

sgugger · May 23, 2020, 1:13pm

The function is already very complicated as it is. Maybe just creating a new class and a new function for scipy metrics is the easiest.

Richard-Wang · May 23, 2020, 1:34pm

Did you mean a new class separate from AccumMetric and a new function separate from skm_to_fastai?

I ask because I think scipy and scikit-learn metrics differ only in whether they return a tuple; the other arguments and the logic are the same?

Richard-Wang · May 25, 2020, 1:00am

Hi, @sgugger,

How about we take a different approach, so that we don’t need to modify AccumMetric and only have to rename one argument of skm_to_fastai?

Scipy metrics return a tuple (corr, p_value)
What if I use skm_to_fastai(compose(scipy.stats.pearsonr, lambda t: t[0]))? It is a little ugly, but it doesn’t require modifying AccumMetric.
Naming
def skm_to_fastai(..., axis, ...) -> def scim_to_fastai(..., axis_argmax, ...)

That way, my SpearmanCorrCoef can pass axis through the kwargs of scim_to_fastai to scipy.stats.spearmanr, and both scikit-learn and scipy metrics use the same scim_to_fastai.
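The tuple-unpacking alternative above can be sketched as follows. This assumes a composition helper that passes all arguments to the first function and only its result to the next one; `then` is a hypothetical helper written for this example, and `fake_pearsonr` is a stand-in that mimics the `(corr, p_value)` shape with a placeholder second element. A `compose` that forwards extra arguments to every function in the chain would not work with a one-argument lambda, so that behaviour is worth checking before relying on it.

```python
from functools import wraps
from math import sqrt


def fake_pearsonr(x, y):
    """Stand-in with a scipy-like return shape: (correlation, placeholder)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy), None  # None is a placeholder, not a p-value


def then(func, post):
    """Hypothetical helper: call func(*args, **kwargs), then apply post to the result."""
    @wraps(func)  # keep the original function's name on the wrapper
    def wrapped(*args, **kwargs):
        return post(func(*args, **kwargs))
    return wrapped


corr_only = then(fake_pearsonr, lambda t: t[0])
value = corr_only([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])  # perfectly anti-correlated
```

Because the wrapper accepts `**kwargs`, keyword arguments such as `axis` would still reach the wrapped function, which is what the renaming proposal relies on.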