I was wondering if anyone had tips on how to implement the following loss function (really a metric), “mean column-wise ROC AUC”, for a language model in the fastai framework. Should we use something like sklearn’s http://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html#sklearn.metrics.roc_auc_score and pass two matrices, Y and Y_true, containing all the columns to which the metric applies, or is a different approach needed? It seems this function supports only two vectors, not two matrices, but we could compute it per column and then average?
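For what it's worth, the per-column-then-average idea and sklearn's built-in macro averaging give the same number; a minimal sketch with made-up data (the arrays below are invented purely for illustration):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Synthetic example: 6 samples, 3 label columns (made-up data).
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 0],
                   [0, 1, 1],
                   [1, 1, 0],
                   [0, 0, 1]])
y_score = np.array([[0.9, 0.2, 0.8],
                    [0.1, 0.7, 0.3],
                    [0.8, 0.1, 0.2],
                    [0.3, 0.9, 0.7],
                    [0.7, 0.8, 0.1],
                    [0.2, 0.3, 0.9]])

# Per-column AUC, averaged by hand:
per_col = [roc_auc_score(y_true[:, i], y_score[:, i])
           for i in range(y_true.shape[1])]
manual_mean = np.mean(per_col)

# Passing the full matrices with average="macro" computes the same thing:
macro = roc_auc_score(y_true, y_score, average="macro")

assert np.isclose(manual_mean, macro)
```

Note that each column needs both a positive and a negative example in the batch, otherwise the per-column AUC is undefined.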
The one you link to works with multiple labels. I’ll copy in how I use it with fastai lib when I get to my computer later today.
You need at least one positive example per class for it to work, so you might need a large batch size for the validation set.
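To illustrate that caveat: when a class has only one label value in the batch, sklearn's roc_auc_score raises a ValueError rather than returning a number (the toy data below is made up):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 0, 0])          # no positive example in the batch
y_score = np.array([0.1, 0.4, 0.35, 0.8])

try:
    roc_auc_score(y_true, y_score)
except ValueError as e:
    print("AUC undefined:", e)            # only one class present in y_true
```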
This is what I pass in metrics array to fit:
def roc_auc(preds, y):
    return metrics.roc_auc_score(y.data.cpu().numpy(), np.exp(preds.cpu().numpy()))
metrics is just
from sklearn import metrics
Got it. Thanks a lot!
This is the final code I have since it seems that fastai doesn’t automatically one hot encode the Ys:
def roc_auc(preds, y):
    preds = np.exp(preds.cpu().numpy())  # conv from logs; batch x num_classes
    exp = V(y).data.cpu().numpy()        # true category IDs
    bs = preds.shape[0]                  # batch size
    nclass = preds.shape[1]              # size to determine length of Y one-hot encoding
    y = np.zeros((bs, nclass))
    y[np.arange(bs), exp] = 1            # one-hot encode Ys
    return metrics.roc_auc_score(y, preds, average="micro")
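As a self-contained sanity check of the same idea (one-hot encode integer targets via fancy indexing, then micro-average AUC over the whole matrix), here's a plain-numpy sketch; the batch data is invented and numpy arrays stand in for the torch tensors:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def roc_auc_np(preds, targs):
    """preds: (batch, n_classes) probabilities; targs: (batch,) integer class IDs."""
    bs, nclass = preds.shape
    onehot = np.zeros((bs, nclass))
    onehot[np.arange(bs), targs] = 1       # one-hot encode the targets
    return roc_auc_score(onehot, preds, average="micro")

# Made-up batch of 4 samples over 3 classes:
preds = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.6, 0.2],
                  [0.1, 0.2, 0.7],
                  [0.3, 0.4, 0.3]])
targs = np.array([0, 1, 2, 1])

score = roc_auc_np(preds, targs)
assert 0.0 <= score <= 1.0
```

With average="micro" the label-indicator matrix and the prediction matrix are flattened and a single AUC is computed over all entries, which sidesteps the one-positive-per-column requirement of per-column averaging.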
Why do you need to call .cpu().numpy() on preds?
Maybe PyTorch has changed quite a bit since the last time I used it, but since the sklearn metrics are plain Python calls that expect numpy arrays as parameters, you need to fetch the variable from the GPU and convert it so it’s usable in Python code that runs on the CPU. Not sure if PyTorch does that transparently these days.