Structured Learner For Kaggle Titanic

Many thanks @stas! All the new metrics work well :slight_smile:


Hello @stas and @sgugger.

I’m using the recall/precision/f1 metrics for a new classification project and noticed strange values.
I also noticed that the implementations of these metrics in the metrics.py file have changed slightly since our discussion above (what does the epsilon argument mean, for example?).

To check with an easy example, I used the recall metric in the lesson1.ipynb notebook of dl1 (see the screenshot). I think there is a problem in the calculation of recall (and of precision and f1, too).

What do you think? Thank you.

Ah, the tpos array inside the recall function gets converted to a byte tensor, so tpos.sum() doesn’t give the right result. I think it should be tpos.long().sum(); can you try it and see if you get the right result?

The epsilon is there to avoid a division by zero, because you never know.

Thanks @sgugger for your answer but tpos.long().sum() still does not give the right answer (no change in the wrong value returned by recall).

Oh, found the issue. You can’t use recall as a metric because metrics are computed as the average of the metric over the batches. That works for accuracy or for the loss because those are averages, but recall isn’t one. To compute the recall, you should do

preds, targs = learn.predict_with_targs()   # predictions and targets over the whole validation set
recall(T(preds), T(targs))                  # recall computed on the full set, not averaged per batch

(note that recall_np seems quite buggy, so you shouldn’t use that either).

If you’re using pytorch >= 0.4, you’ll need to replace the last line of the recall function by

tpos.sum().item()/(targs.sum().item() + epsilon)
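
To see why averaging recall per batch gives the wrong number, here is a tiny made-up illustration (all the counts are invented for the example):

# Two-batch example: averaging per-batch recall gives 0.5,
# while the recall over the whole validation set is 0.9.
tp1, pos1 = 0, 1    # batch 1: 1 positive, none recovered -> recall 0.0
tp2, pos2 = 9, 9    # batch 2: 9 positives, all recovered -> recall 1.0
per_batch_average = (tp1/pos1 + tp2/pos2) / 2     # 0.5
global_recall = (tp1 + tp2) / (pos1 + pos2)       # 0.9
print(per_batch_average, global_recall)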

Thanks Sylvain, but the fact that recall (and precision and f1) cannot be used as a training metric (the way accuracy is printed at the end of each cycle) is a problem for me.

I have imbalanced data, and the accuracy at the end of each cycle does not help much. I need to get the recall metric after each cycle, not the accuracy (same idea for precision and f1).
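
As a made-up illustration of why accuracy is uninformative here (the class counts are invented):

# 95 negatives and 5 positives: predicting "negative" for everything
# reaches 95% accuracy while recovering none of the positives.
n_neg, n_pos = 95, 5
accuracy = n_neg / (n_neg + n_pos)       # 0.95
recall_all_negative = 0 / n_pos          # 0.0
print(accuracy, recall_all_negative)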

Thanks to your code, I can get the recall value on the validation set after each cycle using the following new recall function:

def recall_new(a, b):
    # a and b (the per-batch predictions and targets) are ignored; see the note below
    log_preds, targs = learn.predict_with_targs()   # predictions over the whole validation set
    return recall(T(log_preds), T(targs), thresh=0.5, epsilon=1e-8)

learn.fit(lr, n_cycle, metrics=[recall_new])

Note: recall_new has to take 2 arguments (the batch predictions and targets passed to every metric), but they are not used.

This is not an optimal solution, and training takes longer since the predictions over the whole validation set are recomputed for each validation batch (what we really need is the recall on validation after each cycle, not after each batch), but it works. Clearly, it would be great to find a better solution. What do you think? Thanks.

I can’t do much about the fact that current fastai assumes all metrics are averages, and recall/precision/f1 aren’t averages.
You would be better off using a Callback that does the same thing as recall_new in on_epoch_end, since, as you note, you’re currently computing recall_new as many times as there are batches in the validation set.
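
Here is a minimal sketch of such a callback, assuming the Callback base class lives in fastai.sgdr, that on_epoch_end receives the epoch metrics, and that learn.fit accepts a callbacks argument in this version; adapt the imports to your install:

from fastai.sgdr import Callback      # assumed location of the Callback base class in fastai 0.7
from fastai.metrics import recall
from fastai.core import T

class RecallCallback(Callback):
    # Computes recall on the full validation set once per epoch instead of once per batch.
    def __init__(self, learn):
        self.learn = learn
    def on_epoch_end(self, metrics):
        log_preds, targs = self.learn.predict_with_targs()   # one pass over the validation set
        rec = recall(T(log_preds), T(targs), thresh=0.5, epsilon=1e-8)
        print(f'val recall: {float(rec):.4f}')

learn.fit(lr, n_cycle, callbacks=[RecallCallback(learn)])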

I understand, but I guess something should be changed in the metrics.py file, since the metric functions cannot all be used in the same way: we can use accuracy in m.fit(..., metrics=[accuracy]), but not recall, for example.

Yes, you’re right. First, I have to study how to use a Callback :slight_smile: to get (the right) recall printed after each cycle, as it is for train_loss, val_loss or accuracy when listed in m.fit(..., metrics=[accuracy]).
If you or anyone else can help me save some time with the corresponding code, I would appreciate it :slight_smile: Thanks.

I was a bit bothered by the recall showing 0.5 when it should be at least close to the real value (it can’t be the exact value, as I explained before, but it shouldn’t be this far off).
It turns out that when a batch contains no positives at all, the metric returns 0., which messes up the average since half the batches contain only cats (the validation set is iterated in order). Replacing the last line of recall by

return tpos.sum()/targs.sum() if targs.sum() != 0 else 1.

(don’t forget to add .item() as before if you have pytorch >= 0.4) will correct that bug. The recall shown will be an approximation of the true recall (again, not the exact value, but it should look close enough).
Then you can compute the true recall at the end of your training.
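
Putting the two fixes together, here is a sketch of what a patched recall could look like (an illustrative rewrite, not the exact metrics.py code):

def recall_fixed(preds, targs, thresh=0.5):
    # preds: tensor of predicted probabilities for the positive class, targs: 0/1 target tensor
    pred_pos = (preds > thresh).byte()        # predicted positives at the threshold
    tpos = pred_pos * targs.byte()            # true positives: predicted AND actual positives
    n_pos = targs.long().sum().item()         # .item() needed on pytorch >= 0.4
    if n_pos == 0:
        return 1.                             # no positives in this batch: don't drag the average down
    return tpos.long().sum().item() / n_pos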

Here is my Kaggle kernel using the deep learning approach from lessons 3/4 and some feature engineering:
https://www.kaggle.com/rhadow/titanic

This got a score of 0.79425