Why do we use accuracy as a metric rather than something like F1 score/AUC?

(Aseem Bansal) #1

I have read that accuracy is not a very good metric, and the explanations are quite convincing. Even when accuracy is high, a model can still give results that are not what we would intuitively expect (unless accuracy is 100%, obviously). So why do we use accuracy as the metric throughout the deep learning course? Is it because our data did not have skewed class distributions?
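
To make the concern concrete, here is a minimal sketch with made-up data. The 95/5 class split and the always-negative "model" are assumptions for illustration only, not something from the course:

```python
import numpy as np

# Hypothetical toy data: 95 negatives and 5 positives (a skewed class distribution).
y_true = np.array([0] * 95 + [1] * 5)
# A "model" that always predicts the majority class.
y_pred = np.zeros_like(y_true)

accuracy = (y_true == y_pred).mean()

# Precision, recall and F1 for the positive class, computed by hand.
tp = np.sum((y_pred == 1) & (y_true == 1))
fp = np.sum((y_pred == 1) & (y_true == 0))
fn = np.sum((y_pred == 0) & (y_true == 1))
precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0.0

print(f"accuracy = {accuracy:.2f}, F1 = {f1:.2f}")
```

Accuracy comes out at 0.95 even though the model never finds a single positive, so its F1 for the positive class is 0.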


It is not a good metric, but it is easy to understand and it is easy to see how it improves. It is much easier to interpret 10%, 20%, 30% than loss values such as 2.7, 2.6, 2.5. Also, you know about the problems with accuracy as a metric, so you know when it should be ignored!

Actually, most of the models use categorical_crossentropy as the loss function and show accuracy as a secondary metric. Once I have a reasonable model I replace the loss function with a clipped logloss, which is what we are actually trying to optimise for Kaggle [though I have no idea how much difference this makes in practice].

(Flo) #3

Hi simoneva, could you please elaborate on clipped log loss vs. categorical_crossentropy? What’s the difference?


Logloss is (I think?) the same as categorical crossentropy. As discussed around lesson 1/2, it is necessary to clip the probabilities because logloss heavily penalises confident predictions (probabilities close to 0 or 1) that turn out to be wrong. I created a bespoke loss function that clips the probabilities before calculating the loss, as it seemed sensible to me to use the same metric for training as for submission. In practice I am not sure whether it makes any difference.

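import numpy as np
from keras.losses import categorical_crossentropy  # keras.objectives in Keras 1.x

def do_clip(arr, mx):
    # Clip each probability into [mx, 1 - mx], then renormalise rows to sum to 1.
    clipped = arr.clip(mx, 1 - mx)
    return clipped / clipped.sum(axis=1)[:, np.newaxis]

def make_logloss(clip):
    # Build a Keras loss that clips the predictions before taking the cross-entropy.
    def logloss(ytrue, ypred):
        return categorical_crossentropy(ytrue, do_clip(ypred, clip))
    return logloss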
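
As a rough illustration of what the clipping does, here is a NumPy-only sketch that does not use Keras at all. The helper names `clip_and_renormalise` and `mean_logloss`, the two-row example data, and the clip value 0.05 are all assumptions made up for this example:

```python
import numpy as np

def clip_and_renormalise(probs, eps):
    # Same idea as do_clip above: clip into [eps, 1 - eps], then rescale rows to sum to 1.
    clipped = np.clip(probs, eps, 1 - eps)
    return clipped / clipped.sum(axis=1, keepdims=True)

def mean_logloss(y_true, probs):
    # Multi-class log loss: mean over rows of -sum(y * log(p)).
    return float(-np.mean(np.sum(y_true * np.log(probs), axis=1)))

# Hypothetical two-class example: the second row is confidently wrong.
y_true = np.array([[1.0, 0.0], [1.0, 0.0]])
probs = np.array([[0.999, 0.001], [0.001, 0.999]])

raw = mean_logloss(y_true, probs)
clipped = mean_logloss(y_true, clip_and_renormalise(probs, 0.05))
print(f"unclipped: {raw:.3f}, clipped at 0.05: {clipped:.3f}")
```

The mean loss drops from roughly 3.45 to roughly 1.52 because the confidently wrong row is no longer punished as hard. The trade-off is that the confidently correct row gets a slightly worse loss after clipping.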