# Validation Loss VS Accuracy

I thought validation loss had a direct relationship with accuracy, meaning that a lower validation loss always leads to a higher accuracy. But while training a model, I ran into this:

How is this possible? Why can we have a lower validation loss but also a lower accuracy?


It relates to the loss function. If we use mean squared error (MSE) as the loss, we optimize by reducing the average squared distance between our predictions and the true values, not by minimising misclassification (I'm assuming this is classification). You may get some intuition about this by drawing decision boundaries between classes in something like the iris data set (http://scikit-learn.org/stable/auto_examples/tree/plot_iris.html). You may also see how you could move the decision boundaries and keep the same accuracy but get a wider margin between classes (i.e., how the loss can improve while accuracy stays the same); SVM and boosting examples often show the maximum margin. Playing with logistic regression in 2D, with and without outliers, may also help.
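A minimal sketch of "the loss improves while accuracy stays the same", using made-up predictions for four examples (the data and the helper names `mse` and `accuracy` are illustrative, not from any library). Both sets of predictions fall on the same side of a 0.5 threshold, so accuracy is identical, but the second set sits farther from the boundary, so its MSE is lower.

```python
def mse(targets, probs):
    """Mean squared error between 0/1 targets and predicted probabilities."""
    return sum((t - p) ** 2 for t, p in zip(targets, probs)) / len(targets)


def accuracy(targets, probs, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the target."""
    preds = [1 if p > threshold else 0 for p in probs]
    return sum(int(pred == t) for pred, t in zip(preds, targets)) / len(targets)


# Made-up predictions for the same four examples.
targets = [1, 1, 0, 0]
probs_a = [0.60, 0.55, 0.45, 0.40]  # all correct, but close to the 0.5 boundary
probs_b = [0.90, 0.80, 0.20, 0.10]  # all correct, with a wider margin

for name, probs in [("A", probs_a), ("B", probs_b)]:
    print(name, "MSE:", round(mse(targets, probs), 4), "accuracy:", accuracy(targets, probs))
```

Both print an accuracy of 1.0; only the loss tells the two apart.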

Jeremy mentioned an F2 metric (I think that's what was used; F2 weights recall more heavily than precision, so it is mainly about minimising false negatives). There's a family of metrics along those lines (precision, recall, F-scores) that measure classification quality directly rather than through the loss.
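For reference, F2 is the F-beta score with β = 2, a weighted harmonic mean of precision and recall in which β > 1 gives recall more weight. A small sketch of the standard formula (the function name `f_beta` is only illustrative):

```python
def f_beta(precision, recall, beta):
    """F-beta score: weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily; F2 uses beta = 2.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# The same two numbers, swapped: F2 prefers the high-recall classifier.
print(f_beta(precision=1.0, recall=0.5, beta=2))  # high precision, low recall
print(f_beta(precision=0.5, recall=1.0, beta=2))  # low precision, high recall
```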

I was looking for a better visualization and didn't find one, but this post speaks to the topic a bit (https://machinelearningmastery.com/classification-accuracy-is-not-enough-more-performance-measures-you-can-use/). Looking into the loss functions also gives some intuition about why we typically want relatively balanced classes in training.

[edit 2] - re: Jeremy's reply


Nice explanation. One nit:

No, that's a metric, not a loss function. We use cross-entropy loss.
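Since the loss here is cross-entropy, a small plain-Python sketch (the helper `binary_cross_entropy` is hypothetical, written for illustration) shows why it can move independently of accuracy: accuracy only checks which side of 0.5 a prediction lands on, while cross-entropy also scores how confident the prediction is.

```python
import math


def binary_cross_entropy(target, prob, eps=1e-7):
    """Cross-entropy of one predicted probability against a 0/1 target."""
    prob = min(max(prob, eps), 1 - eps)  # clip so log() never sees 0
    return -(target * math.log(prob) + (1 - target) * math.log(1 - prob))


# Four predictions for the same positive example (target = 1).
# Accuracy only sees which side of 0.5 each lands on; the loss sees confidence.
for p in [0.99, 0.51, 0.49, 0.01]:
    print(f"p={p:.2f}  correct={p > 0.5}  loss={binary_cross_entropy(1, p):.3f}")
```

A model can therefore lower its average loss by making confident, correct predictions even more confident while a few borderline predictions drift to the wrong side of 0.5, which is one way to end up with a lower validation loss and a lower accuracy at the same time.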


So should we be happy if accuracy goes up even though the validation loss also rises?

``````
epoch  trn_loss   val_loss   accuracy
0      0.712461   1.174837   0.692503
1      0.606297   1.178657   0.694088
2      0.528136   1.249662   0.700428
``````

In practice, accuracy is quite a limited measure of the quality of predictions. To predict whether an example belongs to some class, our model outputs a number between 0 and 1 (whatever we put through a sigmoid or softmax).

To calculate accuracy, we pick some arbitrary threshold (0.5 by default): every prediction above it means the example belongs to the class, and every prediction below it means it doesn't. This 0.5 threshold gets dicey really fast if our classes aren't perfectly balanced (50% positive and 50% negative examples) or if we have multiple classes.
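A minimal sketch of how much the threshold matters, with made-up probabilities for an imbalanced set of 2 positives and 8 negatives (`accuracy_at` is a hypothetical helper). The model's outputs never change; only the threshold does.

```python
def accuracy_at(targets, probs, threshold):
    """Accuracy when every probability above `threshold` counts as positive."""
    preds = [1 if p > threshold else 0 for p in probs]
    return sum(int(pred == t) for pred, t in zip(preds, targets)) / len(targets)


# Made-up, imbalanced data: 2 positives followed by 8 negatives.
targets = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
probs = [0.45, 0.35, 0.30, 0.20, 0.20, 0.15, 0.10, 0.10, 0.05, 0.05]

for threshold in [0.5, 0.4, 0.3]:
    print(threshold, accuracy_at(targets, probs, threshold))
```

At 0.5 the model finds no positives at all and still scores 0.8; at 0.3 every example is classified correctly.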

What happens when 90% of our examples are negative and 10% are positive? Is 91% accuracy good or bad?
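One way to answer that is to compare against the trivial classifier that always predicts the majority class. A sketch, assuming single-label data stored as a plain list of labels (`majority_baseline_accuracy` is a hypothetical helper):

```python
from collections import Counter


def majority_baseline_accuracy(targets):
    """Accuracy of a 'classifier' that always predicts the most common class."""
    most_common_count = Counter(targets).most_common(1)[0][1]
    return most_common_count / len(targets)


# 90% negatives and 10% positives, as in the question above.
targets = [0] * 90 + [1] * 10
print(majority_baseline_accuracy(targets))
```

On this data the baseline already scores 0.9, so 91% accuracy is only one percentage point better than never predicting the positive class at all.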

IMHO, the best interpretation of "accuracy goes up and loss goes up" is: "our model is getting better at being accurate at whatever threshold we set".
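A small made-up example of that situation, with four examples and two "epochs" (the helpers `mean_bce` and `accuracy` are hypothetical). Between the epochs, a borderline positive crosses the 0.5 threshold, so accuracy goes up. At the same time, a negative example the model was already getting wrong becomes a very confident mistake, which cross-entropy punishes heavily, so the mean loss goes up too.

```python
import math


def mean_bce(targets, probs):
    """Mean binary cross-entropy over a dataset (probs strictly inside (0, 1))."""
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for t, p in zip(targets, probs)) / len(targets)


def accuracy(targets, probs, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the target."""
    preds = [1 if p > threshold else 0 for p in probs]
    return sum(int(pred == t) for pred, t in zip(preds, targets)) / len(targets)


targets = [1, 1, 0, 0]
epoch_1 = [0.45, 0.90, 0.10, 0.60]  # first positive missed; last negative mildly wrong
epoch_2 = [0.55, 0.90, 0.10, 0.99]  # first positive fixed; last negative confidently wrong

for name, probs in [("epoch 1", epoch_1), ("epoch 2", epoch_2)]:
    print(name, "loss:", round(mean_bce(targets, probs), 3), "accuracy:", accuracy(targets, probs))
```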

There are other metrics that take the performance of our classifier at different thresholds into account, for example the area under the ROC curve or mean average precision.
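A minimal pure-Python sketch of ROC AUC, using its interpretation as the probability that a randomly chosen positive is scored above a randomly chosen negative (ties count as half). In practice you would use a library implementation such as scikit-learn's `roc_auc_score` (and `average_precision_score` for average precision); the helper `roc_auc` below is only illustrative and is O(positives × negatives).

```python
def roc_auc(targets, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, t in zip(scores, targets) if t == 1]
    neg = [s for s, t in zip(scores, targets) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up scores: 2 positives, 8 negatives, none of them above 0.5.
targets = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.45, 0.35, 0.30, 0.20, 0.20, 0.15, 0.10, 0.10, 0.05, 0.05]
print(roc_auc(targets, scores))  # no threshold involved
```

Every score here is below 0.5, so accuracy at the default threshold is only 0.8 (both positives are missed), yet the ranking is perfect and the AUC is 1.0.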

Validation loss is nice because, in some sense, it measures how much our predictions differ from what they should be before we put them through the threshold.


I'm getting a really weird sequence of numbers for validation loss vs. accuracy. While the training loss keeps getting smaller, the validation loss fluctuates a lot. At the same time, the quantity reported as "accuracy" (which I still don't understand) fluctuates within a small range. I'm training on a set of news articles; below are both the output of the fastai built-in training loop and the results of the final classification on the train, validation and test sets, computed with scikit-learn's `classification_report` function. As you can see, the accuracy at the very last epoch of training is reported as 0.567500; however, I get good precision and recall on the train, validation and test sets.

``````
Building the text classifier
cycles with big learning rate
epoch  train_loss  valid_loss  accuracy
1      0.601414    1.372827    0.560000
epoch  train_loss  valid_loss  accuracy
1      0.572330    2.343134    0.550000
epoch  train_loss  valid_loss  accuracy
1      0.625675    15.646263   0.587500
cycles with mid learning rate
epoch  train_loss  valid_loss  accuracy
1      0.565834    60.196205   0.600000
epoch  train_loss  valid_loss  accuracy
1      0.548722    58.771461   0.620000
epoch  train_loss  valid_loss  accuracy
1      0.551044    79.224930   0.583750
freeze to -2
epoch  train_loss  valid_loss  accuracy
1      0.542036    113.556290  0.587500
epoch  train_loss  valid_loss  accuracy
1      0.496392    78.955574   0.623750
epoch  train_loss  valid_loss  accuracy
1      0.469386    111.237091  0.611250
unfreeze and sliced learning rate
epoch  train_loss  valid_loss  accuracy
1      0.427161    86.719170   0.610000
epoch  train_loss  valid_loss  accuracy
1      0.435571    162.631317  0.628750
epoch  train_loss  valid_loss  accuracy
1      0.388422    103.925232  0.631250
freeze and final cycles for fine-tuning
epoch  train_loss  valid_loss  accuracy
1      0.390428    154.438812  0.610000
2      0.400485    4.032323    0.663750
3      0.406729    55.436405   0.607500
4      0.391934    12.530686   0.576250
5      0.337642    72.138474   0.606250
6      0.363133    50.401752   0.611250
7      0.400870    1.035045    0.577500
8      0.386779    17.284279   0.607500
9      0.404209    1.226067    0.648750
10     0.373572    8.231665    0.567500
saving the classifier...

Results on training data:
              precision    recall  f1-score   support

           0       0.88      0.92      0.90      1200
           1       0.92      0.88      0.90      1200

   micro avg       0.90      0.90      0.90      2400
   macro avg       0.90      0.90      0.90      2400
weighted avg       0.90      0.90      0.90      2400

Results on validation data:
              precision    recall  f1-score   support

           0       0.80      0.82      0.81       400
           1       0.82      0.80      0.81       400

   micro avg       0.81      0.81      0.81       800
   macro avg       0.81      0.81      0.81       800
weighted avg       0.81      0.81      0.81       800

Results on test data:
              precision    recall  f1-score   support

           0       0.78      0.84      0.81       400
           1       0.82      0.77      0.79       400

   micro avg       0.80      0.80      0.80       800
   macro avg       0.80      0.80      0.80       800
weighted avg       0.80      0.80      0.80       800
``````
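For comparison with the reports above, a minimal sketch of how single-label accuracy is typically computed: take the argmax of each row of model outputs and count how often it matches the target. This assumes the reported `accuracy` is fastai's standard accuracy metric, which as far as I know works this way. For single-label problems, this is the same number scikit-learn reports as the micro-averaged precision/recall/F1, so the 0.81 "micro avg" on the validation set is the accuracy scikit-learn measured for the saved model. The outputs and helper names below are made up for illustration.

```python
def accuracy(targets, preds):
    """Fraction of predictions that exactly match the target class."""
    return sum(int(p == t) for p, t in zip(preds, targets)) / len(targets)


def micro_precision(targets, preds, classes):
    """Micro-averaged precision: pool true positives over all classes."""
    tp = sum(int(p == t == c) for c in classes for p, t in zip(preds, targets))
    predicted = sum(int(p == c) for c in classes for p in preds)
    return tp / predicted


def argmax(row):
    """Index of the largest score in a row."""
    return max(range(len(row)), key=lambda i: row[i])


# Hypothetical two-class outputs: one row of class scores per example.
outputs = [[0.2, 0.8], [0.7, 0.3], [0.4, 0.6], [0.9, 0.1], [0.3, 0.7]]
targets = [1, 0, 0, 0, 1]
preds = [argmax(row) for row in outputs]

print(accuracy(targets, preds), micro_precision(targets, preds, classes=[0, 1]))
```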