What does it mean when validation loss jumps up and then comes back down while the metrics keep improving?

Hi there,

I have the following situation:

epoch  train_loss  valid_loss  accuracy_thresh  fbeta     fbeta
1      0.132345    0.181402    0.939405         0.547401  0.547401
2      0.132632    0.140455    0.942975         0.573270  0.573270
3      0.125239    0.120305    0.945900         0.630561  0.630561
4      0.116589    0.262259    0.945647         0.651136  0.651136
5      0.113193    0.217006    0.948325         0.658606  0.658606

As you can see, valid_loss jumps up at epoch 4, while accuracy and the F score continue to improve.
Is the learning rate too high? What does this mean conceptually?

Maybe this case is even more interesting:

epoch  train_loss  valid_loss  accuracy_thresh  fbeta
1      0.134181    0.304799    0.936049         0.527393
2      0.133323    0.130108    0.941813         0.605107
3      0.125137    0.325781    0.941043         0.594625
4      0.118079    0.123688    0.947078         0.655934
5      0.110650    0.111640    0.947906         0.663978

So, valid_loss jumps up and then falls to a lower value, while train_loss and accuracy improve the whole time.

I think this happens because of learning rate annealing (the learning rate changes each epoch).

During some of the epochs the learning rate is higher, which makes the weight updates take larger steps; a large step can move the weights further away from a better solution instead of closer to it.
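To make the idea concrete, here is a minimal sketch of cosine annealing, one common schedule (fastai's one-cycle policy is more involved and also varies the lr within an epoch, so the function name and defaults below are purely illustrative):

```python
import math

# Sketch of cosine annealing: the lr starts at lr_max and decays smoothly
# to lr_min over n_steps. Early steps take large jumps (valid loss can
# spike), later steps take small ones (loss settles). Illustrative only.
def cosine_lr(step, n_steps, lr_max=1e-2, lr_min=1e-4):
    progress = step / (n_steps - 1)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))

schedule = [cosine_lr(s, 5) for s in range(5)]
print(schedule)  # strictly decreasing: lr_max at step 0, lr_min at the last step
```

A spike in valid_loss then corresponds to an epoch where the current lr is still large enough to overshoot.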


Thank you for your comment. I remember that in part 1 (2017), which I went through this September, the annealing was started manually after a configurable number of epochs. Now it changes after each epoch? Is it possible to change this behavior, or is it always the best choice?

When you create a learner you may pass in a learning rate or just use the default. What exactly you pass in affects whether the lr changes (I think). I don't remember the specifics, but it's something like: you can pass in a single float or a slice, and what you pass in affects the annealing.
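As a rough sketch of the slice idea: a `slice(lowest, highest)` gets spread across the layer groups so earlier layers train with smaller lrs. Geometric spacing is one plausible scheme; the function below is hypothetical and may not match fastai's exact rule:

```python
# Hypothetical sketch of "discriminative" learning rates: spreading a
# slice(lowest, highest) across n_groups layer groups. Geometric spacing
# is one plausible interpolation; fastai's actual rule may differ.
def spread_lr(lr_slice, n_groups):
    lo, hi = lr_slice.start, lr_slice.stop
    if lo is None:                # a single endpoint, e.g. slice(1e-3)
        return [hi] * n_groups    # same lr for every group
    ratio = (hi / lo) ** (1 / (n_groups - 1))
    return [lo * ratio**i for i in range(n_groups)]

print(spread_lr(slice(1e-5, 1e-3), 3))  # earliest layers get the smallest lr
```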

You should note that validation loss can also increase because of overfitting: the model fits the training set too closely and stops generalizing to the validation set. In that case, though, valid_loss keeps rising rather than recovering as it does in your tables.
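A standard guard against that failure mode is early stopping on validation loss. A minimal sketch (illustrative, not fastai's EarlyStoppingCallback):

```python
# Minimal early-stopping sketch: stop once valid loss hasn't improved
# for `patience` consecutive epochs. Illustrative, not fastai's API.
def early_stop_epoch(valid_losses, patience=2):
    best, since_best = float("inf"), 0
    for epoch, loss in enumerate(valid_losses):
        if loss < best:
            best, since_best = loss, 0
        else:
            since_best += 1
            if since_best >= patience:
                return epoch      # index of the epoch where we stop
    return None                   # never triggered

# The second table's losses: the spikes recover, so nothing triggers.
print(early_stop_epoch([0.304799, 0.130108, 0.325781, 0.123688, 0.111640]))
```

On your numbers this returns `None`, which matches the intuition that a transient spike from annealing is not overfitting.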

Oh, interesting.
As far as I understood, when I pass a slice for the learning rate, it sets different learning rates for different layers in the network.
Do you know where I can look at the specifics (other than the code itself)?

Yeah, I think you are right: it affects the lr per layer group. There is also a way to plot the learning rate during training. You can check the docs, though I find them difficult to read: docs.fast.ai
