Determining number of epochs with fit_one_cycle

(Mark De Simone) #1

I’m having difficulty understanding what to take away from these results. I did a learning-rate search using different weight decays to determine which lr and which wd to use. I then train for n epochs.
With more epochs, accuracy and validation error continue to improve… BUT training loss gets much smaller. This must be overfitting, right? Yet validation is still improving… What should I take away from this, and how should I decide on the number of epochs?
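
For reference, here’s roughly what I’m running (a minimal sketch, assuming fastai v1, an existing ImageDataBunch called `data`, and illustrative values for lr and wd):

```python
from fastai.vision import *

# Build the learner on an existing ImageDataBunch called `data`
learn = cnn_learner(data, models.resnet34, metrics=accuracy)

# Search for a learning rate at a given weight decay
learn.lr_find(wd=1e-2)
learn.recorder.plot()

# Train one cycle for n epochs with the chosen lr and wd
learn.fit_one_cycle(4, max_lr=3e-3, wd=1e-2)
```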

Thanks for any help!

With 4 epochs I get:
[image: training results after 4 epochs]
If I recreate the learner and run for 8 epochs:
[image: training results after 8 epochs]
With 15 epochs:
[image: training results after 15 epochs]
We can see that training one cycle with a larger number of epochs produces better validation loss and accuracy, but training loss keeps diminishing toward zero. Validation loss is not trending up, yet training loss is clearly trending down.

0 Likes

(RobG) #2

In my opinion, it doesn’t matter if the training loss is 0.00000 as long as your validation loss / metric is improving. I’d take 0.02/0.07 over 0.07/0.08 any day of the week.

1 Like

(Stefano Giomo) #3

It’s good! Take a look at @balnazzar’s post on overfitting and Jeremy’s top quote from part one of the course :wink:

So the only thing that tells you that you’re overfitting is that the error rate improves for a while and then starts getting worse again. You will see a lot of people, even people that claim to understand machine learning, tell you that if your training loss is lower than your validation loss, then you are overfitting. As you will learn today in more detail, and during the rest of the course, that is absolutely not true.
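
In practice, that means watching the validation metric across epochs. One way to act on it (a minimal sketch, assuming fastai v1’s callback API and illustrative hyperparameters) is to keep the weights from the epoch with the best validation error rate, so a long run can’t hurt the final model:

```python
from fastai.vision import *
from fastai.callbacks import SaveModelCallback

# `data` is an ImageDataBunch; error_rate is tracked as the metric
learn = cnn_learner(data, models.resnet34, metrics=error_rate)

# Save weights each time validation error_rate improves; with
# every='improvement' (the default) fastai reloads the best
# checkpoint at the end of training.
learn.fit_one_cycle(15, max_lr=3e-3,
                    callbacks=[SaveModelCallback(learn, monitor='error_rate',
                                                 mode='min', name='best')])
```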

1 Like

(Mark De Simone) #4

Thanks all, particularly for the reassurance that I should not worry about training loss trending even to zero. So I will just focus on validation loss and train for as many epochs as I have time for.

This raises another question. I train or fine-tune a model with two different sets of hyperparameters. Both lead to good and similar accuracy, but one has a much higher validation loss. I’m assuming that the one with the lower validation loss is better, but in what way? Should it generalize better? Both models are just as good at classifying the validation set, just with different losses.

Also, how should I compare a model built with resnet34 against one built with resnet50? If the accuracies are about the same, can we compare the models based on validation loss?

Thanks

0 Likes

(RobG) #5

Ultimately, you measure success by whatever metric is appropriate to your need and seek the best result against that metric. It doesn’t matter what the validation loss is in that case. Having said that, it’s sometimes difficult to know whether the lower validation loss or the better metric is the better indicator of real-world performance. You can always try each of them to find out!
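
To “try each of them”, you can evaluate both trained models on the same validation set and compare directly. A sketch, assuming fastai v1 and two already-trained learners with the hypothetical names `learn34` and `learn50`:

```python
# Learner.validate() returns [valid_loss, *metrics] in the order the
# metrics were declared on the learner (here: accuracy).
loss34, acc34 = learn34.validate()
loss50, acc50 = learn50.validate()

print(f'resnet34: valid_loss={float(loss34):.4f}  accuracy={float(acc34):.4f}')
print(f'resnet50: valid_loss={float(loss50):.4f}  accuracy={float(acc50):.4f}')
```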

2 Likes