Number of epochs to train on

ben.bowles · November 9, 2016, 1:25am

I am not sure about this, but I suspect that you have to choose the number of epochs carefully when training models. With too few epochs, the model may not have trained sufficiently. With more epochs, validation accuracy typically seems to start going down at some point. Are my intuitions correct, and if so, is it because we eventually start to overfit / memorize the training set?

Does that mean we basically need to watch carefully and choose the number of epochs based on when the validation accuracy starts to go down? Is this just like any other hyperparameter that has some arbitrary “best” value, i.e., dropout proportion, etc.?

vshets · November 9, 2016, 1:46am

I guess you keep training as long as you are not overfitting, have not reached a set number of epochs, and your validation accuracy is still improving. In other words, you stop at whichever comes first: overfitting, the epoch limit, or no improvement after ‘n’ epochs. In terms of epochs, I have seen a TensorFlow example where they say 500 iterations will give you much lower accuracy (depending of course on the problem you are solving) and they recommend pushing to 4000! https://codelabs.developers.google.com/codelabs/tensorflow-for-poets/#4
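The "whichever comes first" rule can be sketched in plain Python. This is only an illustration: the function name `stopping_epoch`, the `max_epochs` and `patience` parameters, and the accuracy values are made up for the example and do not come from the TensorFlow codelab.

```python
def stopping_epoch(val_accuracies, max_epochs, patience):
    """Return the 1-based epoch at which training would stop.

    Training stops at whichever comes first:
      * `max_epochs` epochs have been run, or
      * validation accuracy has not improved for `patience` epochs in a row.
    """
    best = float("-inf")
    epochs_since_best = 0
    for epoch, acc in enumerate(val_accuracies[:max_epochs], start=1):
        if acc > best:
            # New best validation accuracy: reset the patience counter.
            best = acc
            epochs_since_best = 0
        else:
            epochs_since_best += 1
        if epochs_since_best >= patience:
            return epoch
    # No early stop was triggered, so training ran out of epochs.
    return min(len(val_accuracies), max_epochs)


# Hypothetical per-epoch validation accuracies.
history = [0.70, 0.75, 0.78, 0.77, 0.78, 0.76, 0.80]
stop_patient = stopping_epoch(history, max_epochs=10, patience=5)
stop_impatient = stopping_epoch(history, max_epochs=10, patience=3)
```

Note that with a small `patience` the run stops before the late improvement in the last epoch, which is one reason the choice of ‘n’ matters.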

jbrown81 · November 9, 2016, 2:09am

Early stopping is a method for avoiding overfitting. In Keras, it’s a callback you can pass when fitting a model to stop training once your validation accuracy flattens out or starts to go back down:
https://keras.io/callbacks/#earlystopping

You might look at the difference between training accuracy and validation accuracy for an ensemble of models to get an idea of the likelihood of overfitting when training for many epochs, and use that to decide if a) early stopping is appropriate or b) you first need to regularize your model more (e.g., more dropout).

jeremy · November 9, 2016, 8:11pm

I generally try to avoid early stopping, but instead use regularization (especially dropout) to allow training for as many epochs as we want.

ben.bowles · November 9, 2016, 8:47pm

Thanks very much. All great suggestions.

js4393 · November 15, 2016, 10:15am

As I train for subsequent epochs, it looks like I’m overfitting more and more – the spread between training and validation accuracy is increasing. However, since I’m working with Redux, I’m really interested in validation loss since that’ll be the measure I evaluate my performance on the test set with.

If the spread between my training and validation accuracy continues to increase (e.g. >1% or >.5%), but my validation loss continues to decrease – am I okay? Or should I stop early/increase my regularization?

jeremy · November 15, 2016, 5:00pm

You are OK! Keep running those epochs! Your goal is to minimize that validation set loss.

When you get the best validation result you can, try re-running it with more dropout. That’s the only way you’ll know whether you were over-fitting too much.

bilalkhan · September 5, 2017, 9:53am

If the training accuracy is not increasing (on average) over a number of epochs (and the training loss is not decreasing), does it still make sense to let the model train for longer? (In this case, I would assume we are underfitting?)

Also, whilst looking at the training progress, is there a way to stop/“interrupt” the training when we see that the current setup is not improving, whilst saving the “current” weights? Or should we just be periodically saving the weights?

bilalkhan · September 5, 2017, 3:22pm

I just stumbled upon the callbacks section of the Keras documentation:
https://keras.io/callbacks/

There is an interesting option to save best only (save_best_only). This means only the best weights will be preserved. Seems to suit my purposes of letting training run for a few hours and forgetting about it!

keras.callbacks.ModelCheckpoint(filepath, monitor='val_loss', verbose=0, save_best_only=False, save_weights_only=False, mode='auto', period=1)
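Note that in the signature above `save_best_only` defaults to `False`, so it has to be set to `True` to get this behaviour. The idea behind "save best only" with `monitor='val_loss'` can be sketched in plain Python as follows. This is not Keras's implementation, just a rough illustration: the `BestOnlyCheckpoint` class, the placeholder weights, and the loss values are all hypothetical.

```python
import copy


class BestOnlyCheckpoint:
    """Keep a copy of the weights only when the monitored loss improves."""

    def __init__(self):
        self.best_loss = float("inf")
        self.best_weights = None
        self.best_epoch = None

    def update(self, epoch, val_loss, weights):
        """Store `weights` if `val_loss` is a new minimum; return True if stored."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            # Deep copy so later training steps cannot mutate the saved weights.
            self.best_weights = copy.deepcopy(weights)
            self.best_epoch = epoch
            return True
        return False


checkpoint = BestOnlyCheckpoint()
val_losses = [0.60, 0.45, 0.50, 0.40, 0.42]  # made-up validation losses
for epoch, val_loss in enumerate(val_losses, start=1):
    weights = {"w": epoch}  # placeholder standing in for real model weights
    checkpoint.update(epoch, val_loss, weights)
```

After the loop, `checkpoint.best_weights` holds the weights from the epoch with the lowest validation loss, even though training continued past it.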

tahmidzbr · December 2, 2017, 5:25am

Hello Vshets,

When you said number of iterations = 500, can you please explain? Because the number of iterations to complete 1 epoch is basically (total no. of images)/(batch_size), so for example: 1000/64 ≈ 16 iterations.

So basically 500 iterations would mean roughly 500/16 ≈ 31 epochs.

Does my analysis make any sense?
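The arithmetic above can be checked with a few lines of Python. This sketch assumes that one iteration means one mini-batch update and that the final partial batch still counts as an iteration (hence the rounding up). The function names are made up for illustration, and this does not settle what the codelab itself means by "iterations".

```python
import math


def iterations_per_epoch(num_samples, batch_size):
    """Mini-batch updates needed to see every sample once (last batch may be partial)."""
    return math.ceil(num_samples / batch_size)


def epochs_for_iterations(total_iterations, num_samples, batch_size):
    """Number of epochs (possibly fractional) covered by `total_iterations` updates."""
    return total_iterations / iterations_per_epoch(num_samples, batch_size)


per_epoch = iterations_per_epoch(1000, 64)     # 1000 / 64 = 15.625, rounded up
epochs = epochs_for_iterations(500, 1000, 64)  # 500 / per_epoch
print(per_epoch, epochs)
```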