Epochs 0 to 9 are fine: the loss goes down and the accuracy goes up. Then on epoch 10 both training and validation loss suddenly double and never recover.
This isn't the first time I've seen this behavior when training recurrent neural networks. Why does it happen, and how can I deal with it? Is it possible to tune some hyperparameters to make training more stable?
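One mitigation I've read about for exploding gradients in RNNs is clipping by global norm — is that the right tool here? A minimal numpy sketch of what I mean (the function name and the `max_norm` threshold are just illustrative):

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their global L2 norm is at most max_norm."""
    # Global L2 norm across all gradient arrays together.
    total = np.sqrt(sum(float((g ** 2).sum()) for g in grads))
    if total > max_norm:
        scale = max_norm / total
        grads = [g * scale for g in grads]
    return grads, total
```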
Thank you, Jeremy — the learning rate was indeed the reason.
I did a few experiments with different learning rates while training the classifier (the “Sentiment” section of the lesson 4 notebook). For each LR I ran training a few times and recorded the mean accuracy of the best epoch across runs, as well as whether accuracy dropped dramatically at some point during training. Most of the experiments ran for 20 epochs. Here are the results:
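For reference, here's a small sketch of the summary I computed for each LR setting (the helper name and the 10-point drop threshold for flagging instability are my own choices):

```python
def summarize_runs(runs, drop_threshold=0.10):
    """runs: per-epoch validation-accuracy lists, one per repeat of a single LR setting."""
    # Mean of the best-epoch accuracy across repeats.
    best = [max(r) for r in runs]
    mean_best = sum(best) / len(best)
    # Flag the setting as unstable if any run's accuracy ever falls
    # by more than drop_threshold between consecutive epochs.
    unstable = any(
        r[i] - r[i + 1] > drop_threshold
        for r in runs
        for i in range(len(r) - 1)
    )
    return mean_best, unstable
```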
In most of the experiments I changed only the third element of the LR array, which controls the learning rate of the last layers.
Looks like 1e-2 is too large. The best values for this model seem to be around 2e-3 to 3e-3, which boosts validation accuracy a little.
What else can I do to better understand the impact of LRs on LSTM training? Maybe something like visualizing how the values of the gradients change over time?
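One way to do that (a sketch in plain PyTorch, not tied to the fastai API): record the global gradient norm after each backward pass and plot the series; a spike just before the losses blow up would confirm the exploding-gradient story. The model and loop below are placeholders, not the course notebook's code:

```python
import torch
import torch.nn as nn

def global_grad_norm(model):
    # Global L2 norm over all parameter gradients (call after loss.backward()).
    total = 0.0
    for p in model.parameters():
        if p.grad is not None:
            total += p.grad.detach().pow(2).sum().item()
    return total ** 0.5

# Hypothetical usage inside a training loop:
model = nn.LSTM(input_size=8, hidden_size=16)
norms = []
for step in range(3):  # stand-in for real batches
    out, _ = model(torch.randn(5, 2, 8))  # (seq_len, batch, input_size)
    loss = out.pow(2).mean()
    model.zero_grad()
    loss.backward()
    norms.append(global_grad_norm(model))
# Plotting `norms` then shows how gradient magnitudes evolve over training.
```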