I’m currently building sequence models for forecasting and have tried RNNs, LSTMs, and GRUs.
One unusual thing I noticed is that the loss curves are highly unstable: the loss sometimes jumps back up to the level it had in the first few epochs. Interestingly, the severity of this decreases from RNNs to LSTMs to GRUs.
Would anyone have an idea why this occurs?
For reference, here are the loss curves for each of these models over 500 epochs.