Ways to ensure convergence

How do you know that training has converged and that no further training would be beneficial? Sometimes the loss stops improving for a while, but in other cases a bumpy loss landscape produces many fluctuations, and the accuracy fluctuates as well. What is the optimal stopping point? Could you please share your experience or any papers on this?

Take a look at this paper from Leslie Smith: https://arxiv.org/abs/1803.09820
He explains how to spot whether your model is underfitting or overfitting, and how to set some common hyperparameters efficiently.

Basically, as long as your validation loss keeps decreasing, you can continue training, because it means your network is still learning and getting better (or more confident) on your validation set.
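In practice this is usually automated with early stopping: track the best validation loss and stop once it has not improved for a few epochs. Here is a minimal sketch in plain Python (the class name and `patience`/`min_delta` parameters are illustrative, not from the paper; libraries like Keras and PyTorch Lightning ship their own versions):

```python
class EarlyStopping:
    """Stop training once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience    # epochs to wait after the last improvement
        self.min_delta = min_delta  # minimum decrease that counts as improvement
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Example training loop: loss plateaus after epoch 2, so we stop 3 epochs later.
stopper = EarlyStopping(patience=3)
val_losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.64]
for epoch, loss in enumerate(val_losses):
    if stopper.step(loss):
        print(f"stopping at epoch {epoch}, best val loss {stopper.best}")
        break
```

A small `min_delta` helps ignore noise-level "improvements" when the loss curve is bumpy.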

If you see too many fluctuations, try reducing your learning rate or increasing regularization (weight decay, dropout, …).