Which loss do you watch for not decreasing? With overfitting you will see a U-shaped curve on your validation loss, and the training-set loss stops decreasing only once you are already overfitting.
But if you stop there at the bottom of the validation loss valley, this is also not great - you are completely skipping all the goodies that SGDR gives us in terms of finding a nice spot in the weight space.
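To make that concrete, here is a minimal sketch of the SGDR-style schedule (cosine annealing with warm restarts). The function name, defaults, and fixed cycle length are illustrative assumptions, not fastai's actual API; fastai also typically lengthens each successive cycle via a cycle multiplier, which this sketch omits.

```python
import math

def sgdr_lr(step, cycle_len, lr_max=0.1, lr_min=0.001):
    """Cosine-annealed learning rate with warm restarts (SGDR sketch).

    `step` counts batches since training began; the schedule restarts
    (jumps back to lr_max) every `cycle_len` steps. Names and defaults
    are illustrative, not fastai's real interface.
    """
    t = step % cycle_len  # position within the current cycle
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / cycle_len))

# At the start of each cycle the LR jumps back to lr_max (the "restart"),
# kicking the weights out of sharp minima; it then anneals down, letting
# them settle into a broader, flatter region of the loss surface.
```

The restarts are the point: if the weights end up in a sharp minimum, the LR jump bounces them out; if they are in a wide flat basin, they stay nearby, which is exactly the "nice spot in the weight space" that plain early stopping skips.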
But the main concern is: if you do early stopping, how do you know that is the best the model could get? It is a bit like crossing your fingers that a model with a mismatched architecture, stopped early, will somehow land on the best answer it could give to a different but related question it was never designed to answer.
What @jeremy has for us is 1000x better imho. I hear this methodology is known in the deep learning community, but I literally heard about it first from @jeremy and never came across it being shared anywhere outside of fastai.