04_mnist_basics : wildly slower learning speed in SGD

alanfranz · January 2, 2021, 6:25pm

Hello,
I’m going through the fastai book. I’ve setup a VM on Azure by following the official instructions ( https://course.fast.ai/start_azure_dsvm ) and install script. I’ve found a strange behaviour in the 04_mnist_basics notebook. From the saved notebook, in the end-to-end SGD example:

But, if I just restart and re-run the kernel:

The model seems to learn much more slowly, which is somewhat expected, because the gradients I see are much smaller than the ones in the original notebook.

Of course, the initial parameters are random, so this COULD happen, but I was convinced that some part of the fastai setup sets a seed, because the results are very repeatable on my system.
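To make the seed question concrete, here is a minimal sketch of how a fixed seed makes "random" initial parameters repeatable. It uses only Python's standard `random` module, and `init_params` is a hypothetical helper made up for illustration, not anything from fastai or the notebook. In the notebook, the analogous step would presumably be a call to `torch.manual_seed(...)` before the parameters are created.

```python
import random

def init_params(n, seed=None):
    """Hypothetical helper: draw n initial weights from a standard normal."""
    # A private generator; seed=None means it is seeded from the OS/time.
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

# Same seed twice: identical starting weights.
seeded_a = init_params(5, seed=42)
seeded_b = init_params(5, seed=42)

# No seed: each call starts from different weights.
unseeded_a = init_params(5)
unseeded_b = init_params(5)

print("seeded runs match:  ", seeded_a == seeded_b)
print("unseeded runs match:", unseeded_a == unseeded_b)
```

If results repeat exactly across kernel restarts, something is most likely fixing the seed; if they only look similar, the initialisation may just be landing in a comparable range each time.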

Any clue?

CyberPsyLen · February 3, 2021, 11:03am

I see similar results on Colab; not sure why.

jimmiemunyi · February 5, 2021, 6:53am

I experienced the same issue recently, but not when I first ran the notebook some months ago. I don’t know what changed in those months, but my intuition is that the weights are still being updated in the correct direction (if you monitor the loss, it gets smaller), just by a smaller magnitude, so it takes many more steps to reach the final values we want.

Therefore, increasing the learning rate to something like 1e-3 worked for me.
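A toy sketch of that intuition, in plain Python: each step moves the weight by `lr * grad`, so a small step still lowers the loss, just slowly. The quadratic loss, the `curvature` value and the `sgd_final_loss` helper are assumptions chosen for illustration; none of them come from the notebook.

```python
def sgd_final_loss(lr, steps=100, w0=1.0, curvature=100.0):
    """Run plain gradient descent on loss(w) = 0.5 * curvature * w**2.

    Hypothetical toy problem; returns the loss after `steps` updates.
    """
    w = w0
    for _ in range(steps):
        grad = curvature * w   # d(loss)/dw
        w -= lr * grad         # step size is lr * grad
    return 0.5 * curvature * w ** 2

initial_loss = 0.5 * 100.0 * 1.0 ** 2
slow_loss = sgd_final_loss(lr=1e-5)   # small steps: loss falls, but slowly
fast_loss = sgd_final_loss(lr=1e-3)   # 100x larger steps, same direction

print(f"initial: {initial_loss:.3g}")
print(f"lr=1e-5: {slow_loss:.3g}")
print(f"lr=1e-3: {fast_loss:.3g}")
```

Both runs move in the right direction, but only the larger learning rate gets close to the minimum in 100 steps. A learning rate that is too large would overshoot instead, so it is worth raising it gradually.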