It seems like Nadam is the best, although we use Adam in most examples. I'm curious what you all think about Nadam?
Also, I'm a bit confused: why does learning rate annealing help with adaptive learning rate optimizers? Isn't the algorithm supposed to handle that itself if it's adaptive?
Regarding your question on LR, I had the same thought about the necessity of annealing with an adaptive learning rate. But I do know from personal experience that even with Nadam, using the ReduceLROnPlateau callback significantly reduced the error rate. Adaptive methods rescale the step per parameter, but the global learning rate still caps the overall step size, so shrinking it on a plateau can still help.
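To make the callback's behavior concrete, here is a minimal, framework-free sketch of the plateau-reduction logic that ReduceLROnPlateau-style callbacks implement (the class name and defaults here are my own, not the Keras implementation): if the monitored validation loss fails to improve for `patience` epochs, multiply the learning rate by `factor`.

```python
class PlateauLRScheduler:
    """Hypothetical sketch of ReduceLROnPlateau-style logic, not the Keras code."""

    def __init__(self, lr, factor=0.5, patience=3, min_lr=1e-6):
        self.lr = lr            # current learning rate
        self.factor = factor    # multiplier applied when a plateau is detected
        self.patience = patience  # epochs without improvement before reducing
        self.min_lr = min_lr    # floor for the learning rate
        self.best = float("inf")
        self.wait = 0

    def step(self, val_loss):
        """Call once per epoch with the validation loss; returns the (possibly reduced) lr."""
        if val_loss < self.best:
            # Improvement: remember it and reset the patience counter.
            self.best = val_loss
            self.wait = 0
        else:
            # No improvement this epoch.
            self.wait += 1
            if self.wait >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.wait = 0
        return self.lr


# Usage: validation loss improves once, then plateaus for 3 epochs.
sched = PlateauLRScheduler(lr=0.1, factor=0.5, patience=3)
for loss in [1.0, 0.9, 0.9, 0.9, 0.9]:
    lr = sched.step(loss)
# After 3 epochs with no improvement, lr is halved: 0.1 -> 0.05
```

In Keras you would get the same effect by passing `keras.callbacks.ReduceLROnPlateau(...)` in the `callbacks` list of `model.fit()`, alongside a Nadam optimizer.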