Surprisingly, right after my article on RAdam, I found a new paper on the LookAhead optimizer, co-authored by Geoffrey Hinton.
RAdam stabilizes training at the start, while LookAhead stabilizes training and convergence through the rest of training…so it was immediately clear that putting the two together might build a dream-team optimizer.
I was not disappointed: the first run with Ranger (an integration of both) jumped to 93% on the 20-epoch ImageNette test.
I’ve written yet another optimizer article going into more detail:
I’ve also put the Ranger source out for anyone to test quickly. I merged RAdam and LookAhead into one codebase to make it easier to integrate into FastAI and for general use, but you can also plug AdamW or SGD into LookAhead directly.
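As a rough illustration of that second option, wrapping a standard PyTorch optimizer with LookAhead looks something like the sketch below. The `Lookahead` class name and its `k`/`alpha` parameters are assumptions based on the common reference implementation, so check the actual module in the repo before using it:

```python
import torch
from torch import optim
from lookahead import Lookahead  # assumed module/class name - verify against the repo

model = torch.nn.Linear(10, 2)  # any PyTorch model

# inner (fast) optimizer: AdamW here, but SGD works the same way
base_opt = optim.AdamW(model.parameters(), lr=1e-3)

# wrap it with LookAhead: k = inner steps per slow-weight update,
# alpha = interpolation factor toward the fast weights
opt = Lookahead(base_opt, k=6, alpha=0.5)

# a training step looks like any other PyTorch optimizer
loss = model(torch.randn(4, 10)).sum()
loss.backward()
opt.step()
opt.zero_grad()
```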
Ranger code is here:
I’m also preparing a notebook based on previous feedback, but I kept getting kicked off Salamander over and over today…so I wasn’t able to finish my testing and the notebook.
But here’s the basic process:
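Here’s a minimal sketch of what that process can look like in FastAI v1. It assumes ranger.py sits in your working directory; the dataset call, image size, batch size, and learning rate are illustrative placeholders, not the exact settings from my runs:

```python
from functools import partial
from fastai.vision import *   # FastAI v1 style star import
from ranger import Ranger     # ranger.py copied into the working directory

# grab ImageNette and build a databunch (transforms, size, and bs are illustrative)
path = untar_data(URLs.IMAGENETTE)
data = (ImageDataBunch.from_folder(path, valid='val',
                                   ds_tfms=get_transforms(), size=128, bs=64)
        .normalize(imagenet_stats))

# hand Ranger to the learner as the optimizer function
opt_func = partial(Ranger)

learn = cnn_learner(data, models.resnet50, metrics=accuracy, opt_func=opt_func)
learn.fit_one_cycle(20, max_lr=1e-2)
```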
Then watch the new accuracy highs start appearing: