I spent this morning merging Lookahead + RAdam into a single optimizer (Ranger). The merge may not have been strictly needed, since you can chain the two by passing one into the other, but I felt a single optimizer would be easier to integrate with FastAI.
Lookahead comes from a paper co-authored by Hinton last month, in which it was shown to outperform SGD. It basically maintains a set of slow weights that periodically merge with the regular optimizer's fast weights (they used Adam as the inner optimizer, but I'm using RAdam). The analogy is a buddy system: one buddy explores ahead while the other holds a rope to pull them back if it turns out to be a bad path.
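To make the slow/fast buddy system concrete, here is a minimal toy sketch of the Lookahead update rule (this is an illustration in plain Python, not the actual Ranger or paper code; the inner optimizer here is a hypothetical plain gradient step standing in for RAdam, and `k` and `alpha` are the paper's sync period and interpolation factor):

```python
# Toy sketch of Lookahead: the inner ("fast") optimizer takes k steps,
# then the slow weights are pulled toward the fast weights by a factor
# alpha, and the fast weights are reset to the new slow weights.

def lookahead_sync(slow, fast, alpha=0.5):
    """slow += alpha * (fast - slow); fast is reset to the new slow weights."""
    new_slow = [s + alpha * (f - s) for s, f in zip(slow, fast)]
    return new_slow, list(new_slow)

def inner_step(fast, grads, lr=0.1):
    """Hypothetical inner update standing in for RAdam: plain gradient descent."""
    return [w - lr * g for w, g in zip(fast, grads)]

# Toy loss sum(w^2), so the gradient of each weight w is 2*w.
slow = [1.0, -2.0]
fast = list(slow)
k = 5  # sync every k inner steps
for step in range(1, 11):
    grads = [2.0 * w for w in fast]
    fast = inner_step(fast, grads)
    if step % k == 0:
        slow, fast = lookahead_sync(slow, fast, alpha=0.5)
```

The rope in the analogy is the `alpha` interpolation: the exploring fast weights only move the slow weights part of the way, so one bad excursion can't drag the whole trajectory off course.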
Anyway, first results are impressive: 93% over 20 epochs vs. 92% on the current leaderboard, and 92.4-92.5% with RAdam alone. That's only a first run with a guesstimated LR, but it makes me confident I'm on the right track.
Training looks even more stable than with RAdam alone.
I was starting the next run and got pre-empted yet again on Salamander, so I'll continue with more runs later as I have real-life things to get done. But I wanted to post this now: for anyone who didn't see improvements with RAdam, hold out for Ranger, as it brings another arrow to the quiver.