I’m very excited to introduce you to a paper by Liu, Jiang, He et al. that I think will instantly improve your AI results (seriously) vs Adam.
I tested it on ImageNette and quickly got new high accuracy scores on the 5- and 20-epoch 128px leaderboards, so I know it works.
The simple summary: the authors investigated why all adaptive optimizers require a warmup and found it is a result of excessive variance of the adaptive learning rate in the early stages of training. They then developed a dynamic algorithm that rectifies the adaptive learning rate based on that variance, and show that with this, RAdam readily outperforms vanilla Adam with or without a warmup while providing much greater robustness to varying learning rates.
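For the curious, the heart of the rectification can be sketched in a few lines. This is only a minimal illustration of the variance-rectification term described in the paper, not the authors' implementation (the function name and structure here are my own; see their GitHub for the real optimizer):

```python
import math

def rectification_term(t, beta2=0.999):
    """Sketch of RAdam's variance rectification at step t (1-indexed).

    Returns None when the approximated simple-moving-average length
    rho_t <= 4, meaning the variance of the adaptive learning rate is
    intractable and the update falls back to SGD with momentum.
    """
    rho_inf = 2.0 / (1.0 - beta2) - 1.0  # maximum length of the approximated SMA
    beta2_t = beta2 ** t
    rho_t = rho_inf - 2.0 * t * beta2_t / (1.0 - beta2_t)
    if rho_t <= 4.0:
        return None  # early steps: variance too large, skip the adaptive term
    # rectification factor r_t, which approaches 1 as training progresses
    return math.sqrt(
        ((rho_t - 4.0) * (rho_t - 2.0) * rho_inf)
        / ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t)
    )
```

Note how at the very first steps the function returns None (the un-rectified adaptive step is never taken), and the factor climbs toward 1 as more gradient history accumulates, which is exactly the "automatic warmup" behavior the paper describes.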
This quick image from their paper should get your interest:
I’ve written a complete summary and overview, along with an example of using it with FastAI, here:
Here’s the link to the paper directly and, more importantly, to their GitHub with its unassuming one-word readme (“RAdam”). I would recommend jumping straight to their GitHub and trying it out, as it really appears to be the new state of the art for optimizers.
I have to say after having tested a lot of papers this year, it seems that most over-promise and under-deliver on unseen datasets.
However, RAdam appears to deliver: I jumped to new high accuracy relative to the ImageNette leaderboard with minimal work beyond plugging it in, and I’ve run a lot of tests trying to beat those scores with various papers before.
Enjoy and please post any results if you test it out!