Meet RAdam - imo the new state of the art AI optimizer

Interesting result so far!

Before concluding that Novograd is better than Adam in this case, you might want to rerun the baseline with Jeremy's intended effective learning rate. As I described elsewhere [1], the baseline was actually run at lr = 0.75e-3 rather than the intended lr = 3e-3 (a 4x difference) because of an oversight in the code.
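For what it's worth, here's a quick way to double-check the learning rate an optimizer is actually using before comparing optimizers. This is a minimal sketch in plain PyTorch (the baseline script itself uses fastai, so the exact check there would look different):

```python
# Minimal sketch (plain PyTorch, not the actual fastai baseline script):
# print the lr each param group is really using before training.
import torch

model = torch.nn.Linear(10, 2)  # stand-in model for illustration
opt = torch.optim.Adam(model.parameters(), lr=3e-3)

for group in opt.param_groups:
    # Expect 0.003 here; seeing 0.00075 would mean something
    # upstream silently divided the lr by 4.
    print(group["lr"])
```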

I don't have data for 128 px, but on Imagewoof at 256 px I re-ran the baseline and got 83.9% rather than 81.8%.
(The 85.7% is my own leaderboard entry, but I'm not too fond of it because it runs slower.)

Also, I suggest running on vast.ai when Salamander isn't working out. It's cheaper and faster, with no pre-emption; the trade-offs are that you usually "lose" your machine and files if you stop it, and machines are not always available.

[1] ImageNette/Woof Leaderboards - guidelines for proving new high scores?