Interesting result so far!
Before concluding that Novograd is better than Adam in this case, you might want to rerun the baseline with Jeremy’s intended effective learning rate (as I described elsewhere [1], the baseline was actually run at lr = 0.75e-3 rather than the intended lr = 3e-3 because of an oversight in the code).
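For concreteness, here is a rough fastai v1 sketch of what I mean; this is not the leaderboard script itself (xresnet50, 5 epochs, bs=64 and the data pipeline below are placeholders I'm assuming for illustration). The point is just to pass the intended lr = 3e-3 directly instead of letting it end up at 0.75e-3:

```python
# Rough sketch only, not the actual leaderboard code. Model, epochs and batch
# size are assumptions made for illustration.
from fastai.vision import *

path = untar_data(URLs.IMAGEWOOF)                      # Imagewoof dataset
data = (ImageList.from_folder(path)
        .split_by_folder(valid='val')                  # Imagewoof ships a 'val' folder
        .label_from_folder()
        .transform(get_transforms(), size=256)
        .databunch(bs=64)
        .normalize(imagenet_stats))

learn = Learner(data, models.xresnet50(), metrics=accuracy)

# The original baseline effectively trained at 0.75e-3 (i.e. 3e-3 / 4) because of
# the oversight described in [1]; here the intended 3e-3 is passed explicitly.
learn.fit_one_cycle(5, 3e-3)
```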
I don’t have data for 128 px, but on Imagewoof 256 px I re-ran the baseline and got 83.9% rather than 81.8% (the 85.7% is my leaderboard entry, but I’m not too fond of it because it runs slower).
Also, I suggest running on vast.ai when Salamander isn’t available: it’s cheaper and faster, with no pre-emption. The downsides are that you usually “lose” your machine and files if you stop it, and machines are not always available.
[1] ImageNette/Woof Leaderboards - guidelines for proving new high scores?