Meet RAdam - imo the new state of the art AI optimizer

I’ve run some initial tests with RAdam vs Adam on Imagenette.

  1. I first reran the Adam baseline because of the learning-rate glitch that I detailed elsewhere [1]

%run train_imagenette.py --epochs 5 --bs 64 --lr 12e-3 --mixup 0

(note that the effective learning rate is 3e-3)

Results (10 runs): [86, 85.4, 84.4, 85.2, 84.8, 85, 85.6, 85.4, 85.4, 86.4]
Mean: 85.3%

  2. RAdam, learning rate 3e-2 (as in the OP’s article)

%run train_imagenette.py --epochs 5 --bs 64 --lr 12e-2 --mixup 0 --opt 'radam'

(note that the effective learning rate is 3e-2)

Results (10 runs): [85.2, 85.2, 83.6, 85.4, 85.6, 86, 84.6, 84.6, 82.8, 85.4]
Mean: 84.8%

  3. I tried RAdam with wd = 1e-3 instead of 1e-2 and cout=1000, to stay close to the OP’s parameters (based on the Medium article).
    Note that I normally specify cout=10, but the OP didn’t. I don’t think it makes a difference (though the losses are higher), but you never know.

Results (10 runs): [84.2, 86.2, 83.2, 83.8, 85.2, 85.2, 84.4, 82.6, 84.8, 84.4]
Mean: 84.4%
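
If anyone wants to double-check the summary numbers, here is a quick way to recompute the means from the raw run lists above (plain Python; the per-run values are copied from this post and are already rounded, so the means come out slightly differently in the second decimal):

```python
# Accuracy (%) per run, copied from the results above.
adam_lr3e3 = [86, 85.4, 84.4, 85.2, 84.8, 85, 85.6, 85.4, 85.4, 86.4]
radam_lr3e2 = [85.2, 85.2, 83.6, 85.4, 85.6, 86, 84.6, 84.6, 82.8, 85.4]
radam_wd1e3_cout1000 = [84.2, 86.2, 83.2, 83.8, 85.2, 85.2, 84.4, 82.6, 84.8, 84.4]

def mean(runs):
    return sum(runs) / len(runs)

for name, runs in [("Adam, effective lr 3e-3", adam_lr3e3),
                   ("RAdam, effective lr 3e-2", radam_lr3e2),
                   ("RAdam, wd 1e-3, cout=1000", radam_wd1e3_cout1000)]:
    print(f"{name}: mean {mean(runs):.2f}%")
# -> about 85.36%, 84.84%, 84.40% (i.e. the 85.3 / 84.8 / 84.4 quoted above)
```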

Conclusion: I haven’t been able to get better results with RAdam than with Adam on Imagenette (128px) so far. I can hit 86% on some of my best runs, but on average RAdam does worse than Adam when running for 5 epochs.
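
For context on what RAdam actually changes relative to Adam, here is a minimal single-parameter sketch of the rectified update described in the RAdam paper (Liu et al., 2019). This is my own illustrative NumPy version, not the implementation used in the runs above, and the hyperparameter defaults are just placeholders:

```python
import numpy as np

def radam_step(p, grad, m, v, t, lr=3e-2, beta1=0.9, beta2=0.999, eps=1e-8):
    """One RAdam update for parameter p at step t (t starts at 1).

    Sketch of the rectification logic only; weight decay is omitted,
    and lr/beta values here are illustrative, not the ones from the runs above.
    """
    # Standard Adam moment estimates with bias correction on the first moment.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)

    # Length of the approximated simple moving average (SMA).
    rho_inf = 2 / (1 - beta2) - 1
    rho_t = rho_inf - 2 * t * beta2 ** t / (1 - beta2 ** t)

    if rho_t > 4:
        # Variance of the adaptive step is tractable: use the rectified Adam update.
        v_hat = np.sqrt(v / (1 - beta2 ** t))
        r_t = np.sqrt(((rho_t - 4) * (rho_t - 2) * rho_inf)
                      / ((rho_inf - 4) * (rho_inf - 2) * rho_t))
        p = p - lr * r_t * m_hat / (v_hat + eps)
    else:
        # Early steps: fall back to plain momentum SGD (no adaptive denominator).
        p = p - lr * m_hat
    return p, m, v
```

The practical difference from Adam is the rectifier r_t, which acts like a built-in warmup: it shrinks the adaptive step while the variance estimate is still unreliable in the first iterations, then converges towards the usual Adam update.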

[1] ImageNette/Woof Leaderboards - guidelines for proving new high scores?