What’s your go-to optimizer in 2021?

That’s a great article, thanks for linking!

I haven’t had success with ranger + fit_flat_cos the few times I tried it, but maybe I was using it wrong. Unfortunately, the experiments in the article only show training from scratch.

How do you use it when fine-tuning? Do you run flat+cos twice, once for the frozen phase and once for the unfrozen phase? Or maybe only the flat part for the frozen phase?

And what about learning rates: does it still make sense to use lr/100 for the earlier layers, like fine_tune does?
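To make the question concrete, here is a rough sketch of the two variants I have in mind. It mirrors the structure of fine_tune (freeze, train the head, halve the learning rate, unfreeze, train everything with a slice of learning rates) but swaps fit_one_cycle for fit_flat_cos. The helper name `fine_tune_flat_cos`, the `flat_only_frozen` flag, and the defaults are my own assumptions, not fastai API, and I’m not claiming this is a good recipe.

```python
def fine_tune_flat_cos(learn, epochs, base_lr=2e-3, freeze_epochs=1,
                       lr_mult=100, pct_start=0.75, flat_only_frozen=False):
    """Hypothetical helper: fine_tune-style two-phase training with flat+cos.

    `learn` is assumed to be a fastai Learner, i.e. it provides
    freeze(), unfreeze(), fit() and fit_flat_cos().
    """
    # Phase 1: train only the head while the pretrained body is frozen.
    learn.freeze()
    if flat_only_frozen:
        # Variant B: constant learning rate, no cosine anneal.
        learn.fit(freeze_epochs, base_lr)
    else:
        # Variant A: a complete flat+cos schedule for the frozen phase.
        learn.fit_flat_cos(freeze_epochs, slice(base_lr), pct_start=pct_start)

    # Phase 2: unfreeze and train all layer groups with a second flat+cos run.
    # Halving base_lr and spreading rates from base_lr/lr_mult (earliest
    # layers) to base_lr (head) copies what fine_tune does with one-cycle.
    base_lr /= 2
    learn.unfreeze()
    learn.fit_flat_cos(epochs, slice(base_lr / lr_mult, base_lr),
                       pct_start=pct_start)
```

With a learner created with `opt_func=ranger`, this would be called as `fine_tune_flat_cos(learn, 5)` for variant A or `fine_tune_flat_cos(learn, 5, flat_only_frozen=True)` for variant B.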

Maybe @LessW2020 can offer some insights?