That’s a great article, thanks for linking!
I myself haven’t had success with ranger+fit_flat_cos in the few times I tried. But maybe I was using it wrong. Unfortunately the experiments here only show training from scratch.
How would you do it when fine-tuning? Flat+cos twice, once for the frozen phase and once for the unfrozen one? Or maybe only flat for the frozen portion?
How about learning rates: does it still make sense to use lr/100 for the earlier layers, like fine_tune does?
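
To make the question concrete, here is a minimal plain-Python sketch of the two pieces I am asking about: the flat-then-cosine schedule, and the lr/100 spread across layer groups. The function names `flat_cos_lr` and `layer_group_lrs` are made up for illustration and are not fastai API. The defaults (`pct_start=0.75`, `final_div=1e5`, `lr_mult=100`, geometric spacing between groups) are my understanding of what the library does, so treat them as assumptions.

```python
import math

def flat_cos_lr(pct, base_lr, pct_start=0.75, final_div=1e5):
    """Learning rate at training progress `pct` in [0, 1].

    Flat at base_lr until pct_start, then cosine-annealed down to
    base_lr / final_div. Defaults are assumed, not taken from the thread.
    """
    if pct < pct_start:
        return base_lr
    # Rescale the remaining progress to [0, 1] for the cosine segment.
    t = (pct - pct_start) / (1 - pct_start)
    end_lr = base_lr / final_div
    return end_lr + (base_lr - end_lr) * (1 + math.cos(math.pi * t)) / 2

def layer_group_lrs(base_lr, n_groups, lr_mult=100):
    """Per-group learning rates, earliest layers first.

    Geometrically spaced from base_lr / lr_mult (earliest group)
    up to base_lr (head). The spacing rule is an assumption.
    """
    if n_groups == 1:
        return [base_lr]
    lo = base_lr / lr_mult
    return [lo * (base_lr / lo) ** (i / (n_groups - 1)) for i in range(n_groups)]

# Option being asked about: the same flat+cos shape in both phases,
# a single lr while frozen, then lr/100 ... lr across groups once unfrozen.
frozen_lrs = [flat_cos_lr(p / 10, 1e-3) for p in range(11)]
unfrozen_peak_lrs = layer_group_lrs(1e-3 / 2, n_groups=3)
```

So the question is whether running this schedule once per phase is sensible, or whether the frozen phase should skip the cosine part and stay flat throughout.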
Maybe @LessW2020 can offer some insights?