I would like to know if there is any rule of thumb for properly tuning `fit_flat_cos()`'s hyperparameters, and where I could find any research papers about it.
I don’t know if you saw this. The accompanying video has a good explanation.
Essentially, similar to how we have `fit_one_cycle` for Adam, `fit_flat_cos` is for the `ranger` optimizer (see that link for more information on what that is). When we were first writing that scheduler, we noticed that `ranger` doesn't need a warm-up cycle like Adam does, because one is already built in. Really the only hyperparameter you should need to tune is `pct_start`.
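For context, here is a minimal sketch of what that pairing looks like in fastai. The dataset, architecture, and values are placeholders for illustration; the relevant parts are `opt_func=ranger` and the call to `fit_flat_cos`:

```python
from fastai.vision.all import *

# Placeholder data pipeline: PETS is just an example dataset.
path = untar_data(URLs.PETS)/"images"
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path),
    label_func=lambda f: f.name[0].isupper(),  # cats vs. dogs by filename case
    item_tfms=Resize(224))

# ranger = RAdam + Lookahead, exported by fastai
learn = vision_learner(dls, resnet34, opt_func=ranger)
learn.fit_flat_cos(5, lr=1e-3)  # hold the LR flat, then cosine-anneal it down
```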
When we were experimenting, we found that a `pct_start` of 0.72 seemed to work the best, though the default in fastai is 0.7; the difference is minimal. Along with this, if you set `pct_start` to 0 (i.e. apply the cosine annealing immediately), you get PyTorch's `CosineAnnealingLR` scheduler, which has lately become popular on Kaggle for the Adam optimizer too.
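As an illustration of that `pct_start=0` limit, here is a minimal plain-PyTorch sketch using `torch.optim.lr_scheduler.CosineAnnealingLR`; the model, data, and step count are placeholders:

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

model = torch.nn.Linear(10, 2)                       # placeholder model
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
num_steps = 1000                                     # total training steps
sched = CosineAnnealingLR(opt, T_max=num_steps)      # anneal from 1e-3 toward 0 from step one

for step in range(num_steps):
    opt.zero_grad()
    loss = model(torch.randn(8, 10)).pow(2).mean()   # dummy loss on random data
    loss.backward()
    opt.step()
    sched.step()                                     # decay the LR every step; no flat phase
```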
Hope this helps a little. Every other hyperparameter in it is similar to `fit_one_cycle`'s, and you should tune them at your discretion.
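For reference, a sketch of those shared knobs, using the parameter names from the fastai API (the values are illustrative, not recommendations, and `learn` is the learner from the sketch above):

```python
learn.fit_flat_cos(
    10,               # epochs
    lr=3e-3,          # learning rate held during the flat phase
    pct_start=0.72,   # fraction of training spent flat before annealing begins
    div_final=1e5,    # final LR = lr / div_final
    wd=0.01,          # weight decay
)
```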
There is no official “paper” on `Ranger` itself or on the `fit_flat_cos` method, as they were found empirically through experimentation.