I would like to know if there is any rule of thumb for properly tuning `fit_flat_cos()`'s hyperparameters, and where I could find any research papers about it.
I don’t know if you saw this. The accompanying video has a good explanation.
Essentially, similar to how we have `fit_one_cycle` for Adam, `fit_flat_cos` is for the `ranger` optimizer (see that link for more information on what that is). When we were first writing that scheduler, we noticed that `ranger` doesn't need a warm-up cycle like Adam does, because one is already built in. Really the only hyperparameter you should need to tune is `pct_start`.
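For context, here is a minimal sketch of what that pairing looks like in fastai. The dataset, architecture, and values are placeholders for illustration; the relevant parts are `opt_func=ranger` and the call to `fit_flat_cos`:

```python
from fastai.vision.all import *

# Placeholder data pipeline: PETS is just an example dataset.
path = untar_data(URLs.PETS)/"images"
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path),
    label_func=lambda f: f.name[0].isupper(),  # cats vs. dogs by filename case
    item_tfms=Resize(224))

# ranger = RAdam + Lookahead, exported by fastai
learn = vision_learner(dls, resnet34, opt_func=ranger)
learn.fit_flat_cos(5, lr=1e-3)  # hold the LR flat, then cosine-anneal it down
```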
When we were experimenting, we found that a `pct_start` of 0.72 seemed to work the best, though the default in fastai is 0.7; the difference is minimal. Along with this, if you set `pct_start` to 0 (i.e. apply the cosine annealing immediately), you get PyTorch's `CosineAnnealingLR` scheduler, which has lately become popular on Kaggle for the Adam optimizer too.
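As an illustration of that `pct_start=0` limit, here is a minimal plain-PyTorch sketch using `torch.optim.lr_scheduler.CosineAnnealingLR`; the model, data, and step count are placeholders:

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

model = torch.nn.Linear(10, 2)                       # placeholder model
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
num_steps = 1000                                     # total training steps
sched = CosineAnnealingLR(opt, T_max=num_steps)      # anneal from 1e-3 toward 0 from step one

for step in range(num_steps):
    opt.zero_grad()
    loss = model(torch.randn(8, 10)).pow(2).mean()   # dummy loss on random data
    loss.backward()
    opt.step()
    sched.step()                                     # decay the LR every step; no flat phase
```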
Hope this helps a little. Every other hyperparameter in it is similar to `fit_one_cycle`'s, and you should tune them at your discretion.
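For reference, a sketch of those shared knobs, using the parameter names from the fastai API (the values are illustrative, not recommendations, and `learn` is the learner from the sketch above):

```python
learn.fit_flat_cos(
    10,               # epochs
    lr=3e-3,          # learning rate held during the flat phase
    pct_start=0.72,   # fraction of training spent flat before annealing begins
    div_final=1e5,    # final LR = lr / div_final
    wd=0.01,          # weight decay
)
```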
There is no official “paper” on `Ranger` itself or on the `fit_flat_cos` method, as they were found empirically through experimentation.