Hi,
I would like to know if there is a rule of thumb for properly tuning fit_flat_cos()'s hyperparameters, and where I could find research papers about it.
As @boJa mentioned, I wrote a section in my course covering what the fit_flat_cos schedule looks like along with some background, but I'll go into a bit more detail in this post for you, @Redevil.
Essentially, similar to how we have fit_one_cycle for Adam, fit_flat_cos is for the ranger optimizer (see that link for more information on what that is). When we first wrote that scheduler, we noticed that ranger doesn't need a warm-up cycle the way Adam does, because the warm-up is already integrated into the optimizer (ranger combines RAdam with Lookahead, and RAdam handles the warm-up internally). Really, the only hyperparameter you should need to tune is pct_start.
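To make that concrete, here's a minimal sketch of how you'd call it in fastai. The PETS dataset, resnet34, and the learning rate are just placeholders; swap in your own DataLoaders and model:

```python
from fastai.vision.all import *

# Any DataLoaders will do; PETS is just a placeholder example here
path = untar_data(URLs.PETS)
dls = ImageDataLoaders.from_name_re(
    path, get_image_files(path/"images"),
    pat=r'(.+)_\d+.jpg', item_tfms=Resize(224))

# ranger (RAdam + Lookahead) ships with fastai as an opt_func
learn = vision_learner(dls, resnet34, opt_func=ranger, metrics=accuracy)

# Flat LR for the first 72% of training, cosine annealing for the rest
learn.fit_flat_cos(5, lr=1e-3, pct_start=0.72)
```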
In our experiments we found that a pct_start of 0.72 seemed to work the best, though the default in fastai is 0.75; the difference is minimal. Along with this, if you set pct_start to 0 (aka apply the cosine annealing immediately), you get the equivalent of PyTorch's CosineAnnealingLR scheduler, which has become popular on Kaggle lately for the Adam optimizer too.
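If it helps to see the shape of the schedule, here's a small standalone sketch of the flat-then-cosine curve. This is my own simplification of what fastai builds internally with combined_cos, not the library's actual code; div_final is the factor the final LR is divided by:

```python
import math

def flat_cos_lr(t, lr_max=1e-3, pct_start=0.72, div_final=1e5):
    """Learning rate at training progress t in [0, 1]:
    flat at lr_max until pct_start, then cosine-anneal to lr_max/div_final."""
    if t < pct_start:
        return lr_max                          # flat phase: constant max LR
    p = (t - pct_start) / (1 - pct_start)      # progress through the cosine phase
    lr_end = lr_max / div_final
    return lr_end + (lr_max - lr_end) * (1 + math.cos(math.pi * p)) / 2

# pct_start=0 skips the flat phase entirely: a pure cosine annealing schedule
print(flat_cos_lr(0.5, pct_start=0.0))  # halfway down the cosine curve
```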
Hope this helps a little. Every other hyperparameter in it is similar to fit_one_cycle's, and you should tune them at your discretion.
There is no official paper on Ranger itself or on the fit_flat_cos method, as it was found empirically through experimentation.