Project: cyclical momentum

sgugger · April 6, 2018, 1:22pm

To let us experiment freely without breaking the existing use_clr, I’ve made a pull request that adds a new argument, use_clr_beta. It’s a tuple with either two elements (no cyclical momentum) or four, which are:
(div,pct,max_mom,min_mom)
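For example, use_clr_beta = (10,10,0.95,0.85) with a learning rate of 0.01 gives a schedule where the LR goes from 0.001 up to 0.01 and back down while the momentum goes from 0.95 down to 0.85 and back up, followed by a final phase in which the LR is annealed further.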

div is the factor by which the LR is divided to give the minimum LR, pct is the percentage of the total number of iterations devoted to the last phase, where we anneal the LR, and max_mom and min_mom are the bounds of the momentum cycle. (I deliberately expressed pct as a percentage rather than a fraction like 0.1, to break fewer things in existing uses of use_clr, where this value is often around 10.)
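As a rough illustration of how the four values interact, here is a minimal Python sketch that computes a per-iteration schedule. It assumes a linear warm-up from lr/div to lr and a linear cool-down back to lr/div over the first (100-pct)% of iterations, with momentum moving in the opposite direction between max_mom and min_mom. The function name clr_beta_schedule and the final_div parameter (how far the LR drops in the last phase) are hypothetical, not part of the pull request.

```python
from typing import List, Optional, Tuple


def clr_beta_schedule(lr: float, nb: int, use_clr_beta: tuple,
                      final_div: float = 100.0
                      ) -> Tuple[List[float], Optional[List[float]]]:
    """Return per-iteration learning rates and (optionally) momentums.

    use_clr_beta is (div, pct) or (div, pct, max_mom, min_mom).
    final_div is a hypothetical extra knob (an assumption): in the last pct%
    of iterations the LR is annealed linearly from lr/div to lr/(div*final_div).
    """
    if len(use_clr_beta) == 2:
        div, pct = use_clr_beta
        mom_range = None                      # no cyclical momentum
    elif len(use_clr_beta) == 4:
        div, pct, max_mom, min_mom = use_clr_beta
        mom_range = (max_mom, min_mom)
    else:
        raise ValueError("use_clr_beta must have 2 or 4 elements")

    # pct is a percentage (e.g. 10), not a fraction (0.1).
    cycle_nb = int(nb * (1 - pct / 100) / 2)  # length of each half-cycle
    if cycle_nb < 1:
        raise ValueError("nb is too small for this pct")
    anneal_nb = nb - 2 * cycle_nb             # iterations in the final phase
    min_lr = lr / div

    lrs: List[float] = []
    moms: List[float] = []
    for i in range(nb):
        if i < 2 * cycle_nb:
            # Triangle: frac goes 0 -> 1 over the first half, 1 -> 0 over the second.
            frac = i / cycle_nb if i < cycle_nb else 1 - (i - cycle_nb) / cycle_nb
            lrs.append(min_lr + frac * (lr - min_lr))
            if mom_range:
                # Momentum moves opposite to the LR: max_mom -> min_mom -> max_mom.
                moms.append(mom_range[0] - frac * (mom_range[0] - mom_range[1]))
        else:
            # Final phase: anneal the LR below min_lr, hold momentum at max_mom.
            t = (i - 2 * cycle_nb) / max(1, anneal_nb - 1)
            lrs.append(min_lr * (1 - t * (1 - 1 / final_div)))
            if mom_range:
                moms.append(mom_range[0])
    return lrs, (moms if mom_range else None)


lrs, moms = clr_beta_schedule(0.01, 100, (10, 10, 0.95, 0.85))
print(lrs[0], max(lrs), lrs[-1])    # starts at lr/div, peaks at lr, ends far below
print(moms[0], min(moms), moms[-1])  # 0.95 -> 0.85 -> back to 0.95
```

With 100 iterations and the example values, this sketch puts the LR peak at iteration 45 and spends the last 10 iterations annealing; passing a two-element tuple returns None for the momentums, mirroring the "no cyclical momentum" case.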

Now I’ll work on an LR_finder_beta that gives us the nice plots Leslie uses to tune his other hyperparameters.