Discriminative betas?

Regarding 1-cycle …

Can/should we apply tiered values to Adam’s betas parameter, so that the values applied to the earlier layer groups are lower than those applied to the later ones, in the same way we do with discriminative learning rates?

I don’t even know if this is possible, much less advisable, but re-watching the lesson 5 video put the thought into my head.

It’s possible, and you can probably do it already in fastai by setting learn.opt.beta = np.array([...]) for beta2 and learn.opt.mom = np.array([...]) for beta1. A plain list will probably work too.
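To make the idea concrete, here is a minimal pure-Python sketch of the standard Adam update applied per layer group, where each group carries its own (beta1, beta2) pair just as it would carry its own learning rate. The group structure and values are purely illustrative, not fastai internals:

```python
import math

def adam_step(params, grads, state, lr, beta1, beta2, eps=1e-8):
    """One standard Adam update for a single parameter group
    that has its own (beta1, beta2) pair."""
    state["t"] += 1
    t = state["t"]
    for i, (p, g) in enumerate(zip(params, grads)):
        # exponential moving averages of the gradient and its square
        m = state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        v = state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        # bias correction, then the parameter update
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        params[i] = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return params

# Two layer groups with tiered betas, mirroring discriminative
# learning rates: smaller betas for the earlier group (illustrative values).
groups = [
    {"params": [1.0], "betas": (0.8, 0.90)},   # earlier layers
    {"params": [1.0], "betas": (0.9, 0.99)},   # later layers
]
states = [{"t": 0, "m": [0.0], "v": [0.0]} for _ in groups]

for group, state in zip(groups, states):
    b1, b2 = group["betas"]
    group["params"] = adam_step(group["params"], grads=[0.5],
                                state=state, lr=0.1, beta1=b1, beta2=b2)
```

Note that on the very first step the bias correction cancels the betas out, so the tiering only starts to matter once the moving averages have some history.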

Note that it would have to happen after the creation of the optimizer, so it should go in a callback, in on_train_begin.
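A minimal sketch of that callback idea, assuming the learn.opt.mom / learn.opt.beta attributes mentioned above. TieredBetaCallback and DummyOpt are hypothetical names, and DummyOpt merely stands in for the real optimizer wrapper so the sketch is self-contained:

```python
class DummyOpt:
    """Stand-in for the optimizer wrapper, which exposes
    `mom` (beta1) and `beta` (beta2) as settable attributes."""
    mom = 0.9
    beta = 0.99

class TieredBetaCallback:
    """Hypothetical callback: set per-layer-group betas once training
    begins, i.e. after the optimizer has been created."""
    def __init__(self, opt, moms, betas):
        self.opt, self.moms, self.betas = opt, moms, betas

    def on_train_begin(self):
        # one value per layer group, earlier groups first
        self.opt.mom = list(self.moms)
        self.opt.beta = list(self.betas)

opt = DummyOpt()
cb = TieredBetaCallback(opt,
                        moms=[0.80, 0.85, 0.90],
                        betas=[0.90, 0.95, 0.99])
cb.on_train_begin()
```

The real fastai callback signature differs (it receives keyword arguments), but the timing is the point: the assignment has to run in on_train_begin, not before the optimizer exists.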
