Ranger/LAMB optimizers for NLP tasks

Hi everyone,

I have noticed a bit of discussion regarding new optimizers such as Ranger and LAMB. I have seen people experiment with these on image-recognition challenges, but I haven't seen anyone post results for WikiText-103 or related NLP tasks. Has anyone trained with these new optimizers on NLP datasets? If so, could you share the optimizer type, the learning-rate scheduler (`fit_one_cycle`, `fit`, etc.), and the optimizer hyperparameters you used? The only examples I can find of fastai v2 with ULMFiT still use Adam (perhaps because it still works best!)

A general rule of thumb, especially for Ranger, is to use the `fit_flat_cos` schedule. In terms of LR, I haven't played with that on text. Perhaps try it on the IMDB sample dataset as a small experiment? I usually chose ~4e-3 for vision and ~1e-2 for tabular problems, if that helps :slight_smile: A sketch of what that might look like is below.
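To make that concrete, here's a minimal fastai v2 sketch of trying Ranger with `fit_flat_cos` on the IMDB sample. The learning rate of 4e-3 is just the vision default mentioned above carried over as a starting point, not something validated on text, and the epoch count is arbitrary:

```python
from fastai.text.all import *

# IMDB sample: small CSV with 'label', 'text', and 'is_valid' columns
path = untar_data(URLs.IMDB_SAMPLE)
dls = TextDataLoaders.from_csv(path, 'texts.csv',
                               text_col='text', label_col='label',
                               valid_col='is_valid')

# opt_func=ranger swaps fastai's default Adam for Ranger
# (RAdam wrapped in Lookahead)
learn = text_classifier_learner(dls, AWD_LSTM, metrics=accuracy,
                                opt_func=ranger)

# fit_flat_cos: flat LR for most of training, then cosine annealing;
# this is the schedule generally recommended for Ranger.
# lr=4e-3 is an untested guess borrowed from vision.
learn.fit_flat_cos(4, lr=4e-3)
```

You could also run `learn.lr_find()` first instead of guessing the LR, since Ranger's flat phase can behave differently from one-cycle with Adam.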
