LAMB Optimizer

After much hacking, I got StatefulOptimizer and lamb_func working with fastai v1. It's really ugly right now, but the results look promising.
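For context, the per-parameter update I'm after is roughly the following. This is a minimal sketch, not the notebook's exact lamb_func: the names grad_avg, sqr_avg and step are assumptions standing in for running state that separate stat functions on the new-style optimizer would keep up to date.

```python
import torch

def lamb_step(p, lr, mom, sqr_mom, eps, wd, grad_avg, sqr_avg, step, **kwargs):
    "One LAMB-style update for parameter tensor `p`: a layer-wise trust ratio on top of Adam."
    # grad_avg / sqr_avg / step are assumed to be maintained elsewhere in optimizer state.
    debias1 = 1 - mom ** step        # bias correction for the first moment
    debias2 = 1 - sqr_mom ** step    # bias correction for the second moment
    # Adam-style step, with decoupled weight decay folded in before the trust ratio.
    upd = (grad_avg / debias1) / ((sqr_avg / debias2).sqrt() + eps)
    if wd != 0:
        upd = upd + wd * p.data
    r1 = p.data.pow(2).mean().sqrt()  # layer-wise norm of the weights
    r2 = upd.pow(2).mean().sqrt()     # layer-wise norm of the proposed update
    # Trust ratio ||w|| / ||update||, guarded against zero norms.
    trust = 1. if r1 == 0 or r2 == 0 else float(r1 / r2)
    p.data.add_(upd, alpha=-lr * trust)
    return p
```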

I spent all day yesterday tweaking hyper-parameters on my language model, and on the first run with LAMB I'm seeing a ~0.3% improvement in both accuracy and validation loss over yesterday's best results (3 epochs, frozen).

Unfortunately, it looks like I have a bug somewhere (probably in my param_groups hacks): once I unfroze, the losses and accuracy stopped improving much, whereas with fastai v1's Adam they kept getting better.

The main things I had to do to get it working with fastai were (a rough sketch of a couple of these follows the list):

  • Modified OptimWrapper to not wrap these new-style optimizers (I did this in a hacky way by detecting the presence of hypers on the optimizer)
  • Modified the constructor of Optimizer to account for OptimWrapper passing parameters to its initializer in a different format
  • Changed grad_params to not error out on that different parameter format
  • Hacked Optimizer.__setattr__ to propagate OneCycle's updates to the lr and mom properties through to hypers so they get passed to the stepper functions
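Here is a rough, hypothetical sketch of the first and last of those glue points. SchedCompatMixin and is_new_style are illustrative names I'm using here, not fastai (or notebook) API; the only real assumption is that the new-style optimizer keeps per-group hyper-parameters in a list of dicts called hypers.

```python
class SchedCompatMixin:
    "Mixin for the new-style optimizer: mirror lr/mom writes into hypers."
    def __setattr__(self, name, value):
        # fastai v1's OneCycle scheduler sets opt.lr / opt.mom directly, but the
        # stepper functions only read the per-group dicts in self.hypers, so
        # copy each write into every group as it arrives.
        if name in ('lr', 'mom') and 'hypers' in self.__dict__:
            for h in self.hypers:
                h[name] = value
        object.__setattr__(self, name, value)

def is_new_style(opt):
    # My hacky detection: only the new-style optimizers carry `hypers`, so
    # fastai v1 code can leave them unwrapped when this returns True.
    return hasattr(opt, 'hypers')
```

The detection check is what lets OptimWrapper step aside without touching the old Adam path; everything else stays on the wrapped code path as before.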

The results are promising, so I'm going to keep at it. I'm excited to see how things look once I'm running bigger batch sizes on multiple GPUs, since that's where LAMB is really supposed to shine.

Edit: wd=0 worked even better than wd=0.01; I picked up another ~0.3% accuracy on the 3 frozen epochs (although with a slightly higher validation loss).