After much hacking I got `StatefulOptimizer` and `lamb_func` working with fastai v1. It's really ugly right now, but the results look promising.
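For anyone following along, here's roughly the update rule `lamb_func` implements, written as a standalone sketch rather than my actual stepper. The signature mimics the state a `StatefulOptimizer`-style stepper receives (`grad_avg`, `sqr_avg`, `step`), and it assumes the moving averages were accumulated with the standard `1 - beta` dampening, so treat the argument names as assumptions, not the notebook's exact code:

```python
import torch

def lamb_step_sketch(p, lr, mom, sqr_mom, eps, wd, grad_avg, sqr_avg, step, **kwargs):
    "One LAMB update for a parameter tensor `p`; the running averages come in as state."
    # Adam-style bias correction of the two exponential moving averages
    debias1 = 1 - mom ** step
    debias2 = 1 - sqr_mom ** step
    # Adam-like direction plus decoupled weight decay
    update = (grad_avg / debias1) / ((sqr_avg / debias2).sqrt() + eps) + wd * p.data
    # LAMB's layer-wise trust ratio: scale the step by ||w|| / ||update||
    w_norm, u_norm = p.data.norm(), update.norm()
    trust = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0
    p.data -= lr * trust * update
    return p

# tiny smoke test with made-up state
p = torch.randn(4, 4)
g = torch.randn(4, 4)
lamb_step_sketch(p, lr=1e-3, mom=0.9, sqr_mom=0.99, eps=1e-6, wd=0.01,
                 grad_avg=g.clone(), sqr_avg=g.pow(2), step=10)
```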
I spent all day yesterday tweaking hyper-parameters on my language model, and on the first run with LAMB I'm seeing a ~0.3% improvement in accuracy and validation loss over yesterday's best results (3 epochs, frozen).
Unfortunately it looks like I have a bug somewhere (probably in my `param_groups` hacks), because once I unfroze, the losses and accuracy stopped improving much, whereas with fastai v1's Adam they kept getting better.
The main things I had to do to get it working with fastai were:
- Modify `OptimWrapper` to not wrap these new-style optimizers (I did this in a hacky way, by detecting the presence of `hypers` on the optimizer)
- Modify the constructor of `Optimizer` to account for `OptimWrapper` passing parameters to its initializer in a different format (sketched after this list)
- Change `grad_params` to not error out due to the different format of the parameters
- Hack into `Optimizer.__setattr__` to propagate `OneCycle`'s updates to the `lr` and `mom` properties through to `hypers`, so they get passed to the stepper functions (also sketched below)
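To give an idea of the parameter-format fix (second and third bullets), here's a rough sketch. The exact shape of what `OptimWrapper` hands over is my assumption from poking around, and `normalize_param_groups` is a hypothetical helper, not anything in fastai: the wrapper builds torch.optim-style groups (dicts with a `'params'` key), while the new `Optimizer` and `grad_params` expect a list of tensor lists, so the constructor needs to coerce one into the other:

```python
def normalize_param_groups(params):
    "Coerce incoming parameters into a list of parameter lists, one list per group."
    groups = list(params)
    if not groups:
        return []
    if isinstance(groups[0], dict):
        # torch.optim-style groups: [{'params': [...], 'lr': ...}, ...]
        return [list(g['params']) for g in groups]
    if not isinstance(groups[0], (list, tuple)):
        # a flat list of tensors becomes a single group
        return [groups]
    # already a list of lists
    return [list(g) for g in groups]
```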
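And this is the idea behind the `__setattr__` hack, shown on a hypothetical toy class rather than the actual patched `Optimizer`: when `OneCycle` (via `OptimWrapper`'s `lr`/`mom` setters) assigns new values each batch, they get mirrored into the per-group `hypers` dicts that the stepper functions read from.

```python
class HypersPropagatingOptimizer:
    "Toy illustration: assigning `lr`/`mom` also updates the per-group `hypers` dicts."
    def __init__(self, param_groups, **defaults):
        self.param_groups = param_groups
        self.hypers = [dict(defaults) for _ in param_groups]  # one hyper dict per group

    def __setattr__(self, name, value):
        if name in ('lr', 'mom') and 'hypers' in self.__dict__:
            # the scheduler may set a scalar or one value per layer group
            values = value if isinstance(value, (list, tuple)) else [value] * len(self.hypers)
            for hyper, v in zip(self.hypers, values):
                hyper[name] = v
        super().__setattr__(name, value)

opt = HypersPropagatingOptimizer([['w0'], ['w1']], lr=1e-3, mom=0.9)
opt.lr = [1e-4, 1e-3]          # per-group values from the scheduler
assert [h['lr'] for h in opt.hypers] == [1e-4, 1e-3]
```

That way `OneCycle`'s per-batch updates actually reach `lamb_func` instead of just setting a plain attribute that nothing reads.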
The results are promising, so I'm going to keep at it. I'm excited to see how things look once I'm running bigger batch sizes on multiple GPUs, since that's where LAMB is really supposed to shine.
Edit: `wd=0` worked even better than `wd=0.01`; it picked up another 0.3% accuracy on the 3 frozen epochs (though with slightly higher validation loss).