We’re about to look at creating a better API for the training loop, hyperparam scheduler, optimizer, etc. I’ll make this a wiki so folks can add missing pieces as they find them. Here are some things that need to be supported:
- Everything from the training phase API
- For each phase, change data, batch size, optimizer
- Schedule any hyperparam (lr, momentum, wd, beta2, eps) according to any function, including handling momentum (which corresponds to betas[0] in Adam) and beta2 (which corresponds to alpha in RMSprop); see the scheduler sketch after this list
- AdamW-style weight decay, as well as regular wd
- Discriminative (per-layer) wd and lr, including different params for weights vs bias vs batchnorm (see the param-group sketch after this list)
- Call `reset` at appropriate times for RNNs
- Full set of callbacks
- Try to use callbacks for as many features as possible, or find some other way to easily allow them to be customized (a rough callback loop is sketched after this list)
- All the bits necessary for half precision training (see the mixed-precision sketch after this list):
  - Maintain single precision copy of weights
  - batchnorm in single precision (is this automated by pytorch now?)
  - loss scaling
- Moving average for metrics
- Regularization added to the loss for the backprop (like seq2seq_reg in the RNNs)
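To make the scheduling item concrete, here's a minimal sketch of annealing any optimizer hyperparameter with any function. The names (`annealing_cos`, `HyperSched`) are made up for illustration, not an existing API; it writes values straight into the optimizer's `param_groups`, translating `momentum` and `beta2` into Adam's `betas` tuple:

```python
import math

def annealing_cos(start, end, pct):
    "Cosine interpolation from `start` to `end` as `pct` goes from 0 to 1."
    return end + (start - end) / 2 * (1 + math.cos(math.pi * pct))

class HyperSched:
    "Anneal one hyperparameter of `opt` over `n_iter` steps using `func`."
    def __init__(self, opt, param, start, end, n_iter, func=annealing_cos):
        self.opt, self.param, self.start, self.end = opt, param, start, end
        self.n_iter, self.func, self.it = n_iter, func, 0

    def step(self):
        pct = min(self.it / max(1, self.n_iter - 1), 1.0)
        val = self.func(self.start, self.end, pct)
        for group in self.opt.param_groups:
            if self.param == 'momentum' and 'betas' in group:    # Adam stores momentum as betas[0]
                group['betas'] = (val, group['betas'][1])
            elif self.param == 'beta2' and 'betas' in group:     # ...and beta2 as betas[1]
                group['betas'] = (group['betas'][0], val)
            else:
                group[self.param] = val                          # lr, weight_decay, eps, alpha, ...
        self.it += 1
        return val
```

A training phase would create one of these per hyperparameter (e.g. `HyperSched(opt, 'lr', 1e-3, 1e-5, n_iter)`) and call `step()` once per batch.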
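For discriminative lr/wd and AdamW-style decay, PyTorch param groups already carry per-group settings; the sketch below (helper names are hypothetical) splits bias/batchnorm params out so they get no weight decay, and applies the decay directly to the weights rather than through the gradients:

```python
import torch
import torch.nn as nn

bn_types = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)

def split_params(model):
    "Separate bias/batchnorm params (typically no weight decay) from the regular weights."
    decay, no_decay = [], []
    for module in model.modules():
        for name, p in module.named_parameters(recurse=False):
            (no_decay if isinstance(module, bn_types) or name == 'bias' else decay).append(p)
    return decay, no_decay

model = nn.Sequential(nn.Linear(20, 50), nn.BatchNorm1d(50), nn.ReLU(), nn.Linear(50, 2))
decay, no_decay = split_params(model)

# One param group per setting; the optimizer's own weight_decay stays at 0 because
# decay is applied manually below (AdamW-style) instead of being added to the gradients.
opt = torch.optim.Adam([
    {'params': decay,    'lr': 1e-3},
    {'params': no_decay, 'lr': 1e-3},
], weight_decay=0.0)
wds = [1e-2, 0.0]   # one wd value per param group

def decoupled_weight_decay(opt, wds):
    "AdamW-style decay: shrink the weights directly just before the optimizer step."
    with torch.no_grad():
        for group, wd in zip(opt.param_groups, wds):
            for p in group['params']:
                p.mul_(1 - group['lr'] * wd)
```

Per-layer groups (e.g. body vs head with different lrs) work the same way, and passing `weight_decay` to the optimizer instead gives the regular wd behaviour.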
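The callback items (RNN `reset`, moving-average metrics, regularization added to the loss) could all hang off a loop like the rough sketch below. None of these class names are real API, just an illustration of where each hook would fire:

```python
class Callback:
    def on_epoch_begin(self, **kw): pass
    def on_loss(self, loss, **kw):  return loss     # may return a modified loss
    def on_batch_end(self, **kw):   pass
    def on_epoch_end(self, **kw):   pass

class RNNReset(Callback):
    "Call `model.reset()` at the start of every epoch (assumes the model defines it)."
    def __init__(self, model): self.model = model
    def on_epoch_begin(self, **kw):
        if hasattr(self.model, 'reset'): self.model.reset()

class SmoothedLoss(Callback):
    "Exponentially weighted moving average of the training loss, for reporting."
    def __init__(self, beta=0.98): self.beta, self.avg, self.count = beta, 0.0, 0
    def on_loss(self, loss, **kw):
        self.count += 1
        self.avg = self.beta * self.avg + (1 - self.beta) * loss.item()
        self.smooth = self.avg / (1 - self.beta ** self.count)   # bias-corrected average
        return loss

def fit(epochs, model, loss_fn, opt, data, callbacks):
    for epoch in range(epochs):
        for cb in callbacks: cb.on_epoch_begin(epoch=epoch)
        for xb, yb in data:
            loss = loss_fn(model(xb), yb)
            for cb in callbacks: loss = cb.on_loss(loss)   # e.g. add seq2seq_reg-style penalties here
            loss.backward()
            opt.step(); opt.zero_grad()
            for cb in callbacks: cb.on_batch_end(loss=loss)
        for cb in callbacks: cb.on_epoch_end(epoch=epoch)
```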
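And a minimal sketch of the half precision pieces, assuming a CUDA setup where cuDNN batch norm accepts FP16 activations with FP32 parameters (which is why batchnorm can stay in single precision). Helper names are made up, and a real version would also want dynamic loss scaling:

```python
import torch
import torch.nn as nn

bn_types = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)

def model_to_half(model):
    "Convert the model to FP16 but keep batchnorm layers in FP32."
    model.half()
    for m in model.modules():
        if isinstance(m, bn_types): m.float()
    return model

def get_master_params(model):
    "FP32 copies of the FP16 parameters; the optimizer is built on (and updates) these."
    master = [p.detach().clone().float() for p in model.parameters()]
    for p in master: p.requires_grad_(True)
    return master

def train_step(model, master_params, opt, loss_fn, xb, yb, loss_scale=512.):
    loss = loss_fn(model(xb.half()), yb)
    (loss * loss_scale).backward()                           # scale up so small grads don't underflow in FP16
    for mp, p in zip(master_params, model.parameters()):
        if p.grad is not None:
            mp.grad = p.grad.detach().float() / loss_scale   # unscale into the FP32 master grads
    opt.step()
    opt.zero_grad()
    for mp, p in zip(master_params, model.parameters()):
        p.data.copy_(mp.data)                                # copy updated FP32 weights back to the FP16 model
    model.zero_grad()
    return loss.detach()
```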
Some ideas are embedded in this early project from @mcskinner. In the fastai_v1 repo there’s an “extending the training loop” section in to_refactor.ipynb, with some working code that isn’t complete and needs refactoring.
Questions/comments/etc welcome!