8-bit Training

balnazzar · December 18, 2019, 8:06pm

As many of us have become aware, the new frontier of NLP (that is, the big transformers) makes life harder when it comes to training state-of-the-art models on personal hardware. Even those among us lucky enough to afford a high-end personal workstation (typically 2x 2080 Ti) may run into difficulties. The problem, of course, is memory size rather than training speed.

Tim Dettmers (best known for his useful in-depth comparison of consumer-grade GPUs) seems to have found that training in 8-bit (with fp32 accumulation, which Turing supports) is indeed possible, with minimal impact on accuracy:

In that paper, Dettmers investigates 8-bit training in pursuit of speedups on large GPU clusters, while I (we?) would be much more interested in saving memory relative to 16-bit training on Turing/Volta.
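
To make the memory argument concrete, here is a minimal NumPy sketch. It assumes plain symmetric linear quantization with one scale per tensor, which is my own stand-in for illustration and not the 8-bit scheme from the paper; the helper names `quantize_int8` and `dequantize` are hypothetical. It only compares the storage of a single weight matrix in fp32, fp16 and int8, and measures the round-trip error.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric linear quantization of a float32 array to int8 (illustrative only)."""
    max_abs = float(np.max(np.abs(x)))
    # One scale for the whole tensor; fall back to 1.0 for an all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to float32 values."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
weights = rng.standard_normal((1024, 1024)).astype(np.float32)

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Storage cost of the same matrix in each format.
bytes_fp32 = weights.nbytes
bytes_fp16 = weights.astype(np.float16).nbytes
bytes_int8 = q.nbytes

# Worst-case round-trip error; bounded by half a quantization step.
max_err = float(np.max(np.abs(weights - restored)))

for name, nbytes in [("fp32", bytes_fp32), ("fp16", bytes_fp16), ("int8", bytes_int8)]:
    print(f"{name}: {nbytes / 2**20:.1f} MiB")
print(f"max round-trip error: {max_err:.5f} (quantization step: {scale:.5f})")
```

The int8 copy takes half the space of the fp16 one, which is the saving I am after. Keep in mind that weights are only part of the picture: activations, gradients and optimizer state also live in memory during training, so the overall saving depends on which of those can be kept in 8-bit.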

I’d really like to know whether you or the fastai team, always at the frontier of new research developments, have ever tested 8-bit training.