Mixed precision training

(Charl P. Botha) #21

I’m doing mixed precision training daily on my RTX 2070.

I just ran into some overflow issues using fastai’s fp16 support, but I’ve hacked the code a bit to use NVIDIA’s Apex amp for its dynamic loss scaling, and now this se-resnext50 is training without issues (so far) all in the 8GB of VRAM on the 2070.

fp16 / TensorCore support in CUDA10 and PyTorch means I can train much larger networks on this card than would be otherwise possible. In my more informal testing, there’s also a speed boost going from fp32 to fp16.

(Thomas) #22

Are this new RTX fast?

(Shiv Gowda) #23

For one of my workloads RTX 2070 is around 5-10% slower than 1080 Ti and around 50% slower than V100. In terms of price per value, I think it is the best Deep Learning GPU right now in the market.

Thanks @sgugger for the fp16 enhancement, I am able to train much larger models now(previously I could not train on 512x512 image because I could fit in only a batch size of 8, now I can train that dataset with a batch size of 56). There was a batch norm bug(most likely in pytorch) which prevented me from using it last week, but in the new PyTorch 1.0 release the bug seems to have been cleared and I can clearly see the value of this feature.

(Mohamed Hassan Elabasiri) #24

I am using fast.ai and Pytorch 1.0 and its working fine on my GTX 980ti.