I've been experimenting with fp16 and multi-GPU training, and I've run into a number of issues. I'm using the latest commit from the repo directly, which should be more patched than "pip install fastai==2.0.0".
First off, my understanding is that fp16 should roughly halve the memory requirements of a model, and if you're running on newer GPUs with tensor cores, you should also get a 2-3x speedup.
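For intuition, here is the back-of-the-envelope math behind the "half the memory" claim. This only counts raw parameter storage; activations, gradients, and optimizer state (which often stays in fp32 as master weights) mean real-world savings are usually less than a clean 2x.

```python
# Rough parameter-memory math for fp32 vs fp16. Ignores activations,
# gradients, and optimizer state, so treat it as an upper bound.
def param_memory_mib(n_params: int, bytes_per_param: int) -> float:
    """Memory for the raw parameter tensor alone, in MiB."""
    return n_params * bytes_per_param / 2**20

n = 25_000_000                      # roughly ResNet50-sized
fp32_mib = param_memory_mib(n, 4)   # fp32: 4 bytes per parameter
fp16_mib = param_memory_mib(n, 2)   # fp16: 2 bytes per parameter
print(f"fp32: {fp32_mib:.1f} MiB, fp16: {fp16_mib:.1f} MiB")
```

That ratio is why a ~2x batch-size increase is about the best you can hope for.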
EDIT: It’s more complicated, see my follow up post.
Thus far on single-GPU jobs:
learn.to_fp16() does halve the memory requirements (I can use roughly a 2x batch size), but there is no speedup at all.
learn.to_native_fp16() does not halve the memory requirements, and there is no speedup either. In practice, it seems to do nothing compared to plain fp32 training.
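For context, my understanding is that to_native_fp16 just wraps PyTorch's native AMP machinery (torch.cuda.amp autocast plus GradScaler), rather than fastai's own mixed-precision callback. A minimal standalone sketch of that loop (the enabled= flags make it degrade gracefully to plain fp32 on a CPU-only machine):

```python
import torch
import torch.nn.functional as F
from torch.cuda.amp import autocast, GradScaler

use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"

model = torch.nn.Linear(10, 2).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = GradScaler(enabled=use_cuda)   # loss scaling; no-op when disabled

x = torch.randn(8, 10, device=device)
y = torch.randint(0, 2, (8,), device=device)

with autocast(enabled=use_cuda):        # run eligible ops in fp16
    loss = F.cross_entropy(model(x), y)

scaler.scale(loss).backward()           # scale to avoid fp16 grad underflow
scaler.step(opt)                        # unscales grads, then steps
scaler.update()
```

If to_native_fp16 really reduces to this, then "it seems to do nothing" on a GPU without tensor cores would not be too surprising: autocast only changes the dtype of eligible ops, and memory for master weights stays fp32.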
For multi-GPU jobs using Parallel (not Distributed Parallel):
learn.to_fp16() reduces the memory requirements somewhat. On a twin-GPU setup, GPU #1 uses more memory than GPU #2, so in practice it's more like a 1.5-2x increase in batch size. As before, there is no speedup at all.
learn.to_fp16() is also very hit-or-miss. The same code, same container, and same drivers crashed on a different machine, and I don't know why. The error message suggests data is not being sent to the right device:
```
RuntimeError: Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 1 does not equal 0 (while checking arguments for cudnn_convolution)
```
learn.to_native_fp16() does not halve the memory requirements, and there is no speedup. It doesn't crash in parallel mode, but again, I'm not sure it's actually doing anything.
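On the uneven memory use: as far as I can tell, fastai's Parallel support wraps torch.nn.DataParallel, which gathers every replica's outputs (and computes the loss) on output_device, GPU 0 by default. So the first GPU carrying extra memory looks like expected DataParallel behavior rather than a bug. A plain-PyTorch sketch (falls back to single-device on a machine without multiple GPUs):

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 2))

if torch.cuda.device_count() > 1:
    # A replica runs on each listed GPU; outputs are gathered back onto
    # output_device, which is why that GPU ends up holding extra memory.
    model = nn.DataParallel(model.cuda(), device_ids=[0, 1], output_device=0)

out = model(torch.randn(4, 10))   # inputs are scattered across device_ids
```

That gather step is also a plausible place for a device-mismatch crash if some tensor (e.g. inside a callback) isn't moved along with the batch, though I can't confirm that's what's happening here.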
In short, learn.to_fp16() seems to be the best option, but there is still no speedup from tensor cores (why?), and it sometimes causes issues with multi-GPU training.
I've seen some folks here mention APEX, and since it's backed by NVIDIA it looks like a better option, but it seems there is no APEX fp16 compatibility with fastai v2?
Is your experience consistent with mine? Or have you found a better way to do multi-GPU fp16 training?