I recently saw this article on NVIDIA's automatic mixed-precision (Amp) support in PyTorch, and as a novice when it comes to these lower-level hardware details, I'm curious about a couple of things.
- How similar is Amp to fastai's approach to enabling mixed-precision training? For example, do they differ in how they decide which operations are/aren't executed in FP16? (I've put a rough sketch of my mental model of each API below this list.)
- Should Amp provide additional speed/accuracy improvements, either across the board or in specific scenarios, or is it more focused on ease of use?
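For context, here's a minimal sketch of how I currently understand each one is enabled, assuming a CUDA GPU and both libraries installed. The Apex half uses `amp.initialize`/`amp.scale_loss` as in the Apex docs; the fastai half is the v1-style `Learner.to_fp16()`, with the toy model/dataset chosen purely for illustration. Please correct me if my mental model here is off:

```python
import torch
import torch.nn.functional as F

# --- NVIDIA Apex Amp (as I understand the article) ---
from apex import amp

model = torch.nn.Linear(10, 2).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# opt_level "O1" patches PyTorch functions so whitelisted ops
# (e.g. GEMMs, convolutions) run in FP16 while blacklisted ops
# (e.g. softmax, losses) stay in FP32.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

x = torch.randn(8, 10, device="cuda")
y = torch.randint(0, 2, (8,), device="cuda")
loss = F.cross_entropy(model(x), y)

# Dynamic loss scaling is applied inside this context manager
# before the backward pass.
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()

# --- fastai v1 (Learner.to_fp16) ---
from fastai.vision import *

path = untar_data(URLs.MNIST_SAMPLE)
data = ImageDataBunch.from_folder(path)
# to_fp16 casts the model to half precision but keeps FP32 master
# weights (and, as I understand it, keeps batch norm in FP32),
# with loss scaling handled by the MixedPrecision callback.
learn = cnn_learner(data, models.resnet18, metrics=accuracy).to_fp16()
learn.fit_one_cycle(1)
```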
Very interested in hearing any other points of comparison or opinions on this topic!