Mixed precision training

Were you able to compile Apex with the CUDA extension? If yes, which versions of gcc and nvcc do you have? Thanks.

It could be the absence of loss scaling. Not sure, though.

I’ve done some more testing with loss scales of 128, 1024, and the default 512, as well as dynamic loss scaling, without success. I opened an issue here. I wonder if fastai offers some way of printing out gradients during training for debugging purposes.
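
In the meantime, something like this custom callback might work for that (a minimal sketch against the fastai v1 callback API; GradientLogger is my own hypothetical name, not a fastai class):

from fastai.callback import Callback

class GradientLogger(Callback):
    "Print the gradient norm of every parameter after each backward pass."
    def __init__(self, learn): self.learn = learn
    def on_backward_end(self, **kwargs):
        for name, p in self.learn.model.named_parameters():
            if p.grad is not None:
                print(f'{name}: grad norm = {p.grad.data.norm():.4f}')

# usage: learn.callbacks.append(GradientLogger(learn)) before calling learn.fit(...)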

Like I said on the issue, I didn’t manage to reproduce it. Note that you shouldn’t pass any loss_scale; use dynamic loss scaling instead, as it works better.
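
For example (assuming the fastai v1 to_fp16 API, where these are controlled by the dynamic and loss_scale arguments):

learn = learn.to_fp16()                                 # dynamic loss scaling (recommended)
# learn = learn.to_fp16(loss_scale=512, dynamic=False)  # fixed loss scale, usually works worse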

So, dynamic loss scaling is actually implemented. Thanks.

For me, FP16 works rather well (vanilla fastai), but convergence is a bit delayed with respect to FP32, or with respect to a fastai env in which Apex is also installed. Tested on a Tesla V100 and a 1080 Ti.

Yes, it’s actually the default in the callback but not in the to_fp16 function, I just realized. Just fixed it in master, so it’s now the default everywhere.

Note that you will see a few iterations with no training because of the way dynamic loss scaling works: it starts with a really high scale that is divided by 2 as long as you overflow.
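
Roughly, the logic looks like this (an illustrative sketch of the general technique, not fastai’s actual implementation; the constants are made up):

import torch

scale, good_steps = 2.0**24, 0    # start with a very high scale

def fit_one_step(loss, model, opt, max_noskip=500):
    global scale, good_steps
    (loss * scale).backward()     # scale the loss so small fp16 grads don't underflow
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    if any(torch.isinf(g).any() or torch.isnan(g).any() for g in grads):
        scale /= 2                # overflow: halve the scale and skip this step
        good_steps = 0
        opt.zero_grad()
        return
    for g in grads: g.div_(scale) # unscale the gradients before the update
    opt.step()
    opt.zero_grad()
    good_steps += 1
    if good_steps >= max_noskip:  # long stable run: try a larger scale again
        scale *= 2
        good_steps = 0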

Perfect, now it should work without any delay. I was experiencing that delay since I always used to_fp16.

Thanks!

Oh, btw: for Apex, I’m using gcc 8.2.1 and CUDA 10. They work perfectly.

So, the bug was in lr_find(), and the dev build from the master branch should now be free of it. Here’s the issue that refers to the bug: https://github.com/fastai/fastai/issues/1903

Thanks @sgugger for fixing it.

I have tested fp16 using

learn = language_model_learner(data_lm, TransformerXL).to_fp16()               # fp16, dynamic loss scaling
learn = language_model_learner(data_lm, TransformerXL).to_fp16(dynamic=False)  # fp16, fixed loss scale
learn = language_model_learner(data_lm, TransformerXL)                         # fp32 baseline

using 1,000 training rows and 100 validation rows:

fp16, dynamic=True:  time = 04:23
fp16, dynamic=False: time = 04:20
no fp16:             time = 00:51

Why is it slower in fp16 mode?
Thanks!

The benefit of FP16 is only visible when using modern GPUs like the V100. Also, you have to make sure all your tensor dimensions are multiples of 8.

Thanks @sgugger !!

BTW, do you mean the input batch tensors, each tensor in the model, or both need to be multiples of 8? If it’s the tensors in the model, what about the word embedding dimensions?

For reference, here’s some info about the GPU I tested on: it’s a P100.

Every dimension of the tensors you have (including embedding size, vocab size, hidden size…).
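
For instance, you could round every size up to the next multiple of 8 when building the model (a minimal sketch; round_up and the concrete numbers below are hypothetical, not fastai defaults):

def round_up(n, base=8):
    "Round n up to the nearest multiple of base."
    return ((n + base - 1) // base) * base

vocab_sz = round_up(len(data_lm.vocab.itos))  # pad the vocab to a multiple of 8
emb_sz   = round_up(400)                      # embedding size
bs       = round_up(60)                       # batch size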

In computer vision, should our image sizes also be multiples of 8?

Yes, if you want to see the full speed benefit.

Thanks!
If we use fastai’s mixed precision, do we still need to install the NVIDIA Apex library to support fastai’s mixed precision API?

No, you don’t need Apex to use mixed precision in fastai.

Hi, I am trying mixed precision training with fastai on a Google Cloud instance, and I am facing a big problem: it is not possible to save the trained model.
The returned error stack is:

TypeError                                 Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/torch/serialization.py in save(obj, f, pickle_module, pickle_protocol, _use_new_zipfile_serialization)
    327     with _open_file_like(f, 'wb') as opened_file:
--> 328         _legacy_save(obj, opened_file, pickle_module, pickle_protocol)
    329

/opt/conda/lib/python3.7/site-packages/torch/serialization.py in _legacy_save(obj, f, pickle_module, pickle_protocol)
    395
--> 396     pickle_module.dump(MAGIC_NUMBER, f, protocol=pickle_protocol)
    397     pickle_module.dump(PROTOCOL_VERSION, f, protocol=pickle_protocol)

TypeError: file must have a 'write' attribute

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
in <module>
----> 1 learn.save()

/opt/conda/lib/python3.7/site-packages/fastai/basic_train.py in save(self, file, return_path, with_opt)
    252         if not with_opt: state = get_model(self.model).state_dict()
    253         else: state = {'model': get_model(self.model).state_dict(), 'opt': self.opt.state_dict()}
--> 254         torch.save(state, target)
    255         if return_path: return target
    256

/opt/conda/lib/python3.7/site-packages/torch/serialization.py in save(obj, f, pickle_module, pickle_protocol, _use_new_zipfile_serialization)
    326
    327     with _open_file_like(f, 'wb') as opened_file:
--> 328         _legacy_save(obj, opened_file, pickle_module, pickle_protocol)
    329
    330

/opt/conda/lib/python3.7/site-packages/torch/serialization.py in __exit__(self, *args)
    205 class _open_buffer_writer(_opener):
    206     def __exit__(self, *args):
--> 207         self.file_like.flush()
    208
    209

AttributeError: 'NoneType' object has no attribute 'flush'

Training works correctly, and it is possible to export the model, but it is not possible to just call learn.save().

The output of show_install(0) is:

=== Software === 
python        : 3.7.6
fastai        : 1.0.61
fastprogress  : 0.2.2
torch         : 1.4.0
nvidia driver : 418.87
torch cuda    : 10.1 / is available
torch cudnn   : 7603 / is enabled

=== Hardware === 
nvidia gpus   : 1
torch devices : 1
  - gpu0      : 15079MB | Tesla T4

=== Environment === 
platform      : Linux-4.9.0-12-amd64-x86_64-with-debian-9.12
distro        : #1 SMP Debian 4.9.210-1 (2020-01-20)
conda env     : base
python        : /opt/conda/bin/python
sys.path      : /home/jupyter/cowc
/opt/conda/lib/python37.zip
/opt/conda/lib/python3.7
/opt/conda/lib/python3.7/lib-dynload

/opt/conda/lib/python3.7/site-packages
/opt/conda/lib/python3.7/site-packages/IPython/extensions
/home/jupyter/.ipython

Fri May 15 15:44:10 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   55C    P0    27W /  70W |   7565MiB / 15079MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     11706      C   /opt/conda/bin/python                       7549MiB |
+-----------------------------------------------------------------------------+

Does anybody know what is happening?

Can you try setting it back to full precision before saving it? I.e. .to_fp32()
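
For example (a minimal sketch; 'stage-1' is just a placeholder file name):

learn = learn.to_fp32()  # convert the model back to full precision
learn.save('stage-1')    # then save under an explicit name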

I tried, same error.

Looks like the problem I am facing is related to GCP, not to mixed precision training, since it happens with the lesson one (pets dataset) notebook too, without using mixed precision training. There is something wrong with my configuration, even if it is the default one.