I have been asked this question lately by my friends from the Hardware industry.
FP16 or FP32? Bonus question: is there an article that helps me understand the relationship between floating-point precision and model performance?
I pose this (possibly googlable) question because I think it should start a discussion on how we can understand DL hardware better. Of course, I did observe a substantial drop in model training time when I moved from a K80 to a V100. But how should I interpret that stat beyond just saying the V100 is the shiniest toy in the rack?
You can read this recent paper on the subject: https://arxiv.org/abs/1710.03740
Here is another interesting source of information from NVIDIA on Tensor Cores and mixed-precision training: http://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html
And this forum thread: Has anyone tried the new NVIDIA card?
I hope it helps.
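One idea from that arXiv paper (loss scaling) can be sketched in a few lines of NumPy: tiny gradients underflow to zero when cast to FP16, but scaling them up before the cast and back down afterwards preserves them. The 1024 scale factor here is just an illustrative value, not something from the paper's experiments:

```python
import numpy as np

# A tiny gradient value that underflows in FP16: the smallest FP16
# subnormal is about 6e-8, so anything smaller flushes to zero.
grad = np.float32(1e-8)

# Naive cast to FP16 loses the gradient entirely.
naive = np.float16(grad)  # 0.0

# Loss scaling: multiply the loss (and hence the gradients) by a
# constant before casting to FP16, then divide it back out after
# converting to FP32 for the weight update.
scale = np.float32(1024.0)
scaled = np.float16(grad * scale)       # now representable in FP16
recovered = np.float32(scaled) / scale  # close to the original 1e-8

print(naive, recovered)
```

In a real training loop the frameworks keep an FP32 master copy of the weights and apply the scaled FP16 gradients to it, which is the other half of the recipe the paper describes.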
I do a lot of deep learning work on iOS and that is all 16-bit floating point. (This is inference only, no training at the moment.)
Usually, models are trained with 32-bit floats; when I convert them to run on the iPhone GPU, I convert the weights to 16-bit floats and make all the operations 16-bit too. Then I compare the output of the original model to the iOS model, using an identical test image (or whatever kind of data the model takes).
In pretty much every layer there are differences between the original 32-bit model and the 16-bit model. Usually they are around 1e-3 (roughly the precision limit of 16-bit floats), but it’s not uncommon to see errors up to ±0.5. (Anything higher than that is typically a real error in the conversion of the model.)
But in the end, it does not matter. The models still work quite robustly and give the same predictions as the original. So at least for inference, switching to 16-bit floats appears to work just fine.
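The kind of per-layer comparison described above can be sketched with NumPy: run the same toy "layer" (here just a matrix multiply standing in for a dense/conv layer) in FP32 and FP16 and look at the largest elementwise difference. The layer size and the 0.05 weight scale are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "layer": one matrix multiply, a stand-in for a real dense layer.
x = rng.standard_normal((1, 256)).astype(np.float32)
w = (rng.standard_normal((256, 256)) * 0.05).astype(np.float32)

ref = x @ w  # 32-bit reference output

# Convert inputs and weights to FP16, run the op, cast back to compare.
half = (x.astype(np.float16) @ w.astype(np.float16)).astype(np.float32)

max_err = np.abs(ref - half).max()
print(max_err)  # typically on the order of 1e-3 for values near 1
```

Checking this layer by layer, as described above, is what separates ordinary FP16 rounding noise from an actual bug in the conversion.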