At the last NIPS conference, Nvidia announced this new GPU: https://www.nvidia.com/en-us/titan/titan-v/
This is definitely an expensive card ($2,999 USD), but it is specced at 110 TFLOPS at mixed FP16/FP32 precision with the new tensor cores, which is the fastest spec for a PCIe card. Per TFLOP that is technically cheaper than a 1080 Ti or 1070, which are still the sweet spot for price/performance, but it is basically an apples (FP32) to oranges (mixed FP16/FP32) TFLOPS comparison.
Popular frameworks (PyTorch, TensorFlow) don't look like they support this new mixed FP16/FP32 paradigm very well yet: https://devblogs.nvidia.com/parallelforall/programming-tensor-cores-cuda-9/
But the results advertised by Nvidia look promising compared to FP32: https://devblogs.nvidia.com/parallelforall/mixed-precision-training-deep-neural-networks/
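For anyone wondering what the catch is: the main trick in that Nvidia post is "loss scaling", because gradients that are fine in FP32 can underflow to zero in FP16. Here is a minimal NumPy sketch of that underflow problem, runnable with no GPU or framework; the specific gradient values and the scale factor of 1024 are just illustrative assumptions, not anything from the post:

```python
import numpy as np

# FP16 can't represent values below ~6e-8 (its smallest subnormal),
# so small gradients silently flush to zero when cast to half precision.
ok_grad = np.float32(1e-5)    # small but representable in FP16
tiny_grad = np.float32(1e-8)  # below FP16's subnormal range

print(np.float16(ok_grad))    # survives the cast to FP16
print(np.float16(tiny_grad))  # flushes to 0.0 in FP16

# Loss scaling works around this: multiply the loss (and hence every
# gradient) by a constant before the FP16 backward pass, then divide
# it back out when updating the FP32 master copy of the weights.
scale = 1024.0
scaled = np.float16(tiny_grad * scale)          # now representable
recovered = np.float32(scaled) / np.float32(scale)
print(recovered)                                # close to the original 1e-8
```

So "does it really work" partly comes down to whether your framework lets you keep FP32 master weights and apply this kind of scaling, not just whether the tensor cores are fast.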
I just wanted to know: has anyone tried this card in a mixed FP16/FP32 setting with any deep learning API? Any preliminary results on computer vision problems?
Because if this mixed-precision training really works in real, non-marketing life, 4 of these cards on a single motherboard = 440 TFLOPS, which is almost half the computing power of the fastest computer on earth in 2008, for $12,000. But if it doesn't work, that is definitely too expensive for 5,120 standard CUDA cores at FP32.