Anyone got experience with the new Nvidia Titan V?

Nvidia announced this new GPU at the last NIPS conference: https://www.nvidia.com/en-us/titan/titan-v/

This is definitely an expensive card ($2,999), but it is rated at 110 TFLOPS of mixed FP16/FP32 precision using the new tensor cores, which is the fastest spec for a PCIe card. That technically makes it cheaper per TFLOP than the 1080 Ti or 1070, which are still at the sweet spot for price/performance. But it is basically an apples (FP32) to oranges (mixed FP16/FP32) TFLOPS comparison.
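Just to put rough numbers on that comparison, here is the back-of-the-envelope price-per-TFLOP arithmetic. The list prices and peak TFLOPS figures below are assumptions taken from public spec sheets, not measured throughput, and the Titan V number is the mixed-precision marketing figure rather than FP32:

```python
# Rough price-per-TFLOP comparison (approximate list prices and peak
# spec-sheet numbers only; the Titan V figure is the mixed-precision peak).
cards = {
    # name: (approx. list price in USD, peak TFLOPS, precision)
    "GTX 1070":    (399,  6.5, "FP32"),
    "GTX 1080 Ti": (699, 11.3, "FP32"),
    "Titan V":     (2999, 110, "mixed FP16/FP32 tensor cores"),
}

for name, (price, tflops, precision) in cards.items():
    print(f"{name}: ~${price / tflops:.0f} per TFLOP ({precision})")
```

The Titan V comes out to roughly $27 per (mixed-precision) TFLOP versus roughly $60 per (FP32) TFLOP for the Pascal cards, which is exactly the apples-to-oranges issue.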

Popular frameworks (PyTorch, TensorFlow) don't seem to support this new mixed FP16/FP32 paradigm very well yet: https://devblogs.nvidia.com/parallelforall/programming-tensor-cores-cuda-9/

But the results advertised by Nvidia look promising compared to FP32: https://devblogs.nvidia.com/parallelforall/mixed-precision-training-deep-neural-networks/
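For reference, the recipe in that Nvidia post boils down to: keep an FP32 master copy of the weights, run the forward/backward pass in FP16, scale the loss so small gradients don't underflow in FP16, then unscale the gradients and apply the update in FP32. Here is a minimal hand-rolled PyTorch sketch of that recipe; the toy model, the static loss scale of 128, and the SGD settings are just placeholders, and it glosses over details like keeping batchnorm in FP32:

```python
import torch
import torch.nn as nn

# Toy model; a real network would go here.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).cuda()

# FP32 master copy of the weights (the optimizer updates these).
master_params = [p.detach().clone().float().requires_grad_(True)
                 for p in model.parameters()]

# FP16 copy used for the forward/backward pass.
model.half()

optimizer = torch.optim.SGD(master_params, lr=0.01)
criterion = nn.CrossEntropyLoss()
loss_scale = 128.0  # arbitrary static scale to keep FP16 gradients from underflowing

# One dummy training step on random data.
x = torch.randn(64, 512, device="cuda", dtype=torch.float16)
y = torch.randint(0, 10, (64,), device="cuda")

model.zero_grad()
optimizer.zero_grad()
loss = criterion(model(x), y)
(loss * loss_scale).backward()

# Copy FP16 grads into the FP32 master params, unscale, and step.
for master, half in zip(master_params, model.parameters()):
    master.grad = half.grad.detach().float() / loss_scale
optimizer.step()

# Copy the updated FP32 weights back into the FP16 model.
with torch.no_grad():
    for master, half in zip(master_params, model.parameters()):
        half.copy_(master)
```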

I just wanted to know if anyone has tried this card in a mixed FP16/FP32 setting with any deep learning API. Any preliminary results on computer vision problems?

Because, if this mixed-precision training really works in real, non-marketing life, 4 of these cards on a single motherboard = 440 TFLOPS, which is almost half the computing power of the fastest supercomputer on earth in 2008, for about $12,000. But if it doesn't work, that is definitely too expensive for 5,120 standard CUDA cores at FP32.
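The arithmetic behind that claim, taking Nvidia's peak figure at face value and using Roadrunner's roughly 1 PFLOPS Linpack result from June 2008 as the point of comparison:

```python
# Back-of-the-envelope check of the "4 cards vs. a 2008 supercomputer" claim.
titan_v_tflops = 110      # Nvidia's peak mixed-precision marketing number
titan_v_price = 2999      # USD list price
roadrunner_tflops = 1026  # ~1.026 PFLOPS, Roadrunner's June 2008 Linpack result

n_cards = 4
total_tflops = n_cards * titan_v_tflops      # 440 TFLOPS
total_price = n_cards * titan_v_price        # ~$12,000
fraction = total_tflops / roadrunner_tflops  # ~0.43

print(f"{total_tflops} TFLOPS for ${total_price}, "
      f"about {fraction:.0%} of Roadrunner's 2008 Linpack number")
```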

You'll get fairly similar performance from this card as from a V100 in an AWS P3 instance. So try that first and see what you think.

To leverage the tensor cores you'll need to do some fiddling - see our DAWNBench code for some help there. In practice the 110 TFLOPS figure only applies to small matrix multiplications at half precision, which only gives about a 2x performance improvement overall.
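If you want a quick, unscientific feel for that gap, timing a large FP32 vs FP16 matmul on the card is easy. Whether the FP16 path actually hits the tensor cores depends on your CUDA/cuBLAS version and on the matrix dimensions being multiples of 8, so treat this as a sanity check rather than a benchmark:

```python
import time
import torch

def time_matmul(dtype, n=4096, iters=50):
    """Time n x n GPU matrix multiplications at the given dtype, in TFLOPS."""
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        torch.matmul(a, b)
    torch.cuda.synchronize()
    elapsed = time.time() - start
    # Each n x n matmul is roughly 2 * n^3 floating point operations.
    return 2 * n**3 * iters / elapsed / 1e12

print("FP32:", time_matmul(torch.float32), "TFLOPS")
print("FP16:", time_matmul(torch.float16), "TFLOPS")
```

On a full training run you won't see anything like the raw matmul ratio, since data loading, memory-bound ops, and the FP32 parts of the pipeline don't speed up at all.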