A post from Tim Dettmers (“Which GPU for Deep Learning”) on the Titan V and incoming competition from Intel & AMD.
The more I read about the “abysmal/poor value” of the Titan V for $3000, the more I wonder how come the original Tesla V100 at $10K could find a market in the first place.
Am I missing something?
That sounds like getting a Titan V “today” is a waste of money, because the main libraries (PyTorch & TensorFlow) can’t yet exploit the Tensor Cores’ power. As in “the hardware is too advanced for the software available”.
Yes, but I tend to think frameworks will catch up fast in this case, mainly because AWS has adopted the V100 and released P3 instances, which means lots of people and businesses now have access to this architecture.
I’ve been reading a few things related to this over the past few days too (and thanks for the Tim Dettmers link). You may have hit the nail on the head with:
As in “the hardware is too advanced for the software available”.
It seems people have put in the work on Caffe2 to support the Tensor Cores (and it’s the only framework so far), and the ResNet50 results you quote make a lot of sense from an expected theoretical performance perspective.
The recent large increases in crypto hash rates on AMD Vega GPUs (outhashing the Titan V for Monero and the like) keep making me wonder about the potential of AMD cards. Vega also seems to be in the ‘hardware too advanced for today’s software’ camp, but again there is one framework that has gotten HIP, MIOpen and ROCm (the AMD DL stack) working, and again it’s Caffe2. There is an (admittedly sketchy-looking) benchmark out there showing large speedups using HIP-Caffe2 on Vega vs. Caffe2 on an ‘undisclosed’ competitor GPU. While the benchmark is questionable, it also makes sense (in theory) that running in FP16 on a Vega 64 could give up to a 2x performance boost over a 1080 Ti. With AMD support for TensorFlow and PyTorch in development, it will be really interesting to see where things land in the next couple of quarters.
Section #2 looks at the Volta GV100 architecture and its Tensor Cores.
Section #3 looks at its compute performance with General Matrix Multiply (GEMM), and at the challenge of testing it given the lack of libraries supporting Volta today.
The rest is mainly about gaming/graphics performance.
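To make the Tensor Core and GEMM discussion a bit more concrete, here is a small NumPy sketch. As I understand Volta, each Tensor Core computes D = A × B + C on 4x4 tiles, with A and B in FP16 and the accumulation done in FP32 (FP16 accumulation is also possible). The sketch only emulates that numerical behaviour on the CPU; it says nothing about speed. The helper names `tensor_core_mma`, `tiled_gemm` and `gemm_flops` are hypothetical, not part of cuBLAS or any Nvidia API.

```python
import numpy as np

TILE = 4  # Volta Tensor Cores work on 4x4 tiles: D = A @ B + C


def tensor_core_mma(a, b, c):
    """Emulate one 4x4 Tensor Core step: FP16 inputs, FP32 accumulation."""
    a16 = np.asarray(a).astype(np.float16)  # inputs are rounded to FP16
    b16 = np.asarray(b).astype(np.float16)
    # Products of two FP16 values are exact in FP32; the sums are done in FP32.
    return a16.astype(np.float32) @ b16.astype(np.float32) + np.asarray(c, dtype=np.float32)


def tiled_gemm(a, b):
    """Build a full GEMM out of 4x4 multiply-accumulate tiles (dims must be multiples of 4)."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2 and m % TILE == 0 and n % TILE == 0 and k % TILE == 0
    d = np.zeros((m, n), dtype=np.float32)
    for i in range(0, m, TILE):
        for j in range(0, n, TILE):
            for p in range(0, k, TILE):
                # Accumulate this k-slice into the FP32 output tile.
                d[i:i + TILE, j:j + TILE] = tensor_core_mma(
                    a[i:i + TILE, p:p + TILE],
                    b[p:p + TILE, j:j + TILE],
                    d[i:i + TILE, j:j + TILE],
                )
    return d


def gemm_flops(m, n, k):
    """An (m x k) @ (k x n) GEMM costs 2*m*n*k FLOPs: one multiply and one add per term."""
    return 2 * m * n * k
```

When a review reports GEMM throughput, it is essentially `gemm_flops(m, n, k) / elapsed_seconds / 1e12` TFLOPS for a timed multiply of that size, which is why large, Tensor-Core-friendly matrix shapes matter so much for the headline numbers.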
The official PyTorch ImageNet example shows how to fully harness the Tensor Cores with FP16. I plan to implement that in fastai when I have a chance. We don’t need anything extra in PyTorch.
No one at Nvidia (including its CEO, Jensen Huang, via his LinkedIn/Twitter accounts) answered my basic questions: “Which library among Keras/TensorFlow or PyTorch supports Tensor Cores? Same question for mixed precision? Why upgrade from a 1080 Ti to a Titan V?”
$3,000 is a LOT of money, like a LOT LOT LOT of money, for an individual (whether a SoHo data scientist or a student).
Not getting a basic (or at least respectful) answer from Nvidia or its CEO: deal breaker for my Titan V pre-order.
A few weeks ago, someone posting as “smerity” (probably another DL noob) shared a very quick benchmark on the Nvidia DevTalk forum, running his own code on his Titan V vs. a 1080 Ti.
Back when he was testing it, it seems so.
But he was also clear in saying “Note that the codebase is running PyTorch but is not optimized for the Titan V”.
He hasn’t posted since about optimizing his code for the Titan V.
He was very helpful and clearly answered my technical questions (about performance and supported frameworks). Here is a link directly from Nvidia that has a great deal of information on the subject, including how to set up FP16 training in PyTorch (see the Frameworks section): http://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html
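The Nvidia mixed-precision guide linked above rests on two tricks: keep an FP32 “master” copy of the weights, and scale the loss so small gradients don’t underflow in FP16. Here is a tiny NumPy sketch of why both are needed, assuming plain SGD on a single scalar weight. The function names are hypothetical illustrations, not the API of PyTorch, the ImageNet example, or Nvidia’s tools.

```python
import numpy as np

# FP16 facts: largest finite value is 65504, smallest subnormal is 2**-24 (~6e-8).


def fp16_update(w, grad, lr):
    """SGD step done entirely in FP16 (no FP32 master copy of the weight)."""
    w16, g16, lr16 = np.float16(w), np.float16(grad), np.float16(lr)
    return np.float16(w16 - lr16 * g16)


def master_update(w, grad, lr):
    """The same SGD step applied to an FP32 master weight."""
    return np.float32(w) - np.float32(lr) * np.float32(grad)


def scaled_fp16_grad(grad, loss_scale):
    """Simulate a backward pass: scaling the loss by S scales every gradient by S,
    and the result is stored in FP16."""
    return np.float16(grad * loss_scale)


def unscale_to_fp32(grad16, loss_scale):
    """Convert the FP16 gradient back to FP32 and divide out the loss scale."""
    return np.float32(grad16) / np.float32(loss_scale)


# A weight update smaller than FP16's spacing near 1.0 is silently lost:
print(fp16_update(1.0, 0.1, 1e-3))    # FP16 weight stays at 1.0
print(master_update(1.0, 0.1, 1e-3))  # FP32 master weight moves to ~0.9999

# A tiny gradient underflows to zero in FP16 unless the loss is scaled first:
print(unscale_to_fp32(scaled_fp16_grad(1e-8, 1.0), 1.0))        # 0.0
print(unscale_to_fp32(scaled_fp16_grad(1e-8, 1024.0), 1024.0))  # ~1e-8
```

The scale factor 1024 is an arbitrary choice for the demo. Too large a scale overflows FP16 (anything above 65504 becomes inf), which is why real implementations typically check gradients for inf/NaN and adjust the scale dynamically.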
I am still undecided about buying a Titan V. A lot of software optimization is still needed, especially around memory bandwidth/transfer, to make optimal use of this promising technology. It is very hard to predict whether the new commercial FPGAs/ASICs potentially coming in 2018 (Intel Nervana, Wave Computing, Graphcore, Groq, Bitmain, and many more!) will have good software support at initial release.
There are also driver problems with CUDA 9 on some new hardware. So, if you are planning immediate use of state-of-the-art software, you have to fall back to CUDA 8-compatible versions… I’m not sure, though.
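For checking what a given machine can actually use: Tensor Cores arrived with Volta (compute capability 7.0, e.g. the Titan V; a 1080 Ti is 6.1), and CUDA 9 was the first CUDA release to support Volta. Below is a minimal sketch of that check. The helper `tensor_cores_usable` is hypothetical; `torch.cuda.get_device_capability` and `torch.version.cuda` are real PyTorch attributes, used only if PyTorch happens to be installed. It does not check whether the installed driver is new enough, which is a separate requirement.

```python
def tensor_cores_usable(capability, cuda_version):
    """Hypothetical helper: True if a GPU/toolkit pair can use Volta-style Tensor Cores.

    capability: (major, minor) compute capability, e.g. (7, 0) for a Titan V.
    cuda_version: CUDA version string, e.g. "9.0".
    """
    cuda_major = int(cuda_version.split(".")[0])
    # Tensor Cores need compute capability >= 7.0 and a toolkit that knows Volta (CUDA 9+).
    return tuple(capability) >= (7, 0) and cuda_major >= 9


if __name__ == "__main__":
    try:
        import torch  # optional: only used to query the local machine
    except ImportError:
        torch = None
    # torch.version.cuda is None on CPU-only (and ROCm) builds, so guard for it.
    if torch is not None and torch.cuda.is_available() and torch.version.cuda is not None:
        cap = torch.cuda.get_device_capability(0)
        print(cap, torch.version.cuda, tensor_cores_usable(cap, torch.version.cuda))
```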
Yes, that info has all been implemented in the official PyTorch ImageNet training example. I talked to the Nvidia folks about it on the main PyTorch forums. It shouldn’t take too much to add it to fastai, but it hasn’t moved up my priority queue just yet. However, it looks like I may be getting access to a few of these cards soon, so that might change!