Teslas are needed for servers, and are always very expensive.
But what puzzles me about NVIDIA’s “V” series (Tesla V100, Titan V): what are those “640 Tensor Cores” for with regard to ML/DL?
The only “benchmark” I found is the one posted on NVIDIA’s site (https://www.nvidia.com/en-us/data-center/tesla-v100/), where they claim that the V100, using its 640 Tensor Cores, ran ResNet50 on Caffe2 in 1/3 of the time the P100 needed.
Which sounds like getting a Titan V “today” is a waste of money because the main libraries (PyTorch & TensorFlow) can’t exploit the Tensor Cores’ power. As in “the hardware is too advanced for the software available”.
Yes, but I tend to think frameworks will catch up fast in this case, mainly because AWS adopted and released V100 P3 instances which means lots of people and businesses have access to this architecture.
I’ve been reading a few things relating to this over the past few days too (and thanks for the Tim Dettmers link). You may have nailed it on the head with:
As in “the hardware is too advanced for the software available”.
It seems like people have put in the work on Caffe2 to support the Tensor Cores, and it’s the only framework to do so thus far, so the ResNet50 results you quote make a lot of sense (from an expected theoretical performance perspective).
The recent large increases in crypto hash rates on AMD Vega GPUs (out-hashing the Titan V for Monero and the like) keep making me wonder about the potential of AMD cards. Vega also seems to be in the ‘hardware too advanced for today’s software’ camp, but again there is one framework that has gotten HIP, MIOpen and ROCm (the AMD DL stack) working, and again it’s Caffe2. There is an (admittedly sketchy-looking) benchmark out there showing large speedups using HIP-Caffe2 on Vega vs Caffe2 on an ‘undisclosed’ competitor GPU. While the benchmark is questionable, it also makes sense (in theory) that running in FP16 on a Vega 64 could give up to a 2x performance boost over a 1080 Ti. With AMD support for TensorFlow and PyTorch in development, it will be really interesting to see where things land in the next couple of quarters.
Relevant crosspost: Deep Learning Hardware Limbo
An article on the Titan V by AnandTech.
Section #2 looks at the Volta GV100 architecture and its Tensor Cores.
Section #3 looks at its compute performance with General Matrix Multiply (GEMM), and the challenge of testing it given the lack of libraries supporting Volta today.
The rest is about Gaming/Graphics performance mainly.
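For anyone wondering what the Tensor Cores actually compute: as NVIDIA describes it, each one performs a small matrix multiply-accumulate, D = A×B + C, with FP16 inputs and (optionally) FP32 accumulation. Below is a rough pure-Python sketch of why the accumulation precision matters. The helper names (`to_fp16`, `to_fp32`, `matmul_acc`) are made up for illustration, and this only emulates the rounding behaviour, not the hardware or any library API.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def matmul_acc(a, b, c, round_acc):
    """Compute D = A @ B + C with FP16 inputs.

    `round_acc` is applied after every accumulation step, so passing
    `to_fp32` emulates an FP32 accumulator and `to_fp16` an FP16 one.
    """
    n, k, m = len(a), len(b), len(b[0])
    d = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = round_acc(c[i][j])
            for p in range(k):
                # Inputs are rounded to FP16 before the multiply.
                prod = to_fp16(a[i][p]) * to_fp16(b[p][j])
                acc = round_acc(acc + prod)
            d[i][j] = acc
    return d


# A long dot product of ones: the exact answer is 4096.
k = 4096
a = [[1.0] * k]
b = [[1.0] for _ in range(k)]
c = [[0.0]]

d_fp32_acc = matmul_acc(a, b, c, to_fp32)[0][0]
d_fp16_acc = matmul_acc(a, b, c, to_fp16)[0][0]
print("FP32 accumulator:", d_fp32_acc)
print("FP16 accumulator:", d_fp16_acc)
```

With the FP16 accumulator the running sum gets stuck at 2048, because 2049 is not representable in half precision and rounds back down; with the FP32 accumulator it reaches 4096. That is the gist of why "FP16 multiply, FP32 accumulate" is attractive for deep learning GEMMs.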
Another Titan V review, focusing on Deep Learning performance with Caffe2 and TensorFlow.
The official pytorch imagenet example shows how to fully harness the tensor cores with fp16. I plan to implement that in fastai when I have a chance. We don’t need anything extra in pytorch.
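For context on why FP16 training takes a bit of care, here is a minimal sketch of the two usual tricks in mixed-precision training: scaling the loss so that small gradients do not underflow, and keeping a higher-precision "master" copy of the weights. It uses only the Python standard library to emulate FP16 rounding. It is not the code from the PyTorch imagenet example, and the helper name `to_fp16`, the gradient value and the scale factor of 1024 are assumptions picked for illustration.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# 1) Loss scaling: a tiny gradient underflows to zero in FP16 ...
grad = 1e-8
grad_fp16 = to_fp16(grad)

# ... but survives if the loss (and hence the gradient) is scaled up first,
# then unscaled again in higher precision before the weight update.
loss_scale = 1024.0  # hypothetical static scale factor
scaled_grad_fp16 = to_fp16(grad * loss_scale)
recovered_grad = scaled_grad_fp16 / loss_scale

# 2) Master weights: a small update vanishes when added to an FP16 weight ...
weight = 1.0
update = 1e-4
weight_fp16 = to_fp16(to_fp16(weight) + to_fp16(update))

# ... but is kept when the weight is stored in higher precision.
# (A Python float is a double; it stands in for the FP32 master copy here.)
master_weight = weight + update

print("grad in FP16:", grad_fp16)
print("recovered grad after loss scaling:", recovered_grad)
print("FP16 weight after update:", weight_fp16)
print("master weight after update:", master_weight)
```

The unscaled gradient comes out as exactly zero in FP16, while the scaled one is recovered to within about 1% of its true value. Likewise the FP16 weight stays at 1.0 after the update, whereas the master copy moves.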
More preliminary ML tests on Titan V vs Xp done by Puget Systems, a company building high-performance workstations.
Still no fp16 results there however @EricPB, so not really relevant AFAICT.
FWIW: I cancelled my Titan V pre-order.
No one at Nvidia (including its CEO Jensen Huang, via his LinkedIn/Twitter accounts) answered my basic questions: “Which library among Keras/TensorFlow or PyTorch supports Tensor Cores? Same question for Mixed Precision? Why upgrade a 1080 Ti to a Titan V?”
$3,000 is a LOT of money, like a LOT LOT LOT of money, for an individual (whether SoHo DS or student).
Not getting a basic (or at least respectful) answer from either Nvidia or its CEO was a deal breaker for my Titan V pre-order.
Someone posting as “smerity” (probably another DL noob) put up a very quick benchmark on the Nvidia Devtalk forum a few weeks ago, running his own code on his Titan V vs a 1080 Ti.
So four 1080 Tis are better than one fancy Titan V.
Back when he was testing it, it seems so.
But he was also clear in saying “Note that the codebase is running PyTorch but is not optimized for the Titan V”.
He hasn’t posted since about optimizing his code for the Titan V.
Maybe the 2nd or 3rd generation will be much cheaper and have better compatibility.
He was very helpful and he clearly answered my technical questions (about performance and supported frameworks). Here is a link directly from Nvidia that has a great deal of information on the subject, including how to set up FP16 training in PyTorch (see Frameworks section):
I am still undecided about buying a Titan V. A lot of software optimization is still needed, especially around memory bandwidth/transfer, to make optimal use of this promising technology. It is also very hard to predict whether the new commercial FPGAs/ASICs potentially coming in 2018 (Intel Nervana, Wave Computing, Graphcore, Groq, Bitmain, and many more!) will have good software support at initial release.
Keep the recent change in NVIDIA’s EULA in mind:
For most private users this might not be an issue, however, I sure hope cuDNN will always remain available on GTXs as well.
There are also driver problems with CUDA 9 on some new hardware. So if you are planning to use state-of-the-art software right away, you may have to fall back to CUDA 8 compatible versions… I’m not sure though.
Yes that info has all been implemented in the official pytorch imagenet training example. I talked to the nvidia folks about it on the main pytorch forums. It shouldn’t take too much to add it to fastai, but it hasn’t moved up my priority queue just yet. However, it looks like I may be getting access to a few of these cards soon, so that might change!
send 1-4 my way, you know…for benchmarking purposes…