Teslas are meant for servers, and are always very expensive.
But what challenges my thinking with the "V" series from Nvidia (Tesla V100, Titan V) is: what are those "640 Tensor Cores" for, with regards to ML/DL?
The only "benchmark" I found is the one posted on NVidia.com (https://www.nvidia.com/en-us/data-center/tesla-v100/), where they claim that the V100, using its 640 Tensor Cores, trained ResNet-50 on Caffe2 in a third of the time the P100 needed.
Which sounds like getting a Titan V "today" is a waste of money, because the main libraries (PyTorch & TensorFlow) can't exploit the Tensor Cores' power. As in "the hardware is too advanced for the software available".
E.
Yes, but I tend to think frameworks will catch up fast in this case, mainly because AWS adopted the V100 and released P3 instances, which means lots of people and businesses have access to this architecture.
I've been reading a few things related to this over the past few days too (and thanks for the Tim Dettmers link). You may have hit the nail on the head with:
As in "the hardware is too advanced for the software available".
It seems like people have put in the work on Caffe2 to support the Tensor Cores (and it's the only framework so far), and the ResNet-50 results you quote make a lot of sense (from an expected theoretical performance perspective).
The recent large increases in crypto hash rates on AMD Vega GPUs (out-hashing the Titan V for Monero and the like) keep making me wonder about the potential of AMD cards. Vega also seems to be in the "hardware too advanced for today's software" camp, but again there is one framework that has gotten HIP, MIOpen and ROCm (the AMD DL stack) working, and again it's Caffe2. There is an (admittedly sketchy-looking) benchmark out there showing large speedups using HIP-Caffe2 on Vega vs Caffe2 on an "undisclosed" competitor GPU. While the benchmark is questionable, it also makes sense (in theory) that running in FP16 on a Vega 64 could give up to a 2x performance boost over a 1080 Ti. With AMD support for TensorFlow and PyTorch in development, it will be really interesting to see where things land in the next couple of quarters.
Relevant crosspost: Deep Learning Hardware Limbo
An article on the Titan V by AnandTech.
Section 2 looks at the Volta GV100 architecture and its Tensor Cores.
Section 3 looks at its compute performance with General Matrix Multiply (GEMM), and the challenge of testing it due to the lack of libraries supporting Volta today.
The rest is mainly about gaming/graphics performance.
https://www.anandtech.com/show/12170/nvidia-titan-v-preview-titanomachy
Another Titan V review, focusing on Deep Learning performance with Caffe2 and TensorFlow.
https://www.pcper.com/reviews/Graphics-Cards/NVIDIA-TITAN-V-Review-Part-3-Deep-Learning-Performance
The official PyTorch ImageNet example shows how to fully harness the Tensor Cores with fp16. I plan to implement that in fastai when I have a chance. We don't need anything extra in PyTorch.
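For anyone curious what that boils down to, here is a minimal sketch (not the official example itself; the model, batch size and random input are placeholders of mine) of getting a forward pass to run in FP16 so the convolutions and matrix multiplies can be dispatched to the Tensor Cores on a Volta GPU:

```python
import torch
import torchvision.models as models

# Hypothetical minimal sketch: cast the network and its inputs to FP16.
# Requires a CUDA GPU; ResNet-50 and the batch size are illustrative only.
model = models.resnet50().cuda().half()
images = torch.randn(32, 3, 224, 224, device='cuda').half()

logits = model(images)   # forward pass runs entirely in FP16
print(logits.dtype)      # torch.float16
```

For training you also want the mixed precision tricks (FP32 master weights, loss scaling) discussed further down in this thread, rather than plain FP16 everywhere.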
More preliminary ML tests on the Titan V vs the Titan Xp done by Puget Systems, a company building high-performance workstations.
Still no fp16 results there however @EricPB, so not really relevant AFAICT.
FWIW: I cancelled my Titan V pre-order.
No one at Nvidia (including its CEO Jensen Huang, via his LinkedIn/Twitter accounts) answered my basic questions: "Which library among Keras/TensorFlow or PyTorch supports Tensor Cores? Same question for mixed precision? Why upgrade from a 1080 Ti to a Titan V?"
$3,000 is a LOT of money, like a LOT LOT LOT of money, for an individual (whether a SoHo data scientist or a student).
Not getting a basic (or at least respectful) answer from Nvidia or its CEO: deal breaker for my Titan V pre-order.
Someone posting as "smerity" (probably another DL noob) shared, a few weeks ago on the Nvidia Devtalk forum, a very quick benchmark running his own code on his Titan V vs a 1080 Ti.
https://devtalk.nvidia.com/default/topic/1027645/nvidia-smi-not-recognizing-titan-v-/#5228400
So four 1080 Tis are better than one fancy Titan V.
Back when he ran those tests, it seemed so.
But he was also clear in saying "Note that the codebase is running PyTorch but is not optimized for the Titan V".
He hasn't posted since about optimizing his code for the Titan V.
Maybe the 2nd or 3rd generation will be much cheaper, and have better compatibility.
@EricPB, I contacted Paulius directly, one of the first co-authors of the original mixed precision training paper: https://arxiv.org/abs/1710.03740
He was very helpful and clearly answered my technical questions (about performance and supported frameworks). Here is a link directly from Nvidia with a great deal of information on the subject, including how to set up FP16 training in PyTorch (see the Frameworks section):
http://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html
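As a rough illustration of the recipe those docs describe (FP16 forward/backward pass, an FP32 "master" copy of the weights, and a loss scale to keep small gradients from underflowing), here is a hedged sketch in PyTorch. The model, hyper-parameters and random data are placeholders of mine, not taken from the NVIDIA docs or the official example:

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Illustrative hyper-parameters, not taken from the NVIDIA docs.
loss_scale = 128.0
lr = 0.01

model = models.resnet50().cuda().half()   # FP16 model, so matmuls/convs can hit the Tensor Cores

# Keep an FP32 master copy of the weights; the optimizer updates these.
master_params = [p.detach().clone().float().requires_grad_(True)
                 for p in model.parameters()]
optimizer = torch.optim.SGD(master_params, lr=lr)
criterion = nn.CrossEntropyLoss().cuda()

# One hypothetical training step on random data.
images = torch.randn(32, 3, 224, 224, device='cuda').half()
targets = torch.randint(0, 1000, (32,), device='cuda')

# Compute the loss in FP32 for numerical stability.
loss = criterion(model(images).float(), targets)

# Scale the loss so small FP16 gradients don't underflow to zero.
model.zero_grad()
(loss * loss_scale).backward()

# Copy the FP16 gradients into the FP32 master params, undoing the scale.
for master, p in zip(master_params, model.parameters()):
    master.grad = p.grad.detach().float() / loss_scale

optimizer.step()

# Copy the updated FP32 master weights back into the FP16 model.
with torch.no_grad():
    for master, p in zip(master_params, model.parameters()):
        p.copy_(master)
```

A static loss scale is the simplest variant; the docs also describe dynamic loss scaling, which adjusts the scale automatically when overflows are detected.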
I am still undecided about buying a Titan V. There is still a lot of software optimization needed, especially around memory bandwidth/transfer, to use this promising technology optimally. It is very hard to predict whether the new commercial FPGAs/ASICs potentially coming in 2018 (Intel Nervana, Wave Computing, Graphcore, Groq, Bitmain, and many more!) will have good software support at their initial release.
Keep the recent change in NVIDIA's EULA in mind:
For most private users this might not be an issue; however, I sure hope cuDNN will always remain available on GTX cards as well.
There are also driver problems with CUDA 9 on some new hardware. So, if you are planning immediate use of state-of-the-art software, you may have to fall back to CUDA 8-compatible versions… I'm not sure though.
Yes, that info has all been implemented in the official PyTorch ImageNet training example. I talked to the Nvidia folks about it on the main PyTorch forums. It shouldn't take too much to add it to fastai, but it hasn't moved up my priority queue just yet. However, it looks like I may be getting access to a few of these cards soon, so that might change!
Send 1-4 my way, you know… for benchmarking purposes…