Has anyone tried the new NVIDIA card?

I’m not sure what you mean. RNNs are basically entirely matrix multiplications. And with QRNN, they’re even more suitable for acceleration.

Hmm. Perhaps I misunderstand.

Last time I profiled an RNN (which admittedly was a while ago), the non-mm operations like vecadd ate up a tremendous amount of wall time. It wasn’t that those operations were all that intensive, iirc, it was that the memory loads were so slow.

Unless I’m totally misunderstanding something, I would assume that the ratio of time spent doing matrix multiplies to everything else (including memory loads) is lower in an RNN compared to a CNN.
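For what it’s worth, here’s a minimal sketch (PyTorch, made-up sizes, a plain Elman-style recurrence rather than a fused cuDNN LSTM) of how one could check that split: profile the loop and compare the CUDA time attributed to the matmuls against the pointwise add/tanh ops.

```python
import torch

# Hypothetical sizes, just to make the kernels visible in the profile.
batch, in_dim, hidden, seq_len = 64, 512, 512, 100
Wx = torch.randn(in_dim, hidden, device="cuda")
Wh = torch.randn(hidden, hidden, device="cuda")
b = torch.randn(hidden, device="cuda")
x = torch.randn(seq_len, batch, in_dim, device="cuda")
h = torch.zeros(batch, hidden, device="cuda")

# use_cuda=True records GPU time per op, not just CPU time.
with torch.autograd.profiler.profile(use_cuda=True) as prof:
    for t in range(seq_len):
        # Two matmuls per step, plus the "small" pointwise ops (adds, tanh).
        h = torch.tanh(x[t] @ Wx + h @ Wh + b)

# Compare the matmul entries against add/tanh by total CUDA time.
print(prof.key_averages().table(sort_by="cuda_time_total"))
```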

Ah, well, that’s a good point. I suspect that with modern approaches (multi-layer, bidirectional, attention) and HBM memory that may not be so true any more, especially with QRNN. I’m not sure though - I’d love to see benchmarks.

According to this benchmark, your expectations are correct @aloisius: https://www.xcelerit.com/computing-benchmarks/insights/benchmarks-deep-learning-nvidia-p100-vs-v100-gpu/ . I’m not sure whether they used the cuDNN RNN op for this, however.


Here’s another benchmark, from Dell HPC, on the V100 vs P100, including PCIe versions, running ResNet50 on NV-Caffe, MXNet and TensorFlow, done in October 2017.

http://en.community.dell.com/techcenter/high-performance-computing/b/general_hpc/archive/2017/09/27/deep-learning-on-v100

And also another blog post here.


The keyword here seems to be Mixed Precision Training, and the post includes two links to research articles from Nvidia and Baidu.

(Links courtesy of Anokas’ Twitter account: https://twitter.com/mikb0b/status/940271947316912128)

Someone opened a thread on the PyTorch Discuss forum about the poor performance of the Titan V vs the 1080 Ti.
It’s probably the first ML comparison (there are gaming benchmarks already, not so positive for the Titan V in performance per dollar, but is that really its main task?).

It was answered by a PyTorch dev with some code fixes, though I’m not sure whether this is representative/final for the Titan V and its unique combo of Tensor Cores + Mixed Precision Training, vs the 1080 Ti or Titan Xp.


I don’t think this is using the tensor cores at all - there’s quite a bit more to do to get that working.
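As I understand it, the bare minimum before cuBLAS will even consider routing a matmul through the Tensor Cores is FP16 operands with matrix dimensions that are multiples of 8; a plain FP32 benchmark never touches them. A tiny, hypothetical PyTorch sketch of the difference:

```python
import torch

# FP16 operands with dims that are multiples of 8: eligible for Tensor Cores on Volta.
a = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
b = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
c_fp16 = a @ b

# The same matmul in FP32 runs on the ordinary CUDA cores; the Tensor Cores sit idle.
c_fp32 = a.float() @ b.float()
```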

A post from Tim Dettmers (“Which GPU for Deep Learning”) on the Titan V and incoming competition from Intel & AMD.

The more I read about the “abysmal/poor value” of the Titan V at $3,000, the more I wonder how the original Tesla V100 at $10K could find a market in the first place.
Am I missing something?


Teslas are needed for servers, and are always very expensive.


But here’s what challenges my thinking about Nvidia’s “V” series (Tesla V100, Titan V): what are those “640 Tensor Cores” actually for, with regard to ML/DL?

The only “benchmark” I found is the one posted on Nvidia.com (https://www.nvidia.com/en-us/data-center/tesla-v100/), where they claim that the V100, using its 640 Tensor Cores, runs ResNet50 on Caffe2 in a third of the time the P100 takes.

Which makes it sound like getting a Titan V “today” is a waste of money, because the main libraries (PyTorch & TensorFlow) can’t exploit the Tensor Cores’ power. As in, “the hardware is too advanced for the software available”.

E.

Yes, but I tend to think frameworks will catch up fast in this case, mainly because AWS adopted and released V100 P3 instances, which means lots of people and businesses have access to this architecture.


I’ve been reading a few things relating to this over the past few days too (and thanks for the Tim Dettmers link). You may have hit the nail on the head with:

As in “the hardware is too advanced for the software available”.

It seems like people have put in the work on Caffe2 to support the Tensor Cores (it’s the only framework to do so, so far), and the ResNet50 results you quote make a lot of sense from an expected theoretical-performance perspective.

The recent large increases to crypto hash rates on AMD Vega GPUs (out-hashing the Titan V for Monero and the like) keep making me wonder about the potential of AMD cards. Vega also seems to be in the ‘hardware too advanced for today’s software’ camp, but again there is one framework that has gotten HIP, MIOpen and ROCm (the AMD DL stack) working, and again it’s Caffe2.

There is an (admittedly sketchy-looking) benchmark out there showing large speedups using HIP-Caffe2 on Vega vs Caffe2 on an ‘undisclosed’ competitor GPU. While the benchmark is questionable, it also makes sense in theory that running in FP16 on a Vega 64 could give up to a 2x performance boost over a 1080 Ti, since Vega’s rapid packed math runs FP16 at twice its FP32 rate while consumer Pascal cards effectively have to stay in FP32. With AMD support for TensorFlow and PyTorch in development, it will be really interesting to see where things land in the next couple of quarters.


Relevant crosspost: Deep Learning Hardware Limbo

An article on the Titan V by AnandTech.

Section 2 looks at the Volta GV100 architecture and its Tensor Cores.

Section 3 looks at its compute performance with General Matrix Multiply (GEMM), and the challenge of testing it due to the lack of libraries supporting Volta today.

The rest is mainly about gaming/graphics performance.

https://www.anandtech.com/show/12170/nvidia-titan-v-preview-titanomachy

Another Titan V review, focusing on Deep Learning performance with Caffe2 and TensorFlow.

https://www.pcper.com/reviews/Graphics-Cards/NVIDIA-TITAN-V-Review-Part-3-Deep-Learning-Performance

The official PyTorch ImageNet example shows how to fully harness the Tensor Cores with FP16. I plan to implement that in fastai when I have a chance. We don’t need anything extra in PyTorch.
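Roughly, the recipe is: run the model and its activations in FP16, keep BatchNorm and an FP32 “master” copy of the weights in full precision, and scale the loss so small gradients don’t underflow. Here’s a minimal sketch of that recipe (the model choice, loss scale and hyperparameters are illustrative assumptions, not the example’s exact code):

```python
import torch
import torch.nn as nn
import torchvision.models as models

# FP16 model; keep BatchNorm in FP32 (cuDNN accepts FP16 inputs with FP32 BN params).
model = models.resnet50().cuda().half()
for m in model.modules():
    if isinstance(m, nn.BatchNorm2d):
        m.float()

# FP32 "master" copy of the parameters; the optimizer only ever updates these.
master_params = [p.detach().clone().float() for p in model.parameters()]
for p in master_params:
    p.requires_grad_(True)
optimizer = torch.optim.SGD(master_params, lr=0.1, momentum=0.9)

criterion = nn.CrossEntropyLoss().cuda()
loss_scale = 128.0  # static loss scale; keeps tiny FP16 gradients from flushing to zero

def train_step(images, targets):
    images = images.cuda().half()
    targets = targets.cuda()

    output = model(images)
    loss = criterion(output.float(), targets)  # compute the loss in FP32

    model.zero_grad()
    (loss * loss_scale).backward()  # FP16 grads, scaled up

    # Copy the FP16 grads into the FP32 master params, unscaling on the way.
    for master, p in zip(master_params, model.parameters()):
        if p.grad is not None:
            master.grad = p.grad.detach().float() / loss_scale
    optimizer.step()

    # Copy the updated FP32 master weights back into the FP16 model.
    with torch.no_grad():
        for master, p in zip(master_params, model.parameters()):
            p.copy_(master)
    return loss.item()
```

The FP32 master copy matters because an update of size lr × grad can be smaller than FP16 can represent next to the weight’s magnitude, so updating the FP16 weights directly would silently drop it.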


More preliminary ML tests on Titan V vs Xp done by Puget Systems, a company building high-performance workstations.

Still no fp16 results there however @EricPB, so not really relevant AFAICT.

FWIW: I cancelled my Titan V pre-order.

No one at Nvidia (including its CEO Jensen Huang, via LinkedIn/Twitter) answered my basic questions: “Which library among Keras/TensorFlow or PyTorch supports Tensor Cores? Same question for Mixed Precision? Why upgrade a 1080 Ti to a Titan V?”

$3,000 is a LOT of money, like a LOT LOT LOT of money, for an individual (whether a SoHo data scientist or a student).

Not getting a basic (or at least respectful) answer from Nvidia or its CEO was the deal breaker for my Titan V pre-order.


Someone posting as “smerity” (probably another DL noob :sunglasses:) shared, a few weeks ago on the Nvidia Devtalk forum, a very quick benchmark running his own code on his Titan V vs a 1080 Ti.

https://devtalk.nvidia.com/default/topic/1027645/nvidia-smi-not-recognizing-titan-v-/#5228400