Has anyone tried the new NVIDIA card?

Hi Alexandre,

I had a mail exchange with Guillaume Barat, part of NVIDIA Business Development EMEA for Deep Learning Education and Research, after an introduction by a member of KaggleNoobs in early January 2018.

Here’s a paste of his answer:


(…) I would suggest you use our developer forum, where you will have real engineers answering your questions :wink:

https://developer.nvidia.com/

Today:

  • Both PyTorch and TensorFlow within NGC leverage FP16 (and therefore the Tensor Cores).
    NGC is a beautiful Docker environment, close to the one you have on our DGX-1 solution, that will provide you with updated and GPU-tuned frameworks on a regular basis. It is free and you can use it on your workstation, in the cloud and on a DGX.

  • Public PyTorch is now able to leverage FP16 and Tensor Cores.

  • Public TensorFlow is still working on it.

NGC can really be a productive environment that will accelerate your installation and guarantee you the best performance on an NVIDIA GPU.


I’m not familiar with Docker environments; my DL setup uses Ubuntu 16.04 with two separate environments (Fastai & Keras/TensorFlow), which work fine for Kaggle.
Plus I’m not too excited about experimenting with new NVIDIA setups: I got burned several times in the past when installing new GPU drivers (and incompatible versions of CUDA/cuDNN/TensorFlow), resulting in a corrupted setup -> complete Ubuntu reinstall grrr… :rage: :upside_down_face:

@EricPB that reply is largely unrelated to actually using the Tensor Cores. To use them, you need to leverage the code that’s in the official PyTorch ImageNet training example. You can re-use that code to train other models too, as long as the data is in the same format.
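For anyone curious what that looks like in practice, here’s a minimal sketch of the manual FP16 + loss-scaling pattern that approach relies on (the model choice, loss-scale value and training-step wrapper below are illustrative assumptions, not the example’s exact code):

```python
# Minimal sketch of FP16 training with static loss scaling (illustrative only).
import torch
import torch.nn as nn
import torchvision.models as models

device = torch.device('cuda')
model = models.resnet50().to(device).half()          # FP16 weights -> Tensor Core eligible

# Keep an FP32 "master" copy of the weights for stable optimizer updates
master_params = [p.detach().clone().float().requires_grad_() for p in model.parameters()]
optimizer = torch.optim.SGD(master_params, lr=0.1, momentum=0.9)
criterion = nn.CrossEntropyLoss()
loss_scale = 128.0                                   # static scaling against FP16 gradient underflow

def train_step(images, targets):
    images, targets = images.to(device).half(), targets.to(device)
    loss = criterion(model(images).float(), targets) # compute the loss in FP32
    model.zero_grad()
    (loss * loss_scale).backward()                   # scaled backward pass through the FP16 model
    for master, p in zip(master_params, model.parameters()):
        master.grad = p.grad.detach().float() / loss_scale
    optimizer.step()                                 # update the FP32 master weights
    for master, p in zip(master_params, model.parameters()):
        p.data.copy_(master.data)                    # copy updates back into the FP16 model
    return loss.item()
```

In real code you would also keep the BatchNorm layers in FP32, but the loss scaling and the FP32 master copy are the essential pieces.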

It was the reply to my final point, where I mentioned cancelling my Titan V pre-order:

“To conclude (…), I’d love to invest some student money into a Titan V, IF I was convinced that its unique capabilities (Tensor Cores + Mixed Precision) were effectively supported/used by TensorFlow and PyTorch for my Kaggle assignments/competitions.”

At this point, I believe 100% that I made the right decision in cancelling, because
(1) it’s a really expensive GPU card,
(2) I don’t master the technicalities behind it, as you point out, and
(3) my 1080 Ti + Ryzen 1700X combo already does a great job and is not the real reason I don’t get a Gold Medal on Kaggle every time :innocent:

Yes, I agree with your decision.

FWIW: I cancelled my Titan V pre-order.

I was considering buying one myself, but I never saw a benchmark that showed much more than a 30% improvement over a 1080 Ti.

A year later, the 2080 Ti is as fast as the Titan V for $1,200, which is still way overpriced and probably only priced that way to sell off 10-series inventory.
From what I gather, 7 nm GPUs from Nvidia are on the horizon for 2019. (The 10 series is 16 nm and the 20 series is 12 nm.)
Maybe Nvidia will release consumer cards with NVLink (fully combining 2 GPUs into one).
SSDs and DRAM are supposed to get cheaper in 2019 too. Maybe we’ll see CPUs at smaller process nodes as well.

We’re getting there slowly.

If I am not mistaken, the new 2080 and 2080 Ti both have NVLink, so that has already come true. (The 2070 does not!)

NVLink

I thought the same thing at first.

With RTX cards it is limited.
I don’t know how limited (50 GB/s vs. 150 GB/s), or what exactly the functional difference is, but the full ‘one giant GPU’ benefit seems to come only with the non-consumer Turing cards.
It seems it’s just SLI, which, I think, just designates a master/slave GPU pair and does NOT double the power and memory.
The video below seems to show that only GV100s (at $12k a pop) fulfill the NVLink promise.
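If you do have a bridge installed, nvidia-smi can report what the links are actually doing; a quick sketch (assuming nvidia-smi is on the PATH):

```python
# Query the NVLink state of the local GPUs; should print per-link speed and active state.
import subprocess
print(subprocess.run(["nvidia-smi", "nvlink", "--status"],
                     capture_output=True, text=True).stdout)
```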

Maybe NVLink will come to consumer cards in the future. That’s another reason I think it’s worth waiting for the 7 nm cards. The 20 series is 12 nm and overpriced.

But, you know, this stuff takes a lot of work to parse. Deep Learning online groups don’t want any discussions of hardware, and hardware groups are mostly gamers.
Maybe there is a mode in which RTX cards use NVLink differently if you do something weird with a special version of CUDA and only run certain precisions (half/single/double) with Tensor Cores enabled.
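One crude way to at least see whether Tensor Cores are doing anything for half precision on your own card is to time a large matmul in FP32 vs FP16 (sizes and iteration counts below are arbitrary):

```python
# Rough FP32 vs FP16 matmul timing; a large FP16 speedup on a Volta/Turing card
# usually indicates the Tensor Cores are being used.
import time
import torch

def bench(dtype, n=4096, iters=50):
    a = torch.randn(n, n, device='cuda', dtype=dtype)
    b = torch.randn(n, n, device='cuda', dtype=dtype)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()
    return (time.time() - start) / iters

print(f"FP32: {bench(torch.float32) * 1e3:.2f} ms per matmul")
print(f"FP16: {bench(torch.float16) * 1e3:.2f} ms per matmul")
```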


The 2080 Tis are experiencing an alarming number of hardware failures, simply dying out of the blue. Furthermore, the entire RTX line has (at least with the present drivers) a horrible idle power draw (some 50 W versus less than 10 W for Pascal).

I would wait a bit before buying one of those. Meanwhile, one can buy a 1080 Ti for 600 euros, which is not bad.

I’m using driver 410.66 on my 2080 and it is only drawing 12-15 watts as I watch the lesson.
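If anyone wants to check their own card, here is a quick sketch that reads the driver version and current power draw through nvidia-smi (assumes nvidia-smi is on the PATH):

```python
# Print GPU name, driver version and current power draw via nvidia-smi's query interface.
import subprocess

out = subprocess.run(
    ["nvidia-smi",
     "--query-gpu=name,driver_version,power.draw",
     "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
)
print(out.stdout)   # e.g. "GeForce RTX 2080, 410.66, 13.42 W"
```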

So they fixed that issue in v410. Good :slight_smile: