GTX 2080/2080Ti RTX for Deep Learning?

I wonder if that NVLink is really necessary. I doubt the PCIe bus would be a bottleneck with just two cards.

In that case, and if one plans to leverage parallelism, two 2070s could be cheaper and better than a single 2080 Ti, in particular when it comes to memory.

I just placed an order for an Asus RTX 2070 8 GB Turbo (blower fan), to install next to my Asus 1080 Ti 11 GB Turbo with a Ryzen 1700X + a 1 TB Samsung SSD.

So I hope to run some tests with fastai in the coming days, trying mixed 16/32-bit precision with its Tensor Cores (which the 1080 Ti lacks), most likely using @radek’s starter pack for the Quick Draw competition on Kaggle.

According to http://on-demand.gputechconf.com/gtc/2018/video/S81012/ (at ~09 min), where a lead Nvidia engineer for PyTorch presents a real case study, “using multiples of 8” is critical :sunglasses:
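For illustration only (this is not code from the talk): “multiples of 8” means keeping batch sizes and layer dimensions divisible by 8, so the FP16 matrix multiplies can map onto the Tensor Cores. A minimal PyTorch sketch, assuming a CUDA-capable GPU:

```python
import torch
import torch.nn as nn

# Dimensions chosen as multiples of 8 so the half-precision matmul
# can be executed on Tensor Cores (per the GTC talk linked above).
batch_size, in_features, out_features = 64, 1024, 512

layer = nn.Linear(in_features, out_features).half().cuda()  # FP16 weights on the GPU
x = torch.randn(batch_size, in_features, dtype=torch.half, device='cuda')
y = layer(x)  # forward pass eligible for Tensor Core execution
```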


Hmm, I wonder how many of the existing best models actually satisfy that requirement.


The 2070 is the most cost-effective, as per Tim Dettmers: http://timdettmers.com/2018/11/05/which-gpu-for-deep-learning/

A used 1080 Ti is about the same price as a new 2070. Is the 2070 still the better choice than the 1080 Ti even at the same price (pre-owned vs new, though)?


That’s the idea, as Tim favors the RTX 2070 as the “best value GFX for DL, and Kaggle” today.
He even mentions in the comments that two 2070s might be better, while cheaper, than a single 2080 Ti for most users, as they allow faster exploration of training (image size, models, number of epochs, etc.).
Use Ctrl-F + “2070” to zoom in on those nuggets.

I’m curious to see how the cheapest Tensor Core consumer GPU at €550 compares with the previous “King of the Hill” of the 10xx line-up (I bought mine refurbished for €700 in April 2017).

In Sweden, the last 1080 Ti units now retail for €950, while new 2080s go for €900 and 2080 Tis for €1,300.

If you want to activate FP16 with fastai, you add the call to_fp16() when you create your learner, not when you run it.

As in: learn = create_cnn(data, models.resnet34, metrics=error_rate).to_fp16()
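For context, here is a minimal end-to-end sketch of that pattern with the fastai v1 API used in this thread; the data path, transforms and batch size below are placeholders, not anyone’s exact setup:

```python
from fastai.vision import *

# Placeholder dataset: any folder layout that ImageDataBunch.from_folder understands.
data = ImageDataBunch.from_folder(Path('data/my_images'), ds_tfms=get_transforms(),
                                  size=224, bs=64).normalize(imagenet_stats)

# Call .to_fp16() when the learner is created, not at fit time.
learn = create_cnn(data, models.resnet34, metrics=error_rate).to_fp16()
learn.fit_one_cycle(4)
```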

edit: this was done on a 1080Ti, prior to receiving my RTX 2070. It doesn’t work for the 2070 as it crashes my kernel.


Are more changes needed to make to_fp16() work?
I tried:

```python
learn = create_cnn(data, models.resnet34, metrics=accuracy, model_dir='.models').to_fp16()
learn.fit_one_cycle(1)
```

and it fails with

RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR

but without the to_fp16() it works fine…

Try running the cell again after it errors. If that fails, run the cell (getting the error), then run learn.model.cuda(), then run it again.

I’ve run into several weird CUDA/cuDNN errors that seem to be solved by just running the cell again. It’s like things don’t work out the first time you try to run a model, but after that everything’s fine.
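If it helps, here is a rough sketch of that retry workaround, assuming a fastai v1 learner named learn as in the snippets above:

```python
# If the first fit dies with a cuDNN/CUDA error, move the model back
# onto the GPU and try once more, mirroring the "re-run the cell" advice.
try:
    learn.fit_one_cycle(1)
except RuntimeError:
    learn.model.cuda()      # re-place the underlying PyTorch model on the GPU
    learn.fit_one_cycle(1)  # the second attempt usually goes through
```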

I was using learn = to_fp16(…) rather than (…).to_fp16(). See if that works for you.

I’m on the 410 driver as well. I usually get a CUDA error 11 the first time I run learn.fit_one_cycle(1), but it works without a hitch when I re-run the cell.

Hi Ilia, with my tail between my hind legs and head bowed in shame I must confess that I have rendered my ‘upgrade’ and planned successive performance steps USELESS, for now. Suffice to say that I can’t even boot from my poor old (formerly) trusty tower anymore. I do have an Asus G75V with 2 TB of SSD and have been doing The Master’s (Jeremy’s) course on it. Believe me Ilia, I am working on it; I am even reading the installation instructions for the Samsung 2 TB V-NAND 860 Pro and the M.2 970 Evo. The Gigabyte 2080 Ti OC looks GREAT, good for photos. When I do get it all working, AI will probably still be popular.
Is there a general thread where I can sing Jeremy’s (and Rachel’s) praises? I think this should be done from the global AI rooftops. He, virtually alone, has revived AI to a fabulous state, translated it from ‘nerd-speak’, worked like a demon in proving himself and his methods (non-conformist, and for good reason), wrested AI from the strict realms of the academics, read heaps of papers and sorted them, cut through the in-bred jargon, prepared superbly understandable lessons and videos into THE premier AI course, and presents it FREE for us unclean masses in the fabulous MOOC. WHAT A GREAT GUY! And, of course, he is Australian, AND from Melbourne. Thanks also to the University of San Francisco for supporting him in this globally disruptive technological thrust. This will inspire global changes to many things; you ain’t seen nothin’ yet.
I’ll try to keep you informed of any significant progress on my little desktop, Ilia.
Cheers for now,
Peter Kelly

I just installed the RTX 2070. It works fine in standard precision, but when I try .to_fp16(), or learn = to_fp16(create_cnn(…)), it crashes my kernel without an error message in Jupyter Notebook.
So it’s rather hard to debug :slight_smile:

I’m using Ubuntu 16.04 and Nvidia driver 410.73.

Strangely, the .to_fp16() command works with my 1080 Ti without crashing the kernel (but with no performance boost), hence my post yesterday https://forums.fast.ai/t/gtx-2080-2080ti-rtx-for-deep-learning/26783/60?u=ericpb

I am also getting

The kernel appears to have died. It will restart automatically.

when I try learn = to_fp16(create_cnn(…)),
using Ubuntu and Nvidia 396.44 on a V100…


It’s not, in my opinion.

Less powerful, less memory, and trouble with .to_fp16().

Once they address such problems, a couple of 2070s for one grand will be great: effectively one card with 16 GB, and more power than the 1080 Ti (maybe even than the 2080 Ti).

That’s very much spot on: for €550-600 one can get either a used 1080 Ti 11 GB (last-gen best in class) or a new RTX 2070 8 GB.
The key difference is access to Tensor Cores + FP16 (mixed precision) with the 2070, potentially doubling its effective VRAM, or at least putting it on par with the 1080 Ti.

Getting the RTX (whatever the model) to activate its Tensor Cores for either PyTorch or TensorFlow is not yet simple “plug & play”, as I’m discovering myself, which makes even a simple benchmark such as CIFAR-10 (thank you @sgugger :hugs:) quite challenging.

BTW, I’d love to hear from recent owners of the 2080/2080 Ti whether and how they managed to run fastai in mixed precision, and how it compares to standard FP32 training.


I just tried using to_fp16() with a 2080 Ti and didn’t face any crash. BTW, I don’t see any performance improvement on this simple test (a classifier with 1k images).

I ran a bunch of benchmarks on the Pets notebook (someone’s modified version of it) using my 2080.

| ResNet | Precision | Batch size | Image size | Time (min:sec) | Error rate |
|---|---|---|---|---|---|
| 34 | FP16 | 100 | 224 | 4:47 | 0.053 |
| 34 | FP32 | 100 | 224 | 5:16 | 0.055 |
| 34 | FP32 | 48 | 320 | 7:29 | 0.053 |
| 50 | FP32 | 32 | 320 | 9:19 | 0.044 |
| 50 | FP16 | 32 | 320 | 8:40 | 0.043 |
| 50 | FP16 | 32 | 299 | 8:07 | 0.044 |
| 50 | FP16 | 64 | 299 | 7:20 | 0.041 |
I think I have some bottleneck, as someone else posted similar times on a 1060, but doubling the batch size was as simple as wrapping the learner in to_fp16(…).
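To make that concrete, here’s a hedged sketch of the kind of change involved (fastai v1); the dataset path is a placeholder and the batch sizes are illustrative, not the exact values from the table above:

```python
from fastai.vision import *

data_path = Path('data/pets')  # hypothetical path to an image dataset

def make_learner(bs, fp16):
    # Same data pipeline and model; only the batch size and precision differ.
    data = ImageDataBunch.from_folder(data_path, ds_tfms=get_transforms(),
                                      size=320, bs=bs).normalize(imagenet_stats)
    learn = create_cnn(data, models.resnet50, metrics=error_rate)
    return to_fp16(learn) if fp16 else learn

learn_fp32 = make_learner(bs=32, fp16=False)  # FP32 baseline
learn_fp16 = make_learner(bs=64, fp16=True)   # mixed precision, roughly double the batch size
```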


I don’t like used cards. Often they come clogged with dust and dirt, and getting them clean is not an easy task. I’d say buy a new 1080 Ti, or wait until prices drop and go for two 2070s.

Hej JulienM,

Thank you for replying as an RTX 2080 Ti owner, that helps a lot!

So you didn’t face any crash while using to_fp16(), even though you didn’t see any performance improvement (secondary question: in metrics, or in speed/duration?).

Could you share more info about your setup: an official fastai MOOC notebook, a Kaggle kernel, or something else?
Then I can try to replicate your exact code on the RTX 2070.

BR,

EPB

Hej Ralph!

Thank you for providing more info.

When you ran your benchmarks on the Pets notebook, did you activate the Tensor Cores and/or FP16/mixed precision in any specific way?
I couldn’t find any specific Tensor Cores/FP16 code in the version you linked.

BR,

EPB

All I did was switch between:

```python
#learn = to_fp16(create_cnn(data, models.resnet34, metrics=error_rate))
learn = create_cnn(data, models.resnet34, metrics=error_rate)
```

The linked notebook has more epochs than the original, which explains the slower times in the benchmarks.


Going back to the original notebook, the change above increases my max batch size from 192 to 375.


I’m running CUDA 9.2 and Nvidia driver 410.73, but 410.66 also worked.
