RTX 2080/2080 Ti for Deep Learning?

devforfu · October 22, 2018, 4:33am

Hi Peter!

That’s great! I would really appreciate it if you could run a benchmark. As I understand it, you should definitely get a speedup with conv nets, and probably be able to use bigger mini-batches.

PeterKelly · October 22, 2018, 6:05am

Hi Ilia (is ‘Ilia’ ok? Or should I call you Devforfu? I am not good at this and too old to learn, please advise). Thanks for the reply. Yep, the motherboard is also old (PCIe 2.0), but boards with the new spec (4.0) are due in 3-4 months, AND the 5.0 spec is about to be released! So when will the first quality boards AND CPUs to suit arrive??
I decided to stick with the old motherboard and upgrade one step at a time with it (GPU 2080Ti, NVMe, NVMe SSD), then leave the major expense of motherboard, RAM, CPU and more stuff for the next $warp.
Perhaps I’m naive, but at least I can watch!
Cheers & stay cool and faid (new term of trade: FastAI’d),
p

devforfu · October 22, 2018, 1:59pm

Ok, understood!

I didn’t even know that the new architectures support half-precision arithmetic, and I don’t know how to apply this technique except through the fastai library, because, as I can see from its source code, fp16 training is not a very straightforward process. However, the idea of having a card that supports this feature sounds very attractive.
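One reason fp16 training isn’t straightforward is fp16’s narrow range: very small gradient values underflow to zero. The sketch below is only an illustration, not fastai’s implementation. It uses Python’s struct "e" (IEEE half-precision) format to simulate fp16 rounding, and mimics loss scaling, a common workaround in mixed precision schemes. The scale factor of 1024 is an arbitrary illustrative value:

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value.

    struct's 'e' format packs a binary16 value; unpacking it again
    returns the rounded number as a regular Python float.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


# A small gradient value, below fp16's smallest subnormal (2**-24, ~6e-8).
grad = 1e-8

# Stored directly in fp16 it rounds to zero, so the update is lost.
naive = to_fp16(grad)

# Loss scaling: multiply the loss (and hence every gradient) by a constant
# before the fp16 backward pass, then divide it out again in higher
# precision (fp32 in real training, a Python float here) before the update.
LOSS_SCALE = 1024.0  # illustrative choice, not a recommended value
scaled = to_fp16(grad * LOSS_SCALE)
recovered = scaled / LOSS_SCALE

print(f"fp16 without scaling: {naive}")
print(f"fp16 with scaling:    {recovered:.3e}")
```

Real implementations typically also watch for overflow (inf/NaN gradients) and adjust the scale dynamically; that part is left out here.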

P.S. Yes, sure, Ilia is fine =)

arijun · October 23, 2018, 4:44am

If you’re doing something that can take advantage of fp16 instead of fp32, won’t the 2080 effectively have more memory?

jeremy · October 23, 2018, 5:42am

Probably not, because we need to keep an fp32 copy of the weights too; it’s actually mixed precision training. I haven’t tested this, however.
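A rough way to see why, assuming a scheme where the model’s weights and gradients are held in fp16 alongside an fp32 master copy of the weights and of their gradients (optimizer state such as momentum is ignored): that is 12 bytes per parameter instead of 8, and only the activations shrink by half. The parameter and activation counts below are hypothetical, not measured from any real model:

```python
# Bytes per element for each storage format.
FP32_BYTES = 4
FP16_BYTES = 2


def param_bytes(n_params: int, mixed: bool) -> int:
    """Memory for weights + gradients, ignoring optimizer state."""
    if mixed:
        # fp16 model weights and grads, plus an fp32 master copy of the
        # weights and an fp32 copy of the grads used for the update.
        return n_params * (FP16_BYTES + FP16_BYTES + FP32_BYTES + FP32_BYTES)
    # Plain fp32 training: fp32 weights and fp32 grads.
    return n_params * (FP32_BYTES + FP32_BYTES)


def activation_bytes(n_activations: int, mixed: bool) -> int:
    """Memory for saved activations (stored in fp16 under mixed precision)."""
    return n_activations * (FP16_BYTES if mixed else FP32_BYTES)


def total_bytes(n_params: int, n_activations: int, mixed: bool) -> int:
    return param_bytes(n_params, mixed) + activation_bytes(n_activations, mixed)


# Hypothetical sizes, chosen only to show the trade-off.
n_params = 20_000_000
for n_act in (10_000_000, 200_000_000):
    fp32_total = total_bytes(n_params, n_act, mixed=False)
    mixed_total = total_bytes(n_params, n_act, mixed=True)
    print(f"activations={n_act:>11,}: fp32 {fp32_total / 2**20:.0f} MiB, "
          f"mixed {mixed_total / 2**20:.0f} MiB")
```

Under these assumptions, mixed precision only saves memory when activations dominate, e.g. with large batches or large images.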

devforfu · October 23, 2018, 5:49am

@jeremy Would you say it is still worth choosing an architecture with fp16 support over a 10xx one? I mean, from the perspective of the next 1-2 years? Or is it fine for now to stick with “proven” hardware and pick the previous generation? From your personal point of view, of course =)

Though I think my question is probably a bit too vague to answer right now without proper tests, benchmarks, and actual experiments. It seems that mixed precision training is not a simple-to-answer question.

jeremy · October 23, 2018, 6:26am

I’d get a 2080ti for sure. The tensor cores are well proven now.

init_27 · October 23, 2018, 10:31am

In case anyone else wants a quick refresher on mixed precision training, I found this thread to be great.

digitalspecialists · October 23, 2018, 1:46pm

I managed to get it working after abandoning CUDA 10 and moving to a later 410 driver release. I’m on the 410 driver with CUDA 9.2 (not 10) and the PyTorch/fastai nightlies, on Ubuntu 18.04.

I get an error with fit() and fit_one_cycle(), even on Part1v3 Lesson 1 Pets (it looks like it happens while allocating memory?):

cuda runtime error (11) : invalid argument at /opt/conda/conda-bld/pytorch-nightly_1540205010643/work/aten/src/THC/THCGeneral.cpp:421

Weirdly, the workaround is to just run fit or fit_one_cycle again; it works the second time. Hope that helps someone else.

Has anyone tried the 2080 Ti yet and run into this issue? Or run comparative performance tests with the 2080 Ti under fastai?

FWIW, here are some quick benchmarks showing the speed increase vs. the Titan Xp and 1080 Ti with fastai. I suppose CUDA 10 will speed things up further once PyTorch officially supports it. Lesson-1-pets Benchmarks - #2 by digitalspecialists

miwojc · October 23, 2018, 2:19pm

Not to distract you guys from the discussion, but I found it interesting what you can do with a single 1080: in fact, you can win a Kaggle image competition. Congrats to b.e.s. and phalanx!

GPU resources

I had only a single 1080

phalanx had a single 1080Ti and got another one only during the last week of the competition

SHAR1 · October 23, 2018, 4:37pm

I am surprised that it was their first image segmentation problem. They knew nothing about image segmentation 3 months ago. Great!

devforfu · October 23, 2018, 4:57pm

However, I guess they knew at least something about machine learning and data science competitions beforehand.

balnazzar · October 23, 2018, 10:33pm

I wonder if NVLink is really necessary. I doubt the PCIe bus would be a bottleneck with just two cards.

In that case, if one plans to leverage parallelism, two 2070s could be cheaper and better than a single 2080 Ti, particularly when it comes to memory.
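For a rough feel for the PCIe question, here is a back-of-the-envelope sketch of theoretical one-direction bandwidth per generation (5 GT/s per lane with 8b/10b encoding for 2.0; 8 and 16 GT/s with 128b/130b encoding for 3.0 and 4.0) and the time to copy one full set of fp32 gradients. The 25M-parameter model is hypothetical, and real throughput is lower than these theoretical peaks:

```python
# Per-lane transfer rate (GT/s) and line-code efficiency for each PCIe generation.
PCIE_GENS = {
    "2.0": (5.0, 8 / 10),
    "3.0": (8.0, 128 / 130),
    "4.0": (16.0, 128 / 130),
}


def pcie_bandwidth_gb_s(gen: str, lanes: int = 16) -> float:
    """Theoretical one-direction bandwidth in GB/s (10**9 bytes per second)."""
    gt_per_s, efficiency = PCIE_GENS[gen]
    # Each transfer moves one bit per lane; divide by 8 to get bytes.
    return gt_per_s * efficiency / 8 * lanes


def gradient_transfer_ms(n_params: int, gen: str, lanes: int = 16,
                         bytes_per_param: int = 4) -> float:
    """Time in ms to move one full set of gradients across the bus once."""
    size_gb = n_params * bytes_per_param / 1e9
    return size_gb / pcie_bandwidth_gb_s(gen, lanes) * 1000


n_params = 25_000_000  # hypothetical model size
for gen in ("2.0", "3.0"):
    print(f"PCIe {gen} x16: {pcie_bandwidth_gb_s(gen):.2f} GB/s, "
          f"~{gradient_transfer_ms(n_params, gen):.1f} ms per gradient copy")
```

Whether a few milliseconds per copy matters depends on how long each training step’s compute takes. The lanes parameter also lets you check the x8 case, since a board may split its lanes between two cards.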

EricPB · November 6, 2018, 2:26am

I just placed an order for an Asus RTX 2070 8GB Turbo (blower fan), to install next to my Asus 1080Ti 11GB Turbo, with a Ryzen 1700X + 1TB Samsung SSD.

So I hope to run some tests with fastai in the coming days, trying mixed 16/32-bit precision with its Tensor Cores (the 1080Ti has none), most likely using @radek’s starter pack for the Quick Draw competition on Kaggle.

According to http://on-demand.gputechconf.com/gtc/2018/video/S81012/ (at ~09 min), where a lead NVIDIA engineer for PyTorch presents a real case study, “using multiples of 8” is critical.
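As I understand NVIDIA’s mixed precision guidance, the “multiple of 8” rule applies to the dimensions that feed matrix multiplies and convolutions (batch size, channel counts, hidden and output sizes), which is what lets fp16 work run on the Tensor Cores. A minimal checker under that assumption; the layer names and sizes are hypothetical:

```python
def round_up_to_multiple(n: int, multiple: int = 8) -> int:
    """Smallest value >= n that is divisible by `multiple`."""
    # Ceiling division via negated floor division, then scale back up.
    return -(-n // multiple) * multiple


def check_dims(dims: dict, multiple: int = 8) -> dict:
    """Return {name: suggested_size} for every size that is not a multiple."""
    return {name: round_up_to_multiple(size, multiple)
            for name, size in dims.items() if size % multiple != 0}


# Hypothetical layer and batch sizes for illustration.
dims = {"batch_size": 60, "conv1_out_channels": 64,
        "fc_hidden": 1000, "num_classes": 340}
print(check_dims(dims))
```

Padding an output layer from 340 to 344 means adding unused classes, so whether that trade is worth it is something to benchmark rather than assume.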

balnazzar · November 6, 2018, 1:55pm

Hmm, I wonder how many of the existing best models satisfy that requirement.

miwojc · November 6, 2018, 2:35pm

The 2070 is the most cost-effective, according to Tim Dettmers: http://timdettmers.com/2018/11/05/which-gpu-for-deep-learning/

A used 1080Ti is about the same price as a new 2070. Is the 2070 still a better choice than the 1080 Ti at the same price? (Pre-owned vs. new, though.)

EricPB · November 6, 2018, 5:28pm

That’s the idea, as Tim favors the RTX 2070 as the “best value GFX for DL, and Kaggle” today.
He even mentions in the comments that two 2070s might be better, while cheaper, than a single 2080Ti for most users, as they allow faster exploration of training settings (pix_size, models, # of epochs, etc.).
Use Ctrl-F + 2070 to zoom in on those nuggets.

I’m curious to see how the cheapest consumer GFX with Tensor Cores, at €550, compares with the previous “King of the Kill” of the 10xx line-up (I bought mine refurbished for €700 in April 2017).

In Sweden, the last copies of the 1080Ti now retail for €950, while new 2080s go for €900 and 2080Tis for €1,300.
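Using the prices quoted in this thread and each card’s memory (8 GB for the 2070 and 2080, 11 GB for the 1080 Ti and 2080 Ti), a quick price-per-GB comparison could look like this; it ignores compute speed entirely:

```python
# Prices quoted in this thread (EUR) and each card's VRAM in GB.
cards = {
    "GTX 1080 Ti": (950, 11),
    "RTX 2070": (550, 8),
    "RTX 2080": (900, 8),
    "RTX 2080 Ti": (1300, 11),
}
# Two 2070s: price and memory add up on paper (see the caveat below).
cards["2x RTX 2070"] = (2 * cards["RTX 2070"][0], 2 * cards["RTX 2070"][1])

price_per_gb = {name: price / gb for name, (price, gb) in cards.items()}

# Cheapest memory first.
for name, eur in sorted(price_per_gb.items(), key=lambda kv: kv[1]):
    print(f"{name:<12} €{eur:6.2f} per GB of VRAM")
```

One caveat for the two-2070 option: with data parallelism (e.g. PyTorch’s DataParallel) each card holds a full copy of the model, so memory isn’t pooled. The largest model you can fit is still bounded by one card’s 8 GB; what the second card adds is a bigger total batch per step.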

EricPB · November 6, 2018, 5:40pm

If you want to activate FP16 with fastai, you call to_fp16() when you create your learner, not when you train it. For example:

learn = create_cnn(data, models.resnet34, metrics=error_rate).to_fp16()

Edit: this was done on a 1080Ti, before my RTX 2070 arrived. It doesn’t work on the 2070: it crashes my kernel.

gsg · November 6, 2018, 5:54pm

Are more changes needed to make to_fp16() work?
I tried:

learn = create_cnn(data, models.resnet34, metrics=accuracy, model_dir='.models').to_fp16()
learn.fit_one_cycle(1)

and it fails with

RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR

but without to_fp16() it works fine…

KarlH · November 6, 2018, 6:11pm

Try running the cell again after it errors. If that fails, try running the cell (getting the error), running learn.model.cuda(), then running it again.

I’ve run into several weird CUDA/cuDNN errors that seem to be solved by just running the cell again. It’s as if things don’t work out the first time you try to run a model, but after that everything’s fine.
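If you’d rather not re-run cells by hand, the “just run it again” workaround can be wrapped in a small helper. This is a hypothetical sketch (retry_on_runtime_error is not part of fastai or PyTorch), and it only papers over the symptom; it doesn’t fix whatever driver/cuDNN issue causes the first failure:

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_on_runtime_error(fn: Callable[[], T], retries: int = 1,
                           on_error: Optional[Callable[[], object]] = None) -> T:
    """Call fn(); on RuntimeError, run on_error (if given) and try again.

    Makes at most `retries` extra attempts, then re-raises the last error.
    """
    if retries < 0:
        raise ValueError("retries must be >= 0")
    attempt = 0
    while True:
        try:
            return fn()
        except RuntimeError:
            if attempt >= retries:
                raise  # out of attempts: surface the original error
            attempt += 1
            if on_error is not None:
                on_error()  # e.g. move the model back to the GPU


# Demo with a stand-in function that fails only on its first call.
calls = []


def flaky_fit() -> str:
    calls.append(1)
    if len(calls) == 1:
        raise RuntimeError("cuDNN error: CUDNN_STATUS_INTERNAL_ERROR")
    return "trained"


result = retry_on_runtime_error(flaky_fit)
print(result, len(calls))
```

In a notebook this might look like retry_on_runtime_error(lambda: learn.fit_one_cycle(1), on_error=learn.model.cuda), mirroring the suggestion to call learn.model.cuda() between attempts.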