GTX 2080/2080Ti RTX for Deep Learning?

I ran the notebook as a Python script in the terminal (as suggested on the PyTorch forum).

The error generated when running to_fp16() is Floating point exception (core dumped)

I’m encountering the same issue with .to_fp16() when running the Cifar10 notebook straight from the fastai GitHub repo:

Cifar10
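
For reference, the relevant part boils down to roughly this (fastai v1 API, with the batch size and schedule trimmed, so treat it as a sketch of the repro rather than the exact notebook):

from fastai.vision import *
from fastai.vision.models.wrn import wrn_22

path = untar_data(URLs.CIFAR)
data = ImageDataBunch.from_folder(path, valid='test', bs=512).normalize(cifar_stats)
learn = Learner(data, wrn_22(), metrics=accuracy).to_fp16()  # switching to mixed precision is what triggers the crash for me
learn.fit_one_cycle(1, 3e-3)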

One of the moderators on the PyTorch forum suggested a cause, but it’s beyond my pay-grade :blush:

could it be you are using torch.float data somewhere in torch.half layers?
Could you post your model definition?
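
If I understand the question, they are asking whether a float32 tensor ends up being fed into half-precision layers somewhere. Something like this toy mismatch, which is purely illustrative and not my actual model:

import torch

layer = torch.nn.Linear(8, 8).cuda().half()   # parameters converted to torch.half
x = torch.randn(4, 8, device='cuda')          # input left in torch.float
try:
    layer(x)                                  # dtype mismatch between input and weights
except RuntimeError as e:
    print(e)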

Can anyone help answer this? :heart_eyes:

cc @sgugger

In the meantime, a quick benchmark of Cifar10 for the RTX 2070 vs the 1080Ti, in FP32.

The 2070 achieves almost identical speed (52 sec vs 49 sec per epoch) despite a batch size less than half as large!?
Am I doing something wrong?

GTX 1080Ti 11GB

RTX 2070 8GB

When I run your code on the 2080, learn.fit_one_cycle gives me the usual error the first time but works fine on the second go, with no kernel restart.

RuntimeError: cuda runtime error (11) : invalid argument at /opt/conda/conda-bld/pytorch-nightly_1539945974892/work/aten/src/THC/THCGeneral.cpp:421

I can run as high as BS = 184 with a 45 sec average epoch.
Adding to_fp16 drops the time to 40 seconds.
Increasing BS to 368 drops time to 38.


I was expecting something more than a 10% gain :thinking:


Unfreeze the model, run some more epochs, then report back.

As the model is frozen, the GPU is not leveraged properly, no matter how powerful it is.

I can add that I get more or less the same timings, on a frozen model, with a 1070, a 1080Ti, and a Tesla V100.
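
In fastai v1 terms, unfreezing and re-timing is just the two lines below (the epoch count and learning-rate slice are placeholders, use whatever you were benchmarking with):

learn.unfreeze()                           # train all layer groups, not just the head
learn.fit_one_cycle(4, slice(1e-5, 1e-3))  # a few more epochs to time the GPU under full load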


I think this repo is dedicated to exploring mixed precision with PyTorch.

I’m running its scripts with the Word Language Model, and can see a slight performance boost (+15%) with --fp16 on the 2070.

Hi Ilia, sorry for the tardy reply. That’s a great idea, although, as you suspect, I doubt it’s original. I offer a twist: assemble a few ‘small’ test sets of data to analyse. Place the emphasis on GPU performance, NOT analytical accuracy, but on the time taken for each epoch. This way we can compare horrid little machines to real grunt boxes without having to wait days or weeks for analytical results. Hence a broad spectrum of performances can be readily compared and duplicated by anyone wishing to test their ‘special rigs’ (please pardon the gamer parlance). What do you think, Ilia, or anyone else? All suggestions welcome; no offence will be taken, I’m too old for that.
I’m flat out busy on distractions until the end of the month. Then December to catch up, then into FastAI with ears pinned back.
Cheers Ilia, Peter Kelly.

As a follow-up to my setup failing to run .to_fp16() with fastai on an RTX 2070, I found this ticket on the PyTorch GitHub where they identified the cause of the bug I’m hitting (running the script as a “.py” in the terminal generates the same error message, “Floating point exception (core dumped)”).

TLDR: it may be caused by a bug in cuDNN 7.1.4, “confirmed by Nvidia”, and was fixed by either reverting to 7.1.2 or upgrading to 7.2 (7.4 is now available).

The tricky part, for me as a noob, is finding a way to get off 7.1.4.
When I check the installation procedure from fastai (conda install -c pytorch -c fastai fastai pytorch-nightly cuda92), the current PyTorch 1.0 build comes bundled with cuDNN 7.1.4.

My question: for those of you successfully using an RTX card with fastai v1 and mixed precision, which versions of cuDNN & CUDA are installed on your system?

BR

PS: I’m using a brand-new installation of Ubuntu 16.04 dedicated to fastai v1.
I built it right at the start of this course, and once Ubuntu was done, I followed the installation procedure described on the fastai GitHub.


That is:

conda install -c pytorch pytorch-nightly cuda92
conda install -c fastai torchvision-nightly
conda install -c fastai fastai
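
To make the answers easy to compare: the versions PyTorch itself was built against can be read straight from Python, assuming the install imports cleanly:

import torch

print(torch.__version__)                 # e.g. a 1.0.0.dev2018xxxx nightly
print(torch.version.cuda)                # CUDA toolkit the build was compiled with
print(torch.backends.cudnn.version())    # bundled cuDNN, e.g. 7104 for 7.1.4
print(torch.cuda.get_device_name(0))     # sanity check that the RTX card is visible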

I have cuDNN 7.2.1

Try ‘conda install cudnn’ and see if that takes care of the issue.

As an aside, I built a new LXC container from scratch today and it ran fine; it didn’t even give me the usual CUDA error 11. No improvement in speed though. Shoutout to @willismar for the excellent LXC tutorial.


Checking ‘conda list --explicit’ on the LXC container, I have
pytorch-nightly 1.0.0.dev20181109-py3.7_cuda9.2.148_cudnn7.1.4_0
but no cudnn listed individually. My main box has
pytorch-nightly 1.0.0.dev20181019 py3.7_cuda9.0.176_cudnn7.1.2_0 and cudnn 7.2.1


As an update, I fixed the .to_fp16() crashes by installing the cuda9.0 build of PyTorch (with cuDNN 7.1.2) instead of the cuda9.2 build (with cuDNN 7.1.4).

So doing:
conda install pytorch-nightly -c pytorch instead of
conda install pytorch-nightly cuda92 -c pytorch.

Click on the “Preview” tab to see the PyTorch 1.0 builds.
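
A quick way to confirm the swap took is to check which cuDNN build PyTorch now reports; with the cuda9.0 package it should come back as 7102 (i.e. 7.1.2) rather than 7104:

import torch
print(torch.backends.cudnn.version())    # 7102 == cuDNN 7.1.2, 7104 == cuDNN 7.1.4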


I have a question for people who got the 2080 Ti: where did you get them? I seem to have missed the window for now, because it’s impossible to find them online. Also, if you have one: did you get the Founders Edition or one from a different manufacturer?

It is in stock here in the UK from Nvidia, shipping in 1-3 days. There are various stock trackers online, e.g. ‘nowinstock’, and the Nvidia forum might have early notice.


Hi Eric,

I upgraded the drivers to 410.72. I now face exceptions at the end of training,

so I am not in a position to give you any tips on to_fp16 usage :slight_smile:

BR,
Julien


Hi
I am getting an error on this line in lesson 1:
interp = ClassificationInterpretation.from_learner(learn)

Floating point exception (core dumped)
Training works fine, but the interpretation step crashes with this error.


I tried that a couple of years ago with a 1060 instead of a newer SLI card. I never bought the second one because the 1060 was so slow. Kind of aiming for one great card now. All I need is to rob a liquor store to afford it.


Similarly, I am now getting an error in
learn.fit_one_cycle(1, 1e-2, moms=(0.8,0.7))

After the epoch, the process stops with the same floating point exception.

Another TensorFlow benchmark by Puget Systems, comparing the whole RTX family (2070, 2080, 2080Ti, dual 2080Ti) vs. the 1080Ti.

The 2070 is in a really sweet spot compared to both the 2080 and the 1080Ti.
Price-wise, the 2080 is a bit of a “Meh…”


FWIW, I ran fastai’s Cifar10 notebook on a Palit RTX 2080Ti 11GB in FP32 and FP16.

FP32 with batch_size=512 already shows a strong improvement over the 1080Ti 11GB: 35 sec per epoch vs 49 sec.

For FP16/mixed precision, I failed to get anywhere close to Nvidia’s claim of “up to 2X faster than FP32”: about 31 sec per epoch with batch_size=248.
I couldn’t go beyond bs=248 without hitting either a CUDA or cuDNN error, which is counter-intuitive vs bs=512 in FP32.
Note: I’m using the regular PyTorch 1.0 install package; I didn’t build anything from source or update the CUDA/cuDNN versions.

My notebooks: https://github.com/EricPerbos/GTX-vs-RTX-Deep-Learning-benchmarks

My guess: once PyTorch and TensorFlow release their optimised, stable versions for the latest CUDA/cuDNN in a few months, the RTX 2070 8GB & 2080Ti 11GB will be fantastic tools.

We’re just not there yet when it comes to “Plug & Play” :sunglasses:


I tried your notebooks on my 2080: 13% faster at FP16 than FP32 using BS 256. FP32 could handle BS 400 but not 440; FP16 could handle BS 800 but not 840, so it did show ~double the capacity. I’m on the 415 driver, but otherwise just the standard install on Ubuntu 18.04.