Comparing the RTX 2060 vs the GTX 1080Ti, using Fastai for Computer Vision

balnazzar · March 6, 2019, 1:47pm

Interesting post, @EricPB, and very good article.

Did you check the difference in memory usage on both the 2060 and the 1080 Ti?

Now, I have some considerations about Pascal vs. Volta/Turing in FP16, but it's better to split them into two parts: speed and memory.

Speed

First of all, please note that a consumer Pascal card such as the 1080 Ti should have only a tiny fraction of its FP32 throughput in native FP16 (1/64 on GP102). This doesn't happen in practice, and I'd be curious to know why.
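To put that ratio in numbers, here is a back-of-the-envelope sketch. It assumes the published GTX 1080 Ti specs (3584 CUDA cores, ~1582 MHz boost clock) and the 1/64 native FP16 rate of GP102; none of these figures were measured in my benchmarks.

```python
# Back-of-the-envelope peak throughput for a GTX 1080 Ti (GP102).
# Assumed published specs, not measurements: 3584 CUDA cores,
# ~1582 MHz boost clock, native FP16 at 1/64 of the FP32 rate.
CUDA_CORES = 3584
BOOST_CLOCK_GHZ = 1.582
FP16_TO_FP32_RATIO = 1 / 64


def peak_fp32_tflops(cores: int, clock_ghz: float) -> float:
    """Peak FP32 TFLOPS, counting one fused multiply-add (2 FLOPs) per core per clock."""
    return cores * clock_ghz * 2 / 1000


fp32 = peak_fp32_tflops(CUDA_CORES, BOOST_CLOCK_GHZ)
fp16 = fp32 * FP16_TO_FP32_RATIO
print(f"FP32 peak: {fp32:.2f} TFLOPS")
print(f"Native FP16 peak: {fp16 * 1000:.0f} GFLOPS")
```

If the heavy math really ran at that native FP16 rate, training would slow down by far more than the few percent anyone reports, which is exactly what makes the observed results puzzling.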
That said, let's talk about what actually happens. Interestingly enough, different people get very different results.
You found a slowdown of ~5-15%.
I found a slight speedup (see below).
Other people found a substantial speedup: https://hackernoon.com/rtx-2080ti-vs-gtx-1080ti-fastai-mixed-precision-training-comparisons-on-cifar-100-761d8f615d7f
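Part of the disagreement may simply come from how the percentage is computed: a change in epoch time and a change in throughput are not the same number. A minimal sketch of both conventions follows; the function names and the 100 s / 115 s epoch times are hypothetical, purely for illustration.

```python
def pct_time_change(fp32_s: float, fp16_s: float) -> float:
    """Percent change in epoch time; positive means FP16 takes longer."""
    if fp32_s <= 0 or fp16_s <= 0:
        raise ValueError("epoch times must be positive")
    return (fp16_s / fp32_s - 1) * 100


def pct_throughput_change(fp32_s: float, fp16_s: float) -> float:
    """Percent change in images/s; positive means FP16 processes more images per second."""
    if fp32_s <= 0 or fp16_s <= 0:
        raise ValueError("epoch times must be positive")
    return (fp32_s / fp16_s - 1) * 100


# Hypothetical epoch times in seconds: 100 s in FP32, 115 s in FP16.
print(f"{pct_time_change(100, 115):.1f}")        # 15.0  (15% longer)
print(f"{pct_throughput_change(100, 115):.1f}")  # -13.0 (13% fewer images/s)
```

So a "15% slowdown" in one write-up and a "13% slowdown" in another can describe the same run.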

Again, I'd be very glad to know why that happens.

Memory

One advantage of Volta/Turing is that FP16 training almost doubles your usable memory, so a 2060 (6 GB) appears to be on par with a 1080 Ti (11 GB) even when it comes to memory.
But this holds for Pascal too: memory usage on my 1080 Ti is almost halved when I train in FP16.

I ran numerous benchmarks in the past, but after reading your article I decided to run a few more to get fresh results with fastai 1.0.45 and NVIDIA Apex, which I installed both on my work machine (Tesla V100) and at home (1080 Ti).
Mind that I ran the tests on different datasets since I was in a hurry, but what counts in the end is the relative difference between FP16 and FP32 on each card.

1080 Ti:

Note that:

between the FP32 and FP16 runs I restarted the kernel and re-ran imports, data creation, etc.
memory occupation was 9127 MB in FP32, and 5081 MB in FP16.
224px images, batch size=256

Tesla V100-DGXS-32GB:

Note that:

again, between the FP32 and FP16 runs I restarted the kernel and re-ran imports, data creation, etc.
memory occupation was 14883 MB in FP32, and 7751 MB in FP16.
we do not observe a substantial FP16 speedup on NVIDIA's flagship (incredibly… did I mess something up?).
this is not a cloud instance. I have direct access to the machine.
700px images, bs=48
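To quantify "almost halved", here is a small sketch that computes the FP16/FP32 memory ratio from the figures reported above. The saving is a bit under 50%, likely because mixed precision keeps an FP32 master copy of the weights and the CUDA context itself does not shrink; that explanation is an assumption, not something I measured.

```python
# Memory figures reported above, in MB: (FP32, FP16) for each card.
measurements = {
    "GTX 1080 Ti (224px, bs=256)": (9127, 5081),
    "Tesla V100 (700px, bs=48)": (14883, 7751),
}


def fp16_memory_ratio(fp32_mb: int, fp16_mb: int) -> float:
    """Fraction of the FP32 memory footprint that the FP16 run needed."""
    return fp16_mb / fp32_mb


for card, (fp32_mb, fp16_mb) in measurements.items():
    ratio = fp16_memory_ratio(fp32_mb, fp16_mb)
    print(f"{card}: FP16 uses {ratio:.1%} of FP32 memory ({1 - ratio:.1%} saved)")
```

This gives roughly 55.7% on the 1080 Ti and 52.1% on the V100, so the memory benefit is very similar on Pascal and Volta.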