Comparing the RTX 2060 vs the GTX 1080Ti, using Fastai for Computer Vision

Yes, of course! I do have Pascal/Volta cards, but no Turing. Together, we can do meaningful comparisons.

Another thing: another set of benchmarks from 2017 seems to confirm what you found for Pascal: the FP16 speedup is not dramatic, but it's still there.

Specifically:


I ran a couple of experiments, this time a bit more systematically, using ipyexperiments by @stas.

You’ll find the notebooks here: https://github.com/terribilissimo/otherstuff
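
For anyone who hasn't used it, ipyexperiments is driven from a Jupyter notebook roughly like this. This is just a minimal sketch; the exact class name and report format may differ between versions:

```python
# Minimal sketch of driving ipyexperiments from a Jupyter notebook.
# Class name and report format may differ between versions.
from ipyexperiments import IPyExperimentsPytorch

# Starting an experiment begins the per-cell CPU/GPU RAM and time reports.
exp = IPyExperimentsPytorch()

# ... run the training cells you want to measure here ...

# Deleting the experiment object ends it, frees the temporary variables it
# tracked, and prints the final memory/time summary.
del exp
```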

Beyond memory and timings, pay attention to the losses: if you end up needing more epochs to reach the same loss, the speedup from FP16 is of little use.
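
For context, the FP16 runs boil down to fastai's `to_fp16()` call. Here is a minimal sketch of the kind of fp32 vs fp16 comparison I mean, assuming fastai v1 and the small MNIST sample dataset (not the exact setup in the notebooks):

```python
# Minimal sketch of an fp32 vs fp16 comparison with fastai v1.
# Uses the small MNIST sample dataset; the linked notebooks use different data.
from fastai.vision import *  # fastai v1 idiom: brings in untar_data, cnn_learner, ...

path = untar_data(URLs.MNIST_SAMPLE)
data = ImageDataBunch.from_folder(path, bs=64)

# fp32 baseline
learn_fp32 = cnn_learner(data, models.resnet34, metrics=accuracy)
learn_fp32.fit_one_cycle(1)

# Same architecture in mixed precision: compare the epoch time *and* the losses.
# If fp16 needs extra epochs to reach the same loss, the wall-clock win shrinks.
learn_fp16 = cnn_learner(data, models.resnet34, metrics=accuracy).to_fp16()
learn_fp16.fit_one_cycle(1)
```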

Note for @stas: yours is an awesome tool, but it does not seem to work with the Teslas. Maybe the ones I use (DGX Station) are a bit different from the ones usually found in cloud instances?

I only tested it with my GTX card. To sort out any issues, please post the details of what’s not working in this thread: IPyExperiments: Getting the most out of your GPU RAM in jupyter notebook. Thank you.

And thank you for your kind words, @balnazzar - I’m glad you find it useful. I think it is still a bit clunky and evolving so any feedback for improvement is welcome.


It would be interesting to compare the new 2060 Super with 8GB of RAM. Not sure whether the 2060 Super’s lack of NVLink would be an issue for anyone.

I got three of them (blower version); they replaced my previous two 1080 Ti, since Pascal shows convergence issues when you train in FP16.

Essentially, they are equivalent to the 2070 (non-Super), at a lower price point and TDP. The 2060S thus has an amazing price/performance ratio, the best among the cards with 8GB.
I paid ~1000 EUR for the three of them (but sold the 1080 Ti for the same amount). With a TDP of 175W each, they don’t tax the power supply as much as their more power-hungry siblings.

Note that in any task which can be parallelized with DataParallel, you get 24GB of aggregate VRAM (= Titan RTX) for just 1000 EUR/USD, which, together with 16-bit training, allows you to train even big transformers (except for the few biggest). If your motherboard allows you to stack four of them together, that’s even better.
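
For what it’s worth, here is a minimal sketch of what “parallelized with DataParallel” means in plain PyTorch. It assumes three visible CUDA devices; the resnet50 is just a stand-in:

```python
# Minimal sketch of splitting a batch across multiple cards with DataParallel.
import torch
from torch import nn
from torchvision import models

model = models.resnet50()
if torch.cuda.device_count() > 1:
    # Each GPU holds a full replica of the model; the input batch is split
    # across the cards, so the aggregate VRAM mainly buys a larger batch size.
    model = nn.DataParallel(model)
model = model.cuda()

x = torch.randn(96, 3, 224, 224).cuda()  # ~32 images per card with three GPUs
out = model(x)
print(out.shape)  # torch.Size([96, 1000])
```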

NVLink: NVLink on NVIDIA’s consumer cards is essentially a toy, quite different from the NVLink you’d find on Titan/Quadro/Tesla. Forget it; you’ll be fine with the PCIe bus, as long as you get at least 8 lanes per card.
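
If you want to see what your own setup reports, a quick sanity check from PyTorch (just a sketch; multi-GPU training works either way, transfers are simply staged differently when peer access is unavailable):

```python
# Quick check of direct GPU-to-GPU (peer) access as seen by PyTorch.
import torch

n = torch.cuda.device_count()
for i in range(n):
    for j in range(n):
        if i != j:
            ok = torch.cuda.can_device_access_peer(i, j)
            print(f"GPU {i} -> GPU {j}: peer access {'yes' if ok else 'no'}")
```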


That’s good news. I’m considering two non-blower versions with a 2.7-slot width (XC Ultra), due to the higher demand and better resale value of the non-blower cards. I’ve got the airflow, and EVGA thinks it’d be a good setup.

If you have the airflow, two non-blowers will be fine, and you can resell them easily, particularly if they’re EVGAs.
