Buying new GPUs right now

balnazzar · June 20, 2018, 2:50pm

Your answers have been quite informative.

Thank you!

tenoke · June 20, 2018, 7:05pm

You need to find where your bottleneck is first. Look at your RAM, CPU, and disk I/O. You are most likely hitting the limit on one of those in your code, and you can either optimize your code (now and in the future) to avoid the bottleneck, upgrade whatever piece is slowing you down, or both.

If you are not hitting 100% on any of those, then it might be something more esoteric (like if you’ve connected your GPU via a small number of PCIe lanes, so it can receive data only so fast), but I doubt it.
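One rough way to narrow this down is to time how long each iteration waits for the next batch versus how long the training step itself takes. The sketch below uses only the standard library. `batches` and `train_step` are placeholder names for whatever loader and step function you have; they are not fastai APIs. On a GPU the step would also have to block until the device is done (for example by calling `torch.cuda.synchronize()` in PyTorch), otherwise the compute time is under-reported.

```python
import time


def profile_loop(batches, train_step, max_iters=50):
    """Return (seconds waiting for data, seconds in train_step, iterations run).

    `batches` is any iterable of batches and `train_step` any callable taking
    one batch. Both are hypothetical stand-ins for your own code.
    """
    wait = compute = 0.0
    n = 0
    it = iter(batches)
    while n < max_iters:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # disk reads and CPU preprocessing show up here
        except StopIteration:
            break
        t1 = time.perf_counter()
        train_step(batch)     # forward/backward pass (must be synchronous)
        t2 = time.perf_counter()
        wait += t1 - t0
        compute += t2 - t1
        n += 1
    return wait, compute, n


if __name__ == "__main__":
    # Toy stand-ins: a slow loader and a fast step, simulated with sleeps.
    def slow_loader(count):
        for i in range(count):
            time.sleep(0.01)
            yield i

    wait, compute, n = profile_loop(slow_loader(10), lambda b: time.sleep(0.002))
    print(f"{n} iters: {wait:.3f}s waiting for data, {compute:.3f}s computing")
```

If the waiting time dominates, the loader (disk, CPU, preprocessing) is the place to look; if the compute time dominates but the GPU is still not busy, the cause is somewhere else.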

balnazzar · June 20, 2018, 7:40pm

I have 64 GB of RAM and a fast CPU.
The GPUs are both connected through sixteen PCIe Gen3 lanes.
The hard drive is a Samsung 850 Pro, connected to a SATA 3 port.

Last but not least, I’m testing the system using vanilla fastai notebooks, precisely to rule out programming errors of my own.

I’m beginning to be fairly convinced that training just the last few layers is not enough to push the gpu to its limits.

Further details (on a slightly modified nb, but I used the vanilla ones, too):

tenoke · June 20, 2018, 7:54pm

There are also the drivers and your software stack (the versions of everything) to consider. At any rate, I would actually check the performance of everything (e.g. are you hitting 100% on one core while the rest are mostly idle?).
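For the per-core check, here is a minimal sketch, assuming a Linux machine where `/proc/stat` is available. It takes two snapshots and reports how busy each core was in between. The function names are made up for this example; tools like `top` or `htop` show the same information interactively.

```python
import time


def parse_proc_stat(text):
    """Map 'cpuN' -> (idle_ticks, total_ticks) from the contents of /proc/stat."""
    cores = {}
    for line in text.splitlines():
        parts = line.split()
        # Per-core lines look like "cpu0 ..."; skip the aggregate "cpu" line.
        if len(parts) < 5 or not parts[0].startswith("cpu") or parts[0] == "cpu":
            continue
        # Fields: user nice system idle iowait irq softirq steal.
        # Guest time is already included in user, so stop after steal.
        ticks = [int(x) for x in parts[1:9]]
        idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)  # idle + iowait
        cores[parts[0]] = (idle, sum(ticks))
    return cores


def core_usage(before, after):
    """Percent busy per core between two parse_proc_stat() snapshots."""
    usage = {}
    for name, (idle1, total1) in after.items():
        idle0, total0 = before.get(name, (0, 0))
        dt = total1 - total0
        usage[name] = 100.0 * (1 - (idle1 - idle0) / dt) if dt > 0 else 0.0
    return usage


def sample(interval=1.0, path="/proc/stat"):
    """Read /proc/stat twice, `interval` seconds apart (Linux only)."""
    with open(path) as f:
        before = parse_proc_stat(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_proc_stat(f.read())
    return core_usage(before, after)
```

Calling `sample()` while a notebook is training returns a dict such as `{"cpu0": ..., "cpu1": ...}`. One core pinned near 100% with the others idle would point to single-threaded preprocessing.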

At any rate, your conclusions aren’t quite correct.

A GPU is not fully leveraged unless you train, retrain, or fine-tune a big model in all its layers (or at least a good part of them).
In turn, this does mean you have room for improvement over the other components.

I think that when we have just one layer or two to train (and/or when precompute=True), quite a lot of time is spent moving minibatches back and forth.

This might be true, if you make no changes, but you can definitely leverage a full GPU on a single layer (depending on the model, possibly not here in this case), and as mentioned before you can increase the size of the minibatches.
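To get a feel for how much the raw host-to-GPU copy could cost, here is a back-of-the-envelope sketch. All the numbers are assumptions for illustration: float32 RGB images at 224x224, a batch of 128, and the theoretical PCIe 3.0 rate of roughly 0.985 GB/s per lane. Real throughput is lower than the theoretical peak, and this ignores everything except the copy itself.

```python
def transfer_ms(batch_size, channels, height, width,
                bytes_per_value=4, lanes=16, gb_per_s_per_lane=0.985):
    """Estimated time in milliseconds to copy one batch over PCIe.

    gb_per_s_per_lane defaults to the theoretical PCIe 3.0 figure
    (8 GT/s with 128b/130b encoding); it is an upper bound, not a measurement.
    """
    n_bytes = batch_size * channels * height * width * bytes_per_value
    bandwidth = lanes * gb_per_s_per_lane * 1e9  # bytes per second
    return 1000.0 * n_bytes / bandwidth


if __name__ == "__main__":
    for lanes in (16, 8, 4):
        ms = transfer_ms(128, 3, 224, 224, lanes=lanes)
        print(f"x{lanes}: about {ms:.1f} ms per batch")
```

Under these assumptions a batch is about 77 MB and takes around 5 ms to copy at x16, so the copy alone is only a problem if a training step finishes in a comparable amount of time.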

balnazzar · June 20, 2018, 7:59pm

Uhm, for the example I wrote about, I selected the maximum batch size allowed by VRAM (128) exactly for that reason.

I played with other vanilla nbs and various models (from VGG to ResNeXt at various depths), and I varied the hyperparameters a lot, but I never got the GPU to be fully leveraged unless the network was unfrozen.

Any further suggestion would be greatly welcome!

Walkir · August 16, 2018, 10:32am

Does anyone have any idea how the new NVIDIA cards are going to work for DL?
There are quite a lot of leaks about them already…

balnazzar · August 16, 2018, 11:31am

https://medium.com/syncedreview/revolutionising-graphics-nvidia-unveils-turing-architecture-with-real-time-ray-tracing-d32513c1c891

Best guess is that they’ll include the tensor cores from Volta, at a much more affordable price point.

If so, DL libraries (including fastai) will start to support tensor cores, which is quite a good thing: a 10x speedup won’t go amiss.

digitalspecialists · August 16, 2018, 12:16pm

Not long to wait to find out - seems a solid likelihood new kit will be announced Aug 20th.

sahil_2015ti · September 5, 2018, 10:37am

They are quite promising. In fact, many system integrators have already started providing DL systems with RTX series GPUs. Brands like boxx.com and ant-pc.com have systems for ML, DL and AI with these new GPUs.

In a nutshell, these new GPUs are an upgrade over the previous generation, so they will perform better. I do expect driver issues initially, but as the updates pour in they will become more reliable.