Recommendations on new 2 x RTX 3090 setup

TBH, I don’t have the faintest idea. Would it depend on the task? Again, no idea.

I’ve read contrasting opinions about that, but we can take a couple of things as certain: 1. typical deep learning workflows require MUCH more bandwidth than gaming, and 2. the faster your cards are at processing, the more you need a fast interconnect.
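For reference, the theoretical link bandwidths are easy to compute. A rough sketch (the per-lane figures are the nominal values after 128b/130b encoding; real-world throughput is lower due to protocol overhead):

```python
# Approximate theoretical PCIe bandwidth per direction, in GB/s per lane,
# after 128b/130b encoding (Gen 3 and Gen 4); protocol overhead not included.
PER_LANE_GBPS = {3: 0.985, 4: 1.969}

def pcie_bandwidth(gen, lanes):
    """Approximate unidirectional PCIe bandwidth in GB/s."""
    return PER_LANE_GBPS[gen] * lanes

print(f"Gen4 x16: {pcie_bandwidth(4, 16):.1f} GB/s")  # ~31.5
print(f"Gen3 x16: {pcie_bandwidth(3, 16):.1f} GB/s")  # ~15.8
print(f"Gen4 x8 : {pcie_bandwidth(4, 8):.1f} GB/s")   # ~15.8, same as Gen3 x16
```

Note that Gen 3 x16 and Gen 4 x8 are essentially equivalent, which is why the BIOS experiments below are informative.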

If any of you owns a PCIe 4.0 system and at least two Ampere-class cards, there is a quick way to investigate these aspects. Run a beefy benchmark (MLPerf, etc.). Then go into the BIOS, limit the bandwidth allocated to the slots, and benchmark again.

It’s quick and simple, and it would give us a lot of useful information to guide our hardware choices, answering questions that DL practitioners keep asking (do I need NVLink, etc.)… Still, it seems that the people who run labs and are given cards for free by Nvidia (Exxact, Puget, Lambda, etc.) are only capable of stacking cards, running some ResNet50, and publishing the results brainlessly.

We can attempt a theoretical analysis, but please take it with a grain of salt.

When using multiple GPUs, one can leverage them in two very different ways: data parallelism and model parallelism.

The first is by far the more common, since it’s much simpler: each GPU processes its own minibatches against a replica of the model, and the resulting gradients are then combined (with PyTorch’s DataParallel, for instance, the master copy of the model resides on the leading GPU).
In this case, communication between GPUs is modest, and what really matters is the bandwidth between each single GPU and the CPU.

In the second case, the neural network itself is distributed across the GPUs, so one can consider it a sort of ‘true’ parallelism, and indeed the model has to be engineered accordingly. In such a case, communication bandwidth across the GPUs is more important (and tools like NVLink are made precisely for this case).
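To put a rough number on the data-parallel case (a back-of-envelope sketch; the parameter count and the ring all-reduce cost formula are textbook approximations, not measurements):

```python
# Back-of-envelope estimate of per-step gradient traffic in data-parallel
# training. Assumes fp32 gradients and a ring all-reduce, which moves
# roughly 2 * (N-1)/N * model_size bytes per GPU per step.
def allreduce_gb_per_gpu(n_params, n_gpus, bytes_per_param=4):
    model_bytes = n_params * bytes_per_param
    return 2 * (n_gpus - 1) / n_gpus * model_bytes / 1e9

# ResNet50 has roughly 25.6M parameters.
print(f"{allreduce_gb_per_gpu(25_600_000, 2):.2f} GB per step")  # ~0.10 GB
```

At ~0.1 GB per step, even a Gen 3 x16 link (~15.8 GB/s theoretical) is far from saturated by gradient syncs alone, which is consistent with the claim that data parallelism is the less bandwidth-hungry of the two.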

We have no DL-specific benchmark for either of the two cases, but note that in the analysis Puget ran with two ancient Titan X cards in SLI, there is at least one case showing huge differences:


Even though that’s a gaming graphics engine, it reveals there are situations in which slot bandwidth is fundamental. The cards were connected in SLI, but note that SLI is not NVLink: it is only used to exchange synchronization signals between the cards.


@balnazzar I’m willing to run some tests this weekend. I’ve got three 3090 blowers and 2 systems with PCIE GEN 4 and GEN 3 respectively.

If you can outline what benchmarks to run, I can set them up, provided they don’t take too long to run.
I’m thinking of the following experiments:

  1. Gen 4 vs Gen 3, single card x16
  2. Gen 4 x16/x16 vs x16/x8

PS: Threadripper goes to 16x/8x/16x/8x with more than 2 cards populated. I couldn’t fit a 4th GPU on my mobo though, because the front panel connectors block it.


Thanks @shreeyak. I assume you got the Gigabyte cards, right? It will be interesting. Please also monitor the temperatures.

If you want to run a very quick but informative test, here is a convenient option: https://github.com/eugeneware/benchmark-transformers

You can also compare with the author, since he also got a blower 3090.

Just set the correct batch size in the config file. I think 3-5 epochs will suffice. I’d do the following experiments.

  1. Use two cards at full x16 Gen 4… Then lower them to Gen 3 (in the BIOS), but still x16.
  2. Set them at Gen 4, but x8/x8, and observe the difference, if any.
  3. Run all three of them at x16/x16/x8 Gen 4, and then x8/x8/x8.
  4. Experiment with an NVLink bridge and 2 cards, if you have one.
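Whichever experiments you run, it’s worth verifying between runs that the BIOS setting actually took effect. nvidia-smi can report the current link per GPU (`nvidia-smi --query-gpu=index,pcie.link.gen.current,pcie.link.width.current --format=csv,noheader`); here is a small parser for that output, run against a hard-coded sample string standing in for a live query:

```python
# Parse the CSV output of:
#   nvidia-smi --query-gpu=index,pcie.link.gen.current,pcie.link.width.current \
#              --format=csv,noheader
# The sample below stands in for a real query on a 3-GPU box.
sample = """0, 4, 16
1, 4, 8
2, 3, 16"""

def parse_links(csv_text):
    links = []
    for line in csv_text.strip().splitlines():
        idx, gen, width = (field.strip() for field in line.split(","))
        links.append({"gpu": int(idx), "gen": int(gen), "width": int(width)})
    return links

for link in parse_links(sample):
    print(f"GPU{link['gpu']}: Gen{link['gen']} x{link['width']}")
```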

If you need a fourth card, you can easily add it using a PCIe cable extender, provided that your case has sufficient space.

Last but not least, if you want to test your setup thermally, use gpu-burn and let it run for ten minutes or so. Don’t worry, you won’t burn your GPUs, despite the ominous name.

Thanks!


Yes, I have the Gigabyte Turbo. Very heavy cards, got a solid heatsink. The blower edition is surprisingly cool. Rarely crosses 80C on default fan curve. They’re really loud though (much louder than 1080ti FE on all default settings). Coolest GPU sits at 67C.
I’ll try the gpu-burn to get some solid numbers to share.

I had some work come up this weekend, will run the tests next weekend.
If we just want to see the throughput from training, I’ll use my own repo for semantic segmentation. Unless you want to see transformers in particular?

I don’t have an nvlink. The other 3 experiments sound good. So we’ll basically get diff b/w gen3/gen4 (x16) and x16/x8 (on gen4).


Thanks, @shreeyak.

I ordered two of these after evaluating all the available options. Yeah, I think they are heavy because their heatsink is pure copper as opposed to the usual aluminium. Surprisingly, they are short (less than 260mm I think, while my 2060S blowers are 280mm long). May I ask why you went for these cards and not for open-air models or FEs (much quieter)?

As for the benchmarks, I suggested that transformer mainly because we have a reference benchmark from the author with the same card… But if it’s too much hassle, I think your repo will do :slight_smile:

Perhaps even more interesting will be some 10-minute runs with gpu-burn, since we’ll see how the cards behave thermally and acoustically when fully loaded. My cards will be delivered on Friday, so we’ll be able to run some comparative benchmarks.

Thanks!


What power supply do you currently use? I have a 1250W PSU and wonder whether it would suffice for two, or maybe three, 3090s.

Got the first card. Remarkably compact and manageable for a 350W monster. Heavy, but less than the FE.

VERY good thermals. Max temps, 10-minute gpu-burn, ambient 21C, in a crappy €60 case:
68C, 350W (over 17 Tflops)
63C, 300W
57C, 250W (still delivering almost 15 Tflops)

But hell… it’s noisy. Don’t buy this card if you have sensitive eardrums (two of them are going to be even noisier), or buy a soundproofed case.

On the other hand, at idle it sits at 28C and it’s de facto inaudible.


I have a Corsair AX 1600i. 1250W would be enough for 2x 3090s, but not for 3x. I got my numbers from Puget’s reviews. The actual wattage can spike well beyond the rated 350W.
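For what it’s worth, here is the kind of back-of-envelope PSU sizing that leads to the same conclusion (the spike factor, rest-of-system draw, and headroom below are all assumptions, not measurements, so adjust to taste):

```python
# Rough PSU sizing sketch: cards can spike well past their rated TDP, so
# budget a spike factor per GPU plus some headroom on the whole system.
# All constants below are assumptions, not measurements.
def psu_watts_needed(n_gpus, gpu_tdp=350, spike_factor=1.2,
                     rest_of_system=200, headroom=1.1):
    worst_case = n_gpus * gpu_tdp * spike_factor + rest_of_system
    return worst_case * headroom

print(round(psu_watts_needed(2)))  # ~1144 W -> a 1250 W unit covers two cards
print(round(psu_watts_needed(3)))  # ~1606 W -> but not three
```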


@balnazzar Sweet! Love your case, looks neat.
Yeah, the card is louder than the 1080ti. With 3 cards running at full load… I could hear the fans in the next room. If you’re sensitive, you won’t be able to stand it. And given the kind of heat it outputs, you definitely should consider putting it in a separate room. It’s winter here and the system manages to heat up my entire living room, despite it being well ventilated.

I wanted 4 cards in a single system, had to settle for 3 with the 3090’s power draw. It was uncertain how the FE models would perform, if I ever managed to find a case that would fit three of them. I read Puget’s reviews of these Gigabyte Turbo cards and it was clear that three of these blower cards work well, so I went for it. I’m very happy with my setup :slight_smile:

Performed gpu-burn for 10mins. My top GPU hit 86C. Surprisingly, the middle is the coolest at 71C. I don’t know how that can be, lol.

Here’s my setup:


It’s a very good setup! :slight_smile:

But the upper GPU is too hot. Which is strange, given that Puget got just 80C in a 4-GPU setting. And you have A LOT of fans. Try to investigate the reason for that excessive temperature.

Maybe it would be worth running the same 10-min gpu-burn with the cards at 280/300W.

Thank you! I found the upper gpu’s temp to be odd too. I have no clue why. One factor is that I don’t have fans at the front of the case, it’s a side-intake. Could it be heat from the CPU/RAM? Do you have suggestions for tests I can do?
When I try the x16/x8 PCIE lanes test, I’ll try switching around the GPUs and see if it might be a problem with that card in particular.

I just found out that my GPU1 is actually the bottom GPU (which is why it’s coolest). GPU0 - top, GPU1 - bottom, GPU2 - middle. Just a note, my gpu idle temps are higher than yours at 40-45C, with ambient temp close to 25C.

Here’s a gpu-burn at other power limits.

  • I’m not seeing anything close to 17 TFlops! More like 15, likely due to high temps.
  • Lowering power limits made only a tiny difference to temps.
  • Even though gpu0 is hotter, it does more TFlops than gpu2. There’s definitely a silicon lottery involved.
# 350W - 15.3 TFlops
100.0%  proc'd: 552108 (15746 Gflop/s) - 562526 (16063 Gflop/s) - 539865 (15342 Gflop/s)   errors: 0 - 0 - 0   temps: 86 C - 71 C - 79 C

# 300W - 14.7 TFlops
100.0%  proc'd: 520332 (14828 Gflop/s) - 539865 (15401 Gflop/s) - 518537 (14732 Gflop/s)   errors: 0 - 0 - 0   temps: 82 C - 69 C - 77 C

# 280W - 14.2 TFlops
100.0%  proc'd: 497824 (14227 Gflop/s) - 518537 (14830 Gflop/s) - 495876 (14201 Gflop/s)   errors: 0 - 0 - 0   temps: 80 C - 68 C - 75 C

# 250W - 12.6 TFlops
100.0%  proc'd: 460752 (13044 Gflop/s) - 478547 (13724 Gflop/s) - 447888 (12607 Gflop/s)   errors: 0 - 0 - 0   temps: 77 C - 65 C - 72 C
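One way to read those logs is in GFLOP/s per watt. A quick sketch using the slowest card’s figure from each run above:

```python
# Efficiency (GFLOP/s per watt) at each power limit, taking the slowest
# of the three cards from the gpu-burn logs above.
runs = {350: 15342, 300: 14732, 280: 14201, 250: 12607}

for watts, gflops in runs.items():
    print(f"{watts} W: {gflops / watts:.1f} GFLOP/s per W")

best = max(runs, key=lambda w: runs[w] / w)
print(f"best perf/W at {best} W")  # 280 W in these logs
```

Per these numbers, ~280 W is the sweet spot: it keeps ~93% of the full-power throughput at noticeably lower temps.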

Swap gpu0 and gpu1. Maybe gpu0 has been a bit unlucky in the silicon lottery. In the lowermost position, it will take the airflow from the floor fans!

Too much delta… Let me think about it, but case airflow and 3-4C of ambient difference cannot justify it…

Another clue that something is not perfectly right… :thinking:


I’m using the default fan curves. At idle, my fans are at 30-35% (per nvidia-smi). What are your fan speeds?
I could try opening up the case to verify that it’s not an airflow issue.


30% always, at idle. You’ve got a lot of positive pressure with all those fans, regardless of the airflow direction. A whole lot of air is forced to exit the case through the GPUs, as would happen in a server case with passively cooled Tesla cards, and this is a good thing. So your issue is even harder to understand.
You should really swap those cards, so that if one of them is somewhat defective, you can RMA it and ask for a replacement. If they say no, just return it for refund, and buy another. :wink:

You got a lot of positive pressure with all these fans

Good point.

just return it for refund

Lol. There’s no returning of electronics for refund where I’m at. Some things bought at Amazon, yes. But otherwise if purchased from a shop… no returns. Unless completely dead. Sometimes.

I’m glad I ran those tests. Will def spend some time this weekend debugging it. Overall, though, it doesn’t seem too big of an issue. Is it? Even if the temps are higher, the card seems to be performing well. And they should be able to handle constant temps of 85C (the 1080 Ti runs at that temp by default).

Nice setup. A suggestion re: GPU placement – if you have a 4th PCIe slot on your motherboard below the three GPUs, you will get better airflow by moving the 2nd GPU into the 4th position.

Visually in your current setup only the bottom GPU gets “fresh” air:
GPU
GPU
GPU
EMPTY SPACE

You’ll get better temps with this arrangement, where the top GPU and bottom GPU will both get “fresh” air:
GPU
EMPTY SPACE
GPU
GPU

I think that’s an O11 XL case? It should be able to handle a GPU in the lowest position, and the bottom case fans will help with cooling.


Both cards run flawlessly at that temp since they thermally throttle. If you are happy with slight thermal throttling, there is no problem: the card won’t be damaged by such temperatures. Otherwise you can opt for a waterblock; there was a guy on Reddit who fitted one on a 3090 Turbo (search for it if you are interested). And of course, there is always the power limiter.

Finally, it could just be that the chip was not correctly pasted at the factory. You could try repasting it. Also check whether the screws holding the heatsink in place got more or less the same tightening torque.

In any case, yours is a very effective rig :slight_smile:

Agreed. Albeit a bit aesthetically unpleasant.

Two additional suggestions:

  1. Consider power limiting the GPUs. There is a nice writeup from Dr. Kinghorn below. I haven’t tried this myself, but others have reported good results with only minimal decreases in performance.
    https://www.pugetsystems.com/labs/hpc/Quad-RTX3090-GPU-Power-Limiting-with-Systemd-and-Nvidia-smi-1983/

  2. The default fan curve is not sufficient to cool these GPUs in a side-by-side configuration. You should set a new fan curve or manually set the fan speed. Most of the instructions for how to do this are for older versions of Ubuntu using X11 and coolbits = 4; for Ubuntu 20.04 you can instead use GDM. Here are instructions.
    http://bailiwick.io/2019/09/21/controlling-nvidia-gpu-fans-on-a-headless-ubuntu-system/
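For point 1, the gist of the Puget approach is just `nvidia-smi -pl` per GPU (plus persistence mode) wired into a systemd unit. A minimal sketch that only builds the commands rather than executing them (running them for real requires root, and the wattage and GPU count here are placeholders):

```python
# Build the nvidia-smi commands to cap each GPU's power limit.
# Persistence mode keeps the driver loaded so the setting stays applied.
def power_limit_cmds(n_gpus, watts):
    cmds = ["nvidia-smi -pm 1"]  # enable persistence mode
    cmds += [f"nvidia-smi -i {i} -pl {watts}" for i in range(n_gpus)]
    return cmds

for cmd in power_limit_cmds(3, 280):
    print(cmd)
```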