Recommendations on new 2 x RTX 3090 setup

That’s for sure. You have to disassemble the whole card.

I just ordered the waterblocks since I didn’t want to risk them being out of stock in the future. It’s a fallback measure in case the noise starts to annoy me for real.
I didn’t order the pump, radiators, & stuff yet.
But if and when I decide to do it, I’ll let you know :wink:

Do put some passive, low-profile heatsinks over the memory chips on the backside of the card.
The memory junction temp on the Turbo typically stays below the throttling threshold, but nonetheless I don’t like them running at ~95C.

Note that watercooling could alleviate that issue to a substantial extent. If the “main” side of the card stays below 60C, the backside has to follow closely, as per basic thermodynamics.

Here are the pictures of a 3090 Turbo teardown. Use them as a guide for placing the heatsinks.

It is not clear to me, though, whether we need a very recent CPU to avoid bottlenecks when feeding 2 x 3080 Ti. My current system is based on an Intel Core i7-8700K 3.7 GHz 6-core processor. So far, it has served me well. Do you think I’ll have to upgrade to an AMD Threadripper? Nvidia is pushing to move preprocessing to the GPU, although this is not mainstream yet.

To answer my old question: the i7-8700K is fully capable of feeding one RTX 3090 without any problem, but it cannot keep one RTX 3090 and one GTX 1080 running simultaneously without causing one of them (or both) to frequently drop GPU utilization. This is particularly evident when using small batch and image sizes (128x128) on the GTX 1080, which is likely the use case for most practitioners. FYI, I’m using Albumentations + Pillow-SIMD + libjpeg-turbo to take care of the augmentation process.
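If you want to check whether CPU-side augmentation is the bottleneck before buying new hardware, you can time the augmentation function alone, independent of any GPU: if images/second across your worker processes is below what the GPUs consume, the CPU is the limit. A minimal stdlib-only sketch; `augment` here is a dummy stand-in for a real Albumentations pipeline and the workload is purely illustrative:

```python
import time
from concurrent.futures import ProcessPoolExecutor

def augment(img):
    # Dummy stand-in for a real augmentation pipeline: some
    # CPU-bound work on a fake 128x128 "image" (a flat list of ints).
    return [((p * 17) ^ 0x5F) % 256 for p in img]

def throughput(n_workers, n_images=64):
    """Images/second the CPU pushes through `augment` with n_workers."""
    imgs = [list(range(128 * 128))] * n_images
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(augment, imgs, chunksize=8))
    return n_images / (time.perf_counter() - start)

if __name__ == "__main__":
    for w in (1, 2, 4):
        print(f"{w} workers: {throughput(w):.0f} img/s")
```

Swap in your actual transform and compare the numbers per worker count against the batch rate your GPUs sustain.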

Those of you having 2x3090, are you all using Threadripper?

2 Likes

I have two 3090 Turbo and use a Xeon 8260M. No slowdowns noticed.

Try using Nvidia DALI for your preprocessing. Mind that if you do want to upgrade, many modern processors can preprocess data and feed two GPUs without resorting to a Threadripper. The easiest options are the Ryzen 5900X and 5950X with a B550 board.

And no wonder they whacked the Turbo. As highlighted by Tom, it’s at odds with the A100, A40, and A6000 by costing a fraction of their price.

1 Like

By the way ppl, I’m still getting that strange error with the same notebook as before: https://walkwithfastai.com/Style_Transfer

If you have these specs, can you try to run it?

Nvidia driver version: 460.32.03
CUDA used to build PyTorch: 11.1
OS: Ubuntu 20.04.2 LTS (x86_64)
Python version: 3.8 (64-bit runtime)
GPU models and configuration: GPU 0: GeForce RTX 3090

I think the PyTorch version could be the culprit. Mine is PyTorch 1.8.0a0+05c8cd7; 1.8 is the one causing the restarts, while 1.7 should just output that the PyTorch version is not compatible with the 3090’s sm architecture.


And by the way, how are you handling this if you are on Ubuntu? That is, with PyTorch 1.7.1 and the latest driver, version 460.32.03 (CUDA 11.2).

Well, it finally could run, with my computer configured with PyTorch 1.9 and the latest Nvidia drivers for Ubuntu.

The trick for the moment is this: `sudo nvidia-smi -lgc 210,1800`.
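For convenience, the workaround can be wrapped in a tiny script. A sketch, where 210,1800 is the min/max core clock range in MHz from the command above, and `-rgc` is nvidia-smi’s matching reset flag:

```python
def lock_gpu_clocks_cmd(min_mhz=210, max_mhz=1800, gpu_index=None):
    """Build the nvidia-smi invocation that pins core clocks to a range."""
    cmd = ["nvidia-smi", "-lgc", f"{min_mhz},{max_mhz}"]
    if gpu_index is not None:
        cmd += ["-i", str(gpu_index)]  # apply to a single GPU only
    return cmd

def reset_gpu_clocks_cmd(gpu_index=None):
    """Build the matching reset command (-rgc undoes -lgc)."""
    cmd = ["nvidia-smi", "-rgc"]
    if gpu_index is not None:
        cmd += ["-i", str(gpu_index)]
    return cmd

print(" ".join(lock_gpu_clocks_cmd()))  # nvidia-smi -lgc 210,1800
```

Both commands need root, so actually run them as, e.g., `subprocess.run(["sudo", *lock_gpu_clocks_cmd()])`.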

So I guess for the meantime that should be good enough.

1 Like

I’m not following… Would you please elaborate? I thought it was just a matter of drivers & pytorch’s version.

Yeah, the latest Ubuntu drivers show this

Which in turn started to show this

At the start of the year it was other drivers, but this updates by itself and you don’t need to reinstall drivers if you select it like this. The bad thing is that they removed the “older approved drivers”, which I think follows some changes in the kernel (earlier drivers may not be compatible with 5.8.0-43-generic; that’s just my guess as to why the previous options are not displayed anymore).

So later I found that sm_86 will only be available in PyTorch 1.9, so I compiled it and eventually got these restarts in some notebooks. I have also reinstalled the drivers, CUDA toolkit, and cuDNN to match 11.2, and finally I asked and got this workaround, which limits the GPU clocks, I guess?
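For reference, you can see which sm architectures a given PyTorch build was compiled for via `torch.cuda.get_arch_list()`. The check itself is simple; here it is as a pure-Python sketch with illustrative arch lists (not the exact contents of any particular wheel):

```python
def supports_native(arch_list, gpu_arch):
    """True if a build compiled for `arch_list` runs natively on `gpu_arch`
    (the RTX 3090 is Ampere GA102, i.e. compute capability sm_86)."""
    return gpu_arch in arch_list

# Illustrative arch lists, not exact for any specific wheel:
old_build = ["sm_37", "sm_50", "sm_60", "sm_70", "sm_75", "sm_80"]
new_build = old_build + ["sm_86"]

print(supports_native(old_build, "sm_86"))  # False -> the 1.7 warning
print(supports_native(new_build, "sm_86"))  # True
```

On a real install the call would be `supports_native(torch.cuda.get_arch_list(), "sm_86")`.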

@balnazzar given the solution `sudo nvidia-smi -lgc 210,1800`, do you think it could still be the cables or the PSU itself? By the way, I must correct myself: it is a Corsair HX1000i.

The HX1000i is perfectly capable of feeding a computer, even a TR, with one single 3090.

And by the way, the advice you got to switch it to single rail is bad advice (ask if you want to know why). On the contrary, you should always connect the two 8-pin cables to two different PSU outlets.

I don’t think it’s a max clock problem, unless you got a very badly binned specimen. And anyway, just using the power limiter would allow you to experiment and troubleshoot much more easily.

Did you make sure you connected the two cables to different outlets? And if so, do you still get reboots?
If yes, one possible check worth making (but I’m just blind-guessing here) is repadding the backplate. If just one pad is out of place, the VRAM will get super hot and won’t be able to cool down even by throttling…

Finally, with a Corsair digital PSU, you can switch single/multi rail just by using their iCue software, which also lets you check power draw and voltages.

Yeah, I do want to know. I’ve seen a video saying that most current PSUs are all multi-rail, or maybe it was the other way around?

I think that only works on Windows, unfortunately.

Will check the other specific things, but probably with results in about two weeks (taking a break from home).

Almost all of them are single rail.
The good thing about single rail is that you don’t have to worry about the power draw of any single component, as long as your PC remains under the rated total PSU wattage.
The bad thing is that you can overload a single outlet or a group of them, and if the OCP doesn’t engage correctly, you can damage components or even start a fire.
That’s very unlikely to happen with multi-rail, especially if the manufacturer has correctly split the rails in terms of wattage.
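To make the OCP argument concrete, here is a toy numerical sketch (all rail ratings and cable loads are hypothetical):

```python
def tripped_rails(rails):
    """rails: {name: (ocp_limit_amps, [cable_loads_amps])}.
    Returns the rails whose combined load exceeds their OCP limit."""
    return [name for name, (limit, loads) in rails.items()
            if sum(loads) > limit]

# Hypothetical fault: a misbehaving GPU pulls 45A through its two PCIe cables.
fault = [22.5, 22.5]

multi = {"12V1": (40.0, [15.0]),   # CPU EPS cable on its own 40A rail
         "12V2": (40.0, fault)}    # GPU cables on another 40A rail
single = {"12V": (83.0, [15.0] + fault)}  # everything on one big 83A rail

print(tripped_rails(multi))   # ['12V2'] -> the multi-rail PSU shuts down
print(tripped_rails(single))  # []       -> the single rail keeps feeding the fault
```

The single-rail unit never trips because its one OCP threshold sits near the full rating, far above what any individual cable can safely carry.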

Yep, the thing about switching single/multi is supported just by iCue on Windows. But reading the PSU information works in Linux too… There should be a driver for these PSUs; then you read the feed with lm-sensors.

1 Like

ALL turbo/blower models are now discontinued. It seems Nvidia was very pissed off…

Hello!
I have several questions about buying a new DL station. At work we are using a six-year-old 4x 1080Ti station and it is slowly dying :slight_smile:
So my question is: where can we buy a 2x 3090 workstation? From vendors like https://www.exxactcorp.com/Deep-Learning-NVIDIA-GPU-Workstations or Lambda Labs, or maybe you know some alternatives?

I read all 253 posts and found out that:

  1. AMD Threadripper can be overkill for 2 GPUs, and sometimes a Ryzen 9 can be better. So my question: what CPU should I choose?

  2. GPUs are loud, but that is not a problem; we are going to set it up in a server farm.

  3. An additional question: what SSD should we buy? I saw a lot of promotions for the Samsung EVO series; are they really better than the competition? Is there any speed difference between the 2 TB and 4 TB versions?

1 Like

There is Bizon too. But all those workstations, while solid, are badly overpriced (understandably, they need their margin).

Any Ryzen 5000 or 3000 will do just fine. The number of cores depends upon your budget: the more, the better. Or a second-hand Xeon; I use an 8260M QS. The big point there is that you can expand your RAM well over the 128GB limit of a desktop part, and even over the 256GB limit of a TR.

It depends. Do you want desktop-grade or server-grade hardware? I use the Samsung PM1735 due to its endurance (almost 10x that of a desktop part), since I continuously move big datasets in and out from spinning drives. If you don’t tax them badly, a desktop drive will do just fine. The PM1735, on the other hand, is warrantied for 5 years of usage at 3 whole-disk writes per day (this amounts to over five thousand whole-disk writes). As a comparison, the 980 Pro gets just 600 whole-disk writes.
If you are based in the EU, I have a new PM1735 1.6 TB for sale. Contact me if you are interested.
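The endurance comparison is just arithmetic over the two rating conventions (DWPD for enterprise parts, TBW for consumer parts). A quick sketch; the 1200 TBW figure for the 2TB 980 Pro is Samsung’s published rating:

```python
def drive_writes_from_dwpd(dwpd, warranty_years):
    """Total whole-disk writes implied by a DWPD (drive-writes-per-day) rating."""
    return dwpd * 365 * warranty_years

def drive_writes_from_tbw(tbw, capacity_tb):
    """Total whole-disk writes implied by a TBW (terabytes-written) rating."""
    return tbw / capacity_tb

# PM1735: 3 DWPD over a 5-year warranty
print(drive_writes_from_dwpd(3, 5))    # 5475 -> "over five thousand"

# 980 Pro 2TB: rated 1200 TBW
print(drive_writes_from_tbw(1200, 2))  # 600.0
```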

4 Likes

Hi,
first of all, thanks for this great thread, it’s very useful!
I have two RTX 3090s (PALiT GeForce RTX3090 GamingProOC) for computer vision tasks. It is quite frustrating to choose the remaining components, because I am not a hardware geek. Could you help me, please?

Motherboard

  • I found these motherboards as suitable for 2x RTX 3090 from the previous posts:
    – ROMED8-2T,
    – X11SPA
    – asus x299-Sage
    – ASROCK X399 Taichi
    – Msi MEG X299
    – ASUS ROG Zenith II Extreme Alpha TRX40
    – Gigabyte TRX40 Designare
  • Which motherboard would you recommend? Are they really all suitable?
  • Is it a good idea to use PCIe extenders? (For example, with a motherboard having only a few PCIe slots, like the ASUS ROG STRIX X570-F GAMING.)

CPU
In his blog, Tim Dettmers writes: “I recommend a minimum of 4 threads per GPU — that is usually two cores per GPU”. For example, does it mean that an AMD Ryzen 5 2600 (6 cores, 12 threads, 3.4 GHz) can be enough for both RTX 3090s? Or should I prefer something more powerful, such as the AMD Ryzen 9 3950X?
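By that rule of thumb alone, the check is trivial arithmetic; a quick sketch:

```python
def enough_threads(cpu_threads, n_gpus, threads_per_gpu=4):
    """Tim Dettmers' rule of thumb: a minimum of ~4 threads per GPU."""
    return cpu_threads >= n_gpus * threads_per_gpu

print(enough_threads(12, 2))  # Ryzen 5 2600: 12 threads, 2 GPUs -> True
print(enough_threads(32, 2))  # Ryzen 9 3950X: 32 threads -> True, lots of headroom
```

So the 2600 clears the bar, though the rule is a stated minimum, not a guarantee against data-loading bottlenecks.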

PSU
An RTX 3090 should draw 350W. However, I have read here that in some cases it can jump to 450W. So I need:

  • 2x RTX 3090: 2 x 450W = 900W
  • CPU: ~200W
  • rest: ~100W

Total is ~1200W. Is a 1250W PSU enough?
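Summing my worst-case numbers as a quick check:

```python
def worst_case_draw(gpu_peak_w, n_gpus, cpu_w, rest_w):
    """Worst-case system power draw, to compare against the PSU rating."""
    return gpu_peak_w * n_gpus + cpu_w + rest_w

total = worst_case_draw(450, 2, 200, 100)
print(total)  # 1200 -> a 1250W PSU covers it, with ~50W of margin
```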

Thank you!

Thank you for your detailed answer. We are located in Israel.
We do need to move datasets, so special thanks for the SSD recommendation; we will take the 3.2 or 4 TB version to stay with one disk to rule them all.

The PM1735 does exist in a 3.2 TB version (or larger). You should be able to find it through business dealers (it’s not a retail part).

It’s not easy to provide just one straightforward recommendation. For example, none of the motherboards you listed would accommodate the processors you mentioned.

I think you have to do a bit more research about these topics, since different solutions will provide distinct sets of advantages and disadvantages.

Your reckoning about PSU power is correct, though. As for extenders, I don’t actually use them as of yet; they could bottleneck PCIe bandwidth.

1 Like

@balnazzar yes true, none of them are compatible with the AMD RYZEN 5 2600 :smiley: Sorry.
However what about these motherboards:

All of them have the AMD AM4 socket, so I can use the AMD Ryzen 5 2600 or, better, the AMD Ryzen 7 3700X. They also have 3 PCIe 4.0 x16 slots, so if I put the first GPU in the first slot and the second GPU in the third slot (leaving the second slot empty), there should be enough space for both RTX 3090s with a gap between them. Is that right?

I prefer B550 over X570, since the former doesn’t have that whiny chipset fan that will fail at the wrong moment. Gigabyte B550 Vision D. Correct slot spacing.

1 Like