Recommendations on new 2 x RTX 3090 setup

iamgianluca · September 5, 2020, 3:00am

Hi,

I’m selling my old GTX 1080 and upgrading my deep learning server with a new RTX 3090. I’m also contemplating adding one more RTX 3090 later next year.

I’ve read from multiple sources that blower-style cooling is recommended when having two or more GPUs. There are not many blower-style options confirmed yet. As a matter of fact, the only one I could find is the Founders Edition. Most (all?) AIB cards seem to opt for either double/triple fan or water-cooling.

My only concern about the Founders Edition is whether the new pass-through fan design allows the cards to sit right next to each other, without a slot gap between them, without thermal throttling.

Alternatively, should I consider water-cooled options? I would do without water-cooling, if not really necessary… mostly because it will increase the price of the card.

What are your thoughts?

DanielLam · September 5, 2020, 7:13am

I have a 2 slot blower 2080TI, and a 3 slot 2080TI regular fan. Thermal throttling still happens a little. I should have got 2 blowers. One thing about the blower is that it produces a little more noise than a 3 slotter.

If you want water-cooling, you have to wait until someone has that option, or you do it yourself - which I don’t recommend unless you’ve done it before. If you’re worried about the thermal cooling, I’d wait until someone does a multi-GPU setup review.
Double check your computer and motherboard space. The 3090 Founders Edition (the only one available at the moment) takes 3 slots. I bought the wrong motherboard, and then realized I can’t use the last PCI slot because some motherboard electronics stick up and would block another GPU.
I don’t recommend selling the 1080 until you get the second GPU (unless you really need the cash). I would try to keep at least 2 GPUs on hand. Usually, you’ll have a long training run that will tie up one GPU (I’m guessing the 3090). Then you can use the other GPU (the 1080) to run smaller experiments. If you’re not running long training runs, then 1 GPU is enough.

balnazzar · September 5, 2020, 11:01am

If you want to use two 3090, you basically have the following options:

Get a motherboard with four full-length PCIe 16x slots (2-slot spacing). This way, you can install two 3090 FEs with one slot of separation between them, without interfering with the front-panel connections. Of course the upper card will suck in very hot air from the lower one. I don’t know whether you’ll run into thermal issues.
Get any motherboard you like, a case with 3-slot vertical mounting (e.g. the Phanteks P600S), and a PCIe extension cable. This seems to be the most viable solution for two 3090 FEs, as long as you have at least one centimetre of separation between the vertically mounted card and the side panel of the case. Note that you will lose the ability to use the NVLink bridge. Furthermore, since I expect the card to weigh a ton, it is not guaranteed that the bracket will handle the weight correctly. In that case, you will need a VGA support bracket like this: https://www.amazon.com/upHere-Graphics-Anodized-Aerospace-Aluminum/dp/B079HSVSLR/ref=pd_lpo_147_t_0/134-1431875-3000427?_encoding=UTF8&pd_rd_i=B079HSVSLR&pd_rd_r=85c189a0-7cba-484b-aff0-126d28b80a05&pd_rd_w=tXiZI&pd_rd_wg=9Tjlb&pf_rd_p=7b36d496-f366-4631-94d3-61b87b52511b&pf_rd_r=HMZZ97NJ3HW5S1MNMTM0&psc=1&refRID=HMZZ97NJ3HW5S1MNMTM0

EDIT: None of the cases I have investigated has the correct clearances for mounting even one 3090 vertically.

I really doubt there will be a 2-slot blower design for a 350W GPU, and a 3-slot blower seems unrealistic. The only remaining options are the liquid or hybrid models from EVGA.
You can expect them to cost some 2000$. Also, EVGA has not yet provided any ETA. https://www.evga.com/articles/01434/evga-geforce-rtx-30-series/
I personally don’t like such contraptions. One adds other points of potential failure, the costs increase, and such cards are difficult to resell, mainly because people tend (understandably) to stay away from used liquid-cooled stuff.

My advice? Get two Founders Editions. Try to use them the canonical way. If the upper one overheats, buy a case with a 3-slot vertical mounting option.

WaldemarWalo · September 5, 2020, 3:20pm

Hi,
Disclaimer: My experience is only from a gaming and crypto mining perspective, on a 980ti and a 1080ti, overclocked and running stable 24/7 for many weeks (I have cheap electricity, which allowed my 980ti to pay for my 1080ti a few years back).

TL/DR:

don’t get the reference card, get a 3rd party card
water cool everything
overclock

Longer answer:
I am assuming you want to squeeze the most out of your setup, you want it to be super stable and work 24/7 under full load without the sounds of a helicopter taking off.
I would recommend the below setup that I am planning to get for myself around Nov-Dec (for 4k gaming and deep learning):

First of all get a custom/3rd party card by Asus, MSI or EVGA. Simply wait for the test on Gamers Nexus or guru3d.com and pick the one that will give you best cost/performance.

A 3rd party card will be faster (1-3%) than the reference card, as these cards are factory overclocked, and will give you more stable overclocking potential. This can give you 10-15% more performance over the reference card. Have a look here:
https://www.guru3d.com/articles-pages/msi-geforce-rtx-2080-ti-gaming-x-trio-review,28.html
in the past 3rd party cards always came with superior air cooling. When the GPU is idle the fans don’t spin at all, and under full load they are always more quiet and cool the GPU better. In case of 3090 the new reference cooling may be good, but I doubt that it will be better than what the 3rd parties will come up with (please have a look at other sections of the article I’ve linked above to get the numbers).

Water Cool everything - With 2x 3090 setup I would recommend to go for full water cooling for all GPUs and CPU, with use of 2 very big radiators. e.g. half the number of radiators this guy used : https://www.youtube.com/watch?v=Q2SIrV_4-dM&ab_channel=JayzTwoCents

by water cool I mean do the entire setup yourself, don’t buy a hybrid card, the cost of a hybrid card is comparable to 3rd party card + water block
check the video and note his temperatures under load and the low noise levels
you will have to wait an extra 1-2 months for custom water blocks, but then your cards will go down to being 1-slot GPUs, so mounting them won’t be an issue
even if Nvidia’s air cooling turns out to be decent, I expect that after a year or two under full load the fan bearings will die and require replacement. Buying a new reference cooler won’t be an option, and a replacement 3rd party air cooler won’t fit nicely and will cost you over 50% of the cost of a water block

Overclocking - if we assume that 3090 is 2x faster than 2080ti (it will be less), and you will squeeze an extra 12.5% from each of your 3090, then you will get an additional 50% of performance that a 2080ti will give you.

overclocking is super easy: you have 3 sliders for GPU, MEM and voltage, and you can use values a bit lower than what guru3d (or another service) recommends, so you don’t have to spend time fine-tuning the overclock yourself
your cards will not throttle down and work with 110+ % performance
you can get a block for your older card and include it in the loop
they will work quietly and stable, day and night
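
To make the overclocking arithmetic above explicit, here is a minimal Python sketch. The function name and its parameters are hypothetical, and the 2x speedup and 12.5% gain are the assumed figures from the estimate above, not measurements.

```python
def extra_performance(num_cards: int, speedup_vs_baseline: float, overclock_gain: float) -> float:
    """Extra throughput from overclocking, in units of one baseline card (a 2080 Ti here)."""
    return num_cards * speedup_vs_baseline * overclock_gain


# Two 3090s, each assumed to be 2x a 2080 Ti, each overclocked by an assumed 12.5%.
gain = extra_performance(num_cards=2, speedup_vs_baseline=2.0, overclock_gain=0.125)
print(f"Extra performance: {gain:.0%} of one 2080 Ti")
```

If the 3090 turns out to be less than 2x a 2080 Ti, lowering `speedup_vs_baseline` shrinks the result proportionally.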

Let me know what you think.
Regards,
Waldemar

balnazzar · September 6, 2020, 11:16am

I think you are making some assumptions that probably don’t reflect the OP’s use case.

You assume that the typical deep learning practitioner is a hardware geek able to custom-watercool an entire machine with no hassle, without wasting a lot of time, and without frying everything (contact between waterblocks, chips, memory, etc… Filling the circuit without air bubbles, choosing the right components, having a lot of time to investigate, learn, and do the actual work).
Note that if you make a mess and damage 3000$ worth of video cards, that would be unpleasant.
You assume that the workflow of a deep learning practitioner is similar to the workflow of a miner.
You don’t run full load 24/7 for years in our field.
You assume that he’ll be using Windows, and can use Afterburner with its sliders, while almost every DL practitioner uses Linux.

WaldemarWalo · September 6, 2020, 11:50am

Yes, I totally agree, that’s why I started the reply with a disclaimer.
I think you have to show some hardware love to complete what I’ve recommended
At least wait for the 3rd party card tests, so you will be able to compare it to stock performance and cooling.

I will try to log the time it took me to do the research, buy parts, assemble, tweak the setup and update this thread by end of the year.

iamgianluca · September 8, 2020, 12:33pm

Thank you both for the suggestions!

@DanielLam good idea on keeping the 1080 until I have the second 3090. One of the things I hated the most over the last year was actually not being able to run multiple experiments at the same time.

FYI, Tim Dettmers just updated his guide to include recommendations on the new Ampere GPUs.

It would be interesting to see some stats about the different cooling solutions. I wouldn’t mind buying two Founders Editions and using PCIe extenders if that solution doesn’t lead to thermal throttling when running large models for 12+ hours. I would go with liquid cooling only if really necessary, due to the considerably higher prices.

init_27 · September 12, 2020, 4:36am

@iamgianluca
I’ve been looking at putting together a similar build as well.

I’ve decided to wait for reviews and not grab the 3090FE during the firesafe for the following reason:

Historically, FEs have a bad reputation for cooling. It seems they’ve improved, but if you’ve looked at the design, they have a 3/4-length PCB, while AIB boards have bigger PCBs and even bigger coolers!
FEs might be ~300$ (total) cheaper than 2x3rd party cards, but at this point, I’d rather have a PC that doesn’t overheat or overheat other components.
As of now, it’s shown that the CPU/RAM would mostly be okay…but for a single GPU, not for dual!
Also, if you’re building a new system, the AMD cpu event is in the first week of oct, might be worth waiting for it.

Hope this helps!

rwightman · September 12, 2020, 5:06am

A few comments

A lot of people think they’ve got a stable multi-GPU system but don’t actually realize it’s thermally throttled for any significant workload. If one of your GPU temps is 89 (default settings) in nvidia-smi, you’re likely throttled with reduced performance. In multi-GPU deep learning the slowest GPU sets the speed for all unless you’ve got an exotic async setup (rare).
I found it really challenging to air cool 2x Titan RTX with no throttling under load in a roomy case with upgraded case fans and a slot spacing between the cards. 2x 3090 is going to be interesting
I feel deep learning load can easily be as high as, or higher than, mining. I run jobs for weeks with 95-98% utilization. Another interesting point with DL loads: they are heavily and tightly synchronized. If you really max out the batch sizes and eliminate IO / preprocessing bottlenecks, there are some significant current surges with each batch, which leads to the next…
Have a really good power supply, and consider a line conditioner like the Tripp Lite ones (https://www.amazon.ca/Lc-2400-Line-2400-Led-120v/dp/B0000514OG) if you have poor building voltage, long runs from your breaker, or other things on the circuit. The voltage drop for one of my machines is significant at high current; it trips UPS or line conditioner devices as it pulls the line voltage quite low on some activity spikes. A PSU with fat caps or an external LC/UPS helps.
More on PSU, don’t buy Seasonic. I used to love them but my DL workloads cause them to bonk out after half a day with a sudden shutdown. Some sort of thermal/current protection that just shuts them down, even when well overprovisioned. This has been observed with 3 Seasonic Prime Platinum and Prime Ultra Titanium. I’m running all EVGA supplies now and an EVGA supply a step below the power rating of an ‘equivalent’ Seasonic has always been stable. I think the high end corsairs should be okay too.
With multiple GPUs at 100% and your multi-core CPU at 100%, everything will heat up. The case, the MB, the RAM, the drives, the room. Get lots of fans, ensure lots of airflow. I’ve had some CPU stability issues here or there that required some fan RPM tweaking and case mods, but the biggest failures caused by non-stop CPU + GPU heat have been NVMe drives sitting under the GPU. They don’t seem built for that level of sustained temp. Keep the airflow up!
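
For anyone who wants to check for the throttling described in the first comment, here is a minimal Python sketch that polls nvidia-smi and flags hot GPUs. The function names are hypothetical, the 89 threshold is simply the figure quoted above and may not match your card, and the sample text is made up for illustration. A high temperature is only a hint that a card may be throttling, not proof.

```python
import subprocess
from typing import List

THROTTLE_TEMP_C = 89  # figure quoted in the comment above; an assumption for other cards


def query_gpu_temps() -> str:
    """Run nvidia-smi and return one 'index, temperature' CSV line per GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,temperature.gpu", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def hot_gpus(csv_text: str, limit_c: int = THROTTLE_TEMP_C) -> List[int]:
    """Return the indices of GPUs whose temperature is at or above limit_c."""
    hot = []
    for line in csv_text.strip().splitlines():
        index, temp = (field.strip() for field in line.split(","))
        if int(temp) >= limit_c:
            hot.append(int(index))
    return hot


# Hypothetical output for a two-GPU machine; on a real system use query_gpu_temps().
sample = "0, 72\n1, 89\n"
print(hot_gpus(sample))
```

Calling `hot_gpus(query_gpu_temps())` in a loop during a long training run shows whether one card sits at the limit while the other stays cool.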

init_27 · September 12, 2020, 6:39am

Meanwhile, I decided to ask the content creators in this space to help us out https://twitter.com/bhutanisanyam1/status/1304670237741125632?s=20

balnazzar · September 12, 2020, 9:18pm

True, but these FEs do have a push/pull design that should be very effective. I’d advise to wait for a review, though. Liquid is much better, but only if you know what you are doing…
It depends. The Turing FEs were better at cooling than some crappy 3rd party designs.
What kind of CPU are we talking about? In general, 2 threads per GPU should be sufficient, no matter how powerful the GPU is. Another matter is collateral tasks (data augmentation, processing of big tabular data, etc…). Here, the more powerful the CPU is, the better. Mind that AMD cpus still struggle with MKL-intensive tasks.
What are they expected to present?

Thanks.

balnazzar · September 12, 2020, 9:30pm

rwightman:

I run jobs for weeks with 95-98% utilization. Another interesting point with DL loads, they are heavily and tightly synchronized.

More on PSU, don’t buy Seasonic. I used to love them but my DL workloads cause them to bonk out after half a day with a sudden shutdown. Some sort of thermal/current protection that just shuts them down, even when well overprovisioned. This has been observed with 3 Seasonic Prime Platinum and Prime Ultra Titanium. I’m running all EVGA supplies now and an EVGA supply a step below the power rating of an ‘equivalent’ Seasonic has always been stable. I think the high end corsairs should be okay too.

I’ve had some CPU stability issues here or there that required some fan RPM tweaking and case mods,

but the biggest failures caused by non-stop CPU + GPU heat have been NVMe drives sitting under the GPU. They don’t seem built for that level of sustained temp. Keep the airflow up!

May I ask:

What kind of 3/4 wk continuous jobs do you run?
Corsair and EVGA don’t manufacture PSUs. They rebrand units from the actual manufacturers. For example, I use an EVGA T1000, which is built by Seasonic.
The vast majority of high-end evga/corsair/coolermaster PSUs are built by Seasonic, Fortron, SuperFlower, etc.
Do you use your PSU at its maximum nominal power output?
What kind of CPU and Heatsink? It’s very rare to experience CPU stability issues.
Anyway, if you can afford them, use Xeons rather than Core-i.
They are cherry-picked from the best silicon, usually from the very centre of the wafer. Despite having a TDP identical to their desktop counterparts, they draw less power and produce less heat.
Don’t install NVMe drives under your Titans. Due to their heatsink design, the (very) hot air from the heatsink, probably near 89C, will be blown directly onto the SSDs, and they are not built to withstand that kind of thermal stress. Buy a PCIe on-slot controller, even better if with a PCIe cable extender.

rwightman · September 12, 2020, 10:10pm

balnazzar:

May I ask:

What kind of 3/4 wk continuous jobs do you run?

Corsair and EVGA don’t manufacture PSUs. They rebrand units from the actual manufacturers. For example, I use an EVGA T1000, which is built by Seasonic.
The vast majority of high-end evga/corsair/coolermaster PSUs are built by Seasonic, Fortron, SuperFlower, etc.
Do you use your PSU at its maximum nominal power output?

What kind of CPU and Heatsink? It’s very rare to experience CPU stability issues.
Anyway, if you can afford them, use Xeons rather than Core-i.
They are cherry-picked from the best silicon, usually from the very centre of the wafer. Despite having a TDP identical to their desktop counterparts, they draw less power and produce less heat.

Don’t install NVMe drives under your Titans. Due to their heatsink design, the (very) hot air from the heatsink, probably near 89C, will be blown directly onto the SSDs, and they are not built to withstand that kind of thermal stress. Buy a PCIe on-slot controller, even better if with a PCIe cable extender.

Lots of experiments on large(ish) models on large(ish) datasets. Many of the training sessions I do on public datasets like ImageNet or COCO I make available as pretrained weights as they’re better than many of the typical sources. Working on some OpenImages experiments right now which is a real slog.
I’m pretty sure all EVGA G/P/T2 or G3 are Super Flower Leadex and they’re great. I don’t think any of the higher end EVGA have been Seasonic OEM. But yeah, apparently some of the models coming out now (i.e. G5) are FSP and they are not good. So, maybe no blanket statement on EVGA, just the ones above. All recent Seasonics that I’ve tried have that issue. They are not at the limit power-wise; I think it’s a surge current or thermal safety that’s tripping, way too sensitive. The Seasonics also trip the line conditioner or UPS like you wouldn’t believe; on every transition between eval/train they’re drawing more current from the mains than the EVGA. Much less of a problem with the EVGA. I use 1600 T2 and 1300 G2 in main machines. But when I was debugging the Seasonic issue with a Prime Ultra Titanium 1000W and Prime Platinum 1300W (slightly better), I borrowed a G3 1000W before buying the G2 and it handled the load just fine… I even swapped in a 850 T2 for a bit (underspec’d) and it managed the load without shutting down.
Pretty much always go with the Noctua D15, with one fan swapped to clear the RAM. No Xeon, as all the system components go up in $$, but I have been on the X-series X299 for a few gens now. Likely switch to TR for next build.
Lessons learned. I need the NVME for certain workloads though, rare to find a MB that doesn’t have all NVME slots under/next to GPU. I think next case might be an open mining rig in the garage.

balnazzar · September 12, 2020, 10:19pm

rwightman:

I’m pretty sure all EVGA G/P/T2 or G3 are Super Flower Leadex and they’re great. I don’t think any of the higher end EVGA have been Seasonic OEM. But yeah, apparently some of the models coming out now (i.e. G5) are FSP and they are not good. So, maybe no blanket statement on EVGA, just the ones above. All recent Seasonics that I’ve tried have that issue. They are not at the limit power-wise; I think it’s a surge current or thermal safety that’s tripping, way too sensitive. The Seasonics also trip the line conditioner or UPS like you wouldn’t believe; on every transition between eval/train they’re drawing more current from the mains than the EVGA. Much less of a problem with the EVGA. I use 1600 T2 and 1300 G2 in main machines. But when I was debugging the Seasonic issue with a Prime Ultra Titanium 1000W and Prime Platinum 1300W (slightly better), I borrowed a G3 1000W before buying the G2 and it handled the load just fine… I even swapped in a 850 T2 for a bit (underspec’d) and it managed the load without shutting down.

Pretty much always go with the Noctua D15, with one fan swapped to clear the RAM. No Xeon, as all the system components go up in $$, but I have been on the X-series X299 for a few gens now. Likely switch to TR for next build.

Lessons learned. I need the NVME for certain workloads though, rare to find a MB that doesn’t have all NVME slots under/next to GPU. I think next case might be an open mining rig in the garage.

Thanks, that’s valuable info. Maybe it would be better to get the SuperFlower directly, though. For example, their 2000W is capable of powering four 3090 and costs “just” 370 eur.
Mind that the TRs produce an unbelievable amount of heat, plus they cannot use the cheaper registered DIMMs. I’d advise going for the Epyc Rome. The 7282 is cheap, has 16 cores and 128 lanes, and runs very cool.
Beware of the dust.
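
As a rough sanity check on the 2000W figure, here is a minimal Python sketch. It assumes 350W per 3090 (the figure mentioned earlier in this thread) and only counts nominal draw, so it ignores the current surges discussed above. The function name is hypothetical.

```python
def psu_headroom_watts(psu_watts: int, gpu_watts: int, num_gpus: int) -> int:
    """Watts left for CPU, drives, and fans after the GPUs' nominal draw."""
    return psu_watts - gpu_watts * num_gpus


# A 2000W supply with four GPUs at an assumed 350W each.
headroom = psu_headroom_watts(psu_watts=2000, gpu_watts=350, num_gpus=4)
print(f"Headroom after four GPUs: {headroom} W")
```

Whatever is left has to cover the CPU and everything else, plus some margin for spikes.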

init_27 · September 13, 2020, 12:50am

Thanks Andrea!

In India, the availability is a challenge so I think I’ll wait nevertheless.
Agreed, but I’m hoping to wait for all of the reviews of the 3rd party ones and get the ones that work best.
Sorry for not being clear, I was referring to the fact that the FE design will blow hot air onto the CPU. We have no clue how that’ll affect a 2xGPU design; the speculation is that it should be “okay” for 1xGPU, but I’m not sure about 2 of these FE designs.
Sorry, I’m very new to Team Red. All I know is that they’re releasing CPUs based on their new architecture, which will still be 7nm. Some people have mentioned they would refresh the APU lineup.

Here’s the Build that I think I’ll end up getting:

The answers here would be dominated by the motherboard, I believe. I found the ASUS Zenith to be the best motherboard available in India, which would force me to start with a 3960X. I know it’s overkill, but other builds would be just ~150-200$ cheaper in India and only “might work”, so I’d rather have one that would have 0 cooling issues.

Andrea, I would also mention that 2nd hand parts are impossible to grab in India so I don’t think that would be possible.

This is what I’ve learned so far, I’ll keep looking.

Thanks

init_27 · September 13, 2020, 9:54am

Some more hardware rumours:
NVIDIA might release a Quadro 8100: 48GB of RAM (In the first week of October)

These are known to be blower-style and very thin. If the price is ~1.2x that of 2x3090, these might be interesting options to consider.

Then you save $$$ on mobo and casing and spend the $ on Quadro

However, you’ll be having just 1 (fast and spacious) card. It’s an interesting tradeoff

Link

balnazzar · September 13, 2020, 11:31am

Mh, I don’t think so. Quadro RTX 6000 was priced almost twice the Titan, no matter being the same card.

init_27 · September 13, 2020, 12:07pm

Actually, Just checked the previous pricing:

Quadro 8000 was 5500$ vs 5000$ for 2x Titans (2500$ each).

If something like this happens, it might be worth it, rather than all of us being forced to purchase the 4x PCIe slot motherboards that cost 800$, then being forced to buy a higher end CPU to match the socket, followed by bigger case requirements…

WaldemarWalo · September 13, 2020, 3:41pm

Thanks for sharing the link to Tim’s blog. Looks like that guy totally knows what he’s writing about!

init_27 · September 13, 2020, 4:25pm

Apologies, writing this as a separate reply to notify the people interested: I’ve just learned the Quadro 8000 was launched at 10k USD and later slashed to 5.5k USD.

So my expectations are now much lower, and I will head back to PCPartPicker and start sifting through motherboards again.