Recommendations on new 2 x RTX 3090 setup

I would have liked to post a new topic for this question, but it seems I don’t have sufficient forum permissions yet. :confused:

Hello. :slight_smile: If I were in the lucky situation of having eight RTX 3090 Turbo (blower) cards, what would be a good low-cost chassis/motherboard/CPU for them? We’d use it for training translation and ASR models (headless). Tower or rack doesn’t really matter, but I’m not aware of any such tower machines.

There are the “classic” Xeon E5 v3/v4 machines, such as the Supermicro 4028GR-TR2 (single PCIe root), and other similar machines that can be bought used on eBay US for less than 5000 Euros. These connect all GPUs to one CPU, with each group of four GPUs attached to a single PCIe 3.0 x16 link via a PLX chip.

Then there are EPYC-based machines, which have lots of PCIe 4.0 lanes (128?), but are maybe not true ‘single root’. I don’t see any of these for sale used, so I’m trying to figure out what they cost new.

Can anyone share some information? :slight_smile:

If you want to pack eight GPUs into a single machine, the only affordable solutions are indeed the Supermicro GPU servers with the daughterboard. But I think you will be better served (and have fewer complications) with two distinct 4-GPU regular machines.

EPYC, starting from Rome, is ‘true single root’ (Naples is not).

@balnazzar
Here’s a machine for $2700, with two E5-2680 v3 CPUs and 256GB RAM. Asus Z10PE-D16 WS motherboard. 4x 2000W power supplies. We’d need to import it, but it looks like a decent deal.

We’d have storage (HDDs + SSDs) on a separate machine and connect it with ConnectX-3 cards.

There are also Zotac Trinity cards for 1650 EUR + VAT now (check gputracker.eu). These could be water-cooled. I don’t have previous experience with such a system; perhaps you could consider assisting us with ordering all the required parts, if you think it’s a decent idea.

An eight-GPU system is better for us; the software we would run (NVIDIA NeMo, Marian NMT, wav2letter) scales well to 8 GPUs, but good GPU-to-GPU communication is important (see Training speed of Transformer-Big · Issue #231 · marian-nmt/marian · GitHub, for example).

Thanks for all your comments in this thread btw, I have read them all carefully with interest.

EDIT: Found some pricing for new Supermicro machines: A+ Server 4124GS-TNR Australia... About $9000 USD (+ VAT?) for the minimum spec (2x EPYC 7252 with 128GB RAM). Ouch.

EDIT2: Very similar pricing for the Gigabyte machines, ~$8500 USD: GIGABYTE : GPU Servers

I did price out building a 4-GPU EPYC or Intel system with a workstation board; it came to around 1400-1800 EUR if I remember correctly (chassis, motherboard, CPU, RAM, power supply, no GPUs or storage).

1 Like

@davidzweig
All I can say is that two 2680 v3s are sufficient (I had them). All the rest is a matter of taste, but at $2700 the Asus system seems very good. More specifically, at that price point it’s to be preferred to two separate systems (IMHO!).

Thanks, @dvachalek. With my turbos I got ~95-98C on both with the cards at 260W and no memory underclock. They are, of course, very loud. Sooner or later I’ll find time and patience to liquid-cool them.

I think these are pretty acceptable results (yours, in particular). Have you tested whether the rather severe memory underclock leads to major perf losses (with the same overall wattage) in DL-related tasks? Thanks.

Hi Everyone,

After going through this long thread I am quite excited to join the discussion!!

I am looking to build my own rig and was looking for options (as many here are).
I also checked Lambda Labs, Bizon Tech, Sage, etc., but again the price is a bit high, though I admit it seems convenient to have something working out of the box.
In the past I have built a few computers, but never such a workstation…

I was looking to get 2x 3090, build around that setup, and add other GPUs when their prices go down (if ever…).
It was suggested that I build a setup with 2x A5000 (NVLink) to begin with, to be able to easily add other A5000s when the time comes. It would also help avoid the 3090 heat/power-supply issues.

I wanted to get your input on that idea.

If I build such a setup, I guess I would base it on Lambda’s config to avoid any hardware issues. I also intend to go to a local store and perhaps finalize the config with them.
I’m based in Tokyo and there are a few vendors selling BTO systems with 2-4x 3090 setups, etc.
So I will also be looking at those options.

Thanks

Hi @balnazzar ,
Can you please share some insight on the following choice?

I am currently about to build a rig for my ML research group at my university. We work on biomedical projects, so our tasks range from computer vision (CXR, CT, etc.) to signal analysis. We want to order from this site: Customize your deep learning GPU workstation | Lambda

We wanted to build a 4x 3090 system, but only the 2x 3090 option is available. If I am not mistaken, the RTX 3090 is better than the RTX A4000 and A5000, and on par with the RTX A6000. Now my question is: will 4x A5000 be better than 2x RTX 3090? 4x A5000 will have more total memory, but will it have more compute power than 2x RTX 3090, given that the RTX A5000 has far fewer CUDA cores and Tensor Cores than the RTX 3090? Should we go for 4x RTX A5000, or should we get 2x RTX 3090 instead?

Hi @zunaed. Unfortunately, I don’t know the details of the cooling system adopted by Lambda, but rest assured that four A5000s will be easier to cool than two 3090s. The difficulty with the 3090 lies in the fact that its backside VRAM tends to overheat, while the A-series cards do not have this problem thanks to the adoption of the slower, but cooler, GDDR6 (non-X).

Furthermore, consider that four A5000s will give you twice the amount of VRAM, and that’s quite something. There is a golden rule in DL hardware: priority is to be given first to VRAM amount, then to speed. For if a model with the correct batch size doesn’t fit into the VRAM, you won’t be able to train it at all. There is “addressable BAR” (that is, using system memory to extend the GPU’s VRAM), but AFAIK it cannot be used by CUDA applications, and anyhow the performance would drop so much that it’s not really worth it for DL.

So, go for four A5000s. I don’t think you will regret it.

2 Likes

Hi @Mukei, welcome to this community.

All I can tell you is that the 3090 is really a great card for DL. But I’m having a hard time trying to keep the backside VRAM under 100C, even at decreased power levels.
In hindsight, I’d go for the A5000/A6000 series.

Hope this helps.

2 Likes

Did you try to reduce the memory clocks? I’m running Linux, so unfortunately I can’t monitor the memory temps… but right now I’m reducing the power level to <=300 W for long-running tasks and hope it helps keep the temps “low”.
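In case it’s useful, this is roughly how I cap the power and keep an eye on temperatures (a minimal sketch; the 300 W value is just what I use, and persistence mode may need to be enabled first):

sudo nvidia-smi -pm 1    # enable persistence mode so the limit sticks
sudo nvidia-smi -pl 300  # cap the board power at 300 W
nvidia-smi --query-gpu=temperature.gpu,power.draw --format=csv -l 5  # log GPU temp and power draw every 5 s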

I run Linux too, so I just have control over the overall power levels. I check the VRAM temps by booting into Windows, launching a miner, and then monitoring with HWiNFO64.
Given the season, I have to limit the power level to 240-250W in order to stay around 95C while the miner is running.

AFAIK there is no way to lower the memory clock in Linux, and since 20.04 I have been unsuccessful even in setting Coolbits. Every such attempt to date has broken Xorg.

1 Like

I think it should be possible. I do the following to set the fans to 100%, so I guess it should be possible to change the memory clocks too?! I read a lot and struggled to get Xorg etc. working, but these three lines made it work! I could be missing something needed to change the clocks, though; I’m not an expert here :smiley:

# start a bare X server on display :0 and point nvidia-settings at it (needed on a headless box)
sudo X :0 &
export DISPLAY=:0
# enable manual fan control and set both fans to 100%
nvidia-settings -a [gpu:0]/GPUFanControlState=1 -a [fan:0]/GPUTargetFanSpeed=100 -a [fan:1]/GPUTargetFanSpeed=100
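For the memory clocks specifically, something like the following might work, but treat it as a sketch: it assumes Coolbits is enabled in xorg.conf (e.g. Option "Coolbits" "28"), and the exact attribute name and performance-level index can differ between driver versions, so I can’t promise it behaves the same on every setup.

sudo X :0 &
export DISPLAY=:0
# apply a negative memory transfer-rate offset on the highest performance level; [3] and the value are just examples
nvidia-settings -a [gpu:0]/GPUMemoryTransferRateOffset[3]=-2000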

Does anyone have a solution for cooling a 2x RTX 3080 Ti build? I don’t want to go the water-cooling route unless it is really necessary, because this is going to be my first build. My cards are an EVGA FTW3 Ultra and an XC3. I heard that the top GPU gets choked by the bottom one; if I use an ASUS X570-E, would there be enough spacing to cool the cards? Can you share your experiences with me? Thanks so much!

Thanks, I’ll try it ASAP :wink:

Hello everyone. Thanks to this thread I was able to put together a DL setup. Let me know if the following will work together:

  1. CPU: AMD Ryzen 7 5800X
  2. GPU: 2x PNY Quadro RTX A5000 24GB graphics card
  3. Motherboard: Gigabyte B550 Vision D
  4. RAM: 2x 32GB DDR4-3200 + 2x 8GB DDR4-2400 (from old PC)
  5. SSD: M.2 NVMe Gen 4
  6. Case: Lian Li PC-O11 Dynamic Razer Edition Mid-Tower ATX
  7. Power supply: Corsair RM850x (from old PC)

Hi there,
I purchased two ASUS ROG GeForce RTX 3090 OC cards.
I would like to run them in SLI; I was wondering how I can do it? Do I need to buy an NVLink bridge? If so, which brand and model do I have to use?
My CPU is a Threadripper 3970X.
Motherboard: ASUS Zenith II Extreme Alpha.

I have a machine with two 1080s and used the bridge on that machine, and it didn’t really improve the performance. My RTX 3090 box (only one card) has 24GB of VRAM, which works for all of my models (16-bit on the big ones). The 3090 is an awesome graphics card for DL. I highly recommend it with any CPU.
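If you do end up fitting a bridge, you can at least verify that the link is up from the driver side; this is a minimal sketch, assuming a reasonably recent nvidia-smi:

nvidia-smi nvlink --status  # shows the per-link state and speed for each GPU, if an NVLink bridge is detected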

Anybody know if P2P memory transfers are supported by the A5000? Prices aren’t bad, about 2500 EUR (PNY RTX A5000 ab € 2484,96 (2021) | Preisvergleich Geizhals Deutschland). We have 8x 3090 Gigabyte Turbo to put in our Xeon v4 chassis, but they are stuck in transit for a few more weeks, so the A5000 could be a nice alternative. If supported by the card, I think P2P would work for us, as we have single root. A Puget Systems article suggests a 5-20% speed boost with P2P at 8 GPUs, depending on the application, which would mostly compensate for the speed difference between the 3090 and the A5000 (about 20%). Anyone want some 3090 Turbos for a good price? (in Berlin) :grin: PM me.
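Once the cards are in hand, this is roughly how I’d check it (a sketch, assuming the CUDA samples are built; the simpleP2P path will differ per install):

nvidia-smi topo -m  # show the GPU/PCIe topology the driver sees
# ./simpleP2P       # CUDA samples utility that reports whether peer access is supported and measures P2P bandwidth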

I’m setting up a second box too; it will look like this:

2x 3090 Turbo with NVLINK
5950x 16C (about 650 EUR + VAT)
ASRock X470D4U board with IPMI (without 10G networking onboard) (200 EUR + VAT). (The X570D4U is about 300 EUR + VAT; you get x8 + x8 PCIe 4.0 and two more x4 PCIe links as M.2 slots… can use an adapter/riser cable to fit a fast network card.)
64GB ECC RAM 3200MHz (280 EUR from eBay… UDIMM, no cheap RDIMM with Ryzen)
ConnectX-3 40Gb network card (50 EUR, eBay), connecting to the storage server; needs a PCIe riser cable.
Noctua NH-U12S cooler, fits on this board, not all big coolers do. (60 EUR)
Fractal Design Define C case with two extra 140mm fans. Will cut away some of the metal bracket separating the PCIe slots, and tape up the inside a bit to create positive pressure, to push air between the GPUs. (60 EUR)
1300W PSU. Thermaltake have a 1200W one for 193 EUR + VAT that has a good review on tomshardware, but no stock: https://www.amazon.de/dp/B0773B2VXZ?linkCode=xm2&camp=2025&creative=165953&smid=A3JWKAKR8XB7XF&creativeASIN=B0773B2VXZ&tag=geizhalspre03-21&ascsubtag=fdBFYpJiJ2UcQ8r4b46AsA

For similar processing power, 64GB RAM, and a motherboard with IPMI, Threadripper would have cost about 770 EUR more and EPYC about 970 EUR more, as I worked it out. You do get more PCIe slots, additional RAM is cheaper, and you can upgrade to faster, more expensive CPUs (the 5950X is the max on AM4). Dual-socket EPYC boards also aren’t that much more than single-socket ones. But for two GPUs, this should do quite well for us.

Buying new hardware is pretty expensive! If you don’t need a fast CPU, and need lots of cheap DDR3 RAM (and don’t mind noise), a Fujitsu TX300 S7/S8 makes a nice home for 2-3, even 4 two-slot GPUs (with more than two, they will be split across the two CPUs over the QPI link). Xeon E5 v2; can be bought for 250-350 EUR on eBay DE. You’ll need to find GPU power cables; they are available from suppliers and add about 30-40 EUR. An HP ML350 might work well too; you’d need to go hunting for power cables there as well. Also Xeon workstations (HP Z820, Lenovo, etc.): quieter, but probably no IPMI.

Hi guys. Just to say that, due to electricity prices skyrocketing in the EU, I had to rethink my setup and go for a smaller-footprint, fewer-CPU-cores, single-GPU setup.
I have already sold my Xeon 8260M machine, but if you are interested, I’m putting my two 3090s (Gigabyte Turbos) up for sale on eBay.

For anyone interested, I’m now on a less pretentious Alder Lake i7-12700 platform, with 128GB of non-ECC RAM and a single A6000. The good thing is that it idles at less than 40W.

1 Like