Recommendations on new 2 x RTX 3090 setup

Maybe a stupid question, but did you come across a way to monitor GPU mem temperature in Linux? Both nvtop and nvidia-smi are only reporting the core temperature.

By the way, if any of you is interested, after water cooling, my 3090 Turbo temperatures fell from 72 degrees to 43 degrees with a power limit set to 280 watts and 47 degrees without power limiting. It is certainly not cheap, but IMHO worth the cost just for the noise reduction.
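For anyone wanting to reproduce the power-limited setup: the cap is set with `nvidia-smi -pl` (needs root). A tiny sketch that just builds the command, so the GPU index and wattage are explicit; the 280 W value is simply the one quoted above:

```python
def power_limit_cmd(watts, gpu_index=None):
    """Build the nvidia-smi invocation to cap board power.

    Running the resulting command requires root privileges, and the
    limit resets on reboot unless persistence mode is enabled.
    """
    cmd = ["nvidia-smi"]
    if gpu_index is not None:
        cmd += ["-i", str(gpu_index)]  # target a single GPU
    cmd += ["-pl", str(watts)]         # power limit in watts
    return cmd

# e.g. the 280 W cap mentioned above, on GPU 0:
print(" ".join(power_limit_cmd(280, gpu_index=0)))
```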

1 Like

Unfortunately, you can’t monitor the GPU memory temps in Linux; NVIDIA doesn’t provide the APIs for that.
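To illustrate the limitation: `nvidia-smi --query-gpu=temperature.gpu` is the only temperature field exposed here, and it is the core sensor, not the memory junction. A small polling sketch (the parsing helper is mine, the nvidia-smi flags are real):

```python
import subprocess

def parse_core_temps(csv_text):
    """Parse the CSV output of
    `nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader`
    into a list of per-GPU core temperatures in degrees C."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]

def read_core_temps():
    # temperature.gpu is the *core* sensor; no memory-junction
    # field is exposed, which is exactly the limitation above.
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=temperature.gpu", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_core_temps(out)
```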

You could possibly run your workload under WSL2 on Windows and use GPU-Z on the host to monitor the memory temps.

2 Likes

I wanted to ask the same question.

Interesting. Would you install Windows, run a miner, and report the VRAM temps? Thanks.

Also, how is your loop structured?

True, that could be a significant factor.

My loop is very simple, I only have reservoir -> pump -> water block -> radiator. I have a 360x45mm rad. I’m going to add a second radiator as soon as I add the second 3090 to the loop.

To be completely honest, I don’t think I would have the time to install Windows on another partition just to check the memory temperature anytime soon — it’s a crazy time at work + one big personal project taking lots of my spare time.

I need to figure out how to reduce the memory temperature though, as the backplate is quite warm. I might try to measure the backplate temperature to approximate the GPU memory temperature sometime next week. I saw people adding heatsinks + fans, and some even going all the way to installing a memory water block on the backplate. Maybe just a side fan blowing air toward the back of the GPU is enough.

The main thing for me is to find a solution I can use for both cards as I’m planning to add a second 3090 this year.

1 Like

Hi @dvachalek,

Regarding the cost of the custom loop, I think you’re overestimating it — at least, if you’re based in the US. My custom loop to cool one 3090 cost me approximately $500. Adding one more card would cost $240 or less (radiator + water block).

For the backplate temperature, some people reported reducing the temperature from 110 to 75 degrees while gaming thanks to heatsinks and a fan. Miners, using a similar solution, report temperatures below 100 degrees while also overclocking the memory.

I’ll try the heatsinks plus fan solution, but even just using a fan blowing air from the middle of the case towards the back of the case makes a huge difference to the touch. I think the water cooled backplate, although nice, is not necessary.

That’s because they spend a lot on commercials & ads, and of course the end user is going to pay for them.

Just get good rads and a genuine D5 pump. The Alphacool blocks for the Turbo are fine as well and don’t cost much (~130 EUR apiece). If you still see the VRAM exceeding 100 °C, just mount a RAM water block on the backplate (~40 EUR), or a good number of passive heatsinks.

Among the rads, the XT45 used by Gianluca is a good option, but like all Alphacool radiators it requires a thorough rinsing before use.

2 Likes

These are interesting cards indeed. But so far, that’s the lowest price I was able to find in the EU:

(I have no affiliation with this vendor, and you may be able to find better prices by searching a bit harder.)

Now, that is indeed the going price for a 3090 these days. Maybe it’s even a bit cheaper.

Advantages over a 3090:

  • runs cooler, without that damn VRAM overheating problem.
  • less power-demanding.
  • full-fledged NVLink, 112 GB/s (but see note)

Disadvantages:

  • less raw performance
  • less resale value

Note: Only 2-slot and 3-slot NVLink bridges are available, whereas the 3090 also has a 4-slot option. Check your motherboard layout.


Another interesting card: the A4000. Single slot, 140 W, 16 GB. Delta has it for 1180 EUR. With 3 of them, you can have 48 GB at ~430 W, for ~3500 EUR.
Sure, no NVLink, but on an Epyc/Threadripper machine with PCIe 4.0 that would be a non-issue.
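The aggregate figures above can be sanity-checked in a few lines; the per-card numbers below are just the ones quoted in the post (140 W, 16 GB, 1180 EUR), not independently verified:

```python
# Rough aggregate for the hypothetical 3x A4000 build described above.
CARDS = 3
TDP_W = 140       # per-card board power, as quoted
VRAM_GB = 16      # per-card memory
PRICE_EUR = 1180  # quoted street price per card

total_w = CARDS * TDP_W        # 420 W, close to the ~430 W cited
total_vram = CARDS * VRAM_GB   # 48 GB
total_eur = CARDS * PRICE_EUR  # 3540 EUR, i.e. roughly the ~3500 cited
eur_per_gb = total_eur / total_vram

print(total_w, total_vram, total_eur, round(eur_per_gb, 1))
```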

1 Like

Sorry, I’m just catching up:

On the topic of A6000 Vs 3090, please let me know if you’d want me to run any benchmarks for you. :slight_smile:

2 Likes

Of course we’d want them :stuck_out_tongue_winking_eye:

What about the usual stuff? A bit of convnets and a big transformer?

Thanks :slight_smile:
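While waiting for the real numbers, a minimal host-side timing harness for that kind of benchmark might look like the sketch below (pure Python; the workload you pass in is up to you, and for GPU work you would also want to synchronize the device, e.g. `torch.cuda.synchronize()`, before each reading):

```python
import statistics
import time

def benchmark(fn, *, warmup=3, iters=10):
    """Time fn() over several iterations after a warmup.

    Returns (mean, stdev) in milliseconds. Warmup iterations are
    discarded so one-time costs (allocation, JIT, caches) don't
    skew the measurement.
    """
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        times.append((time.perf_counter() - t0) * 1e3)
    return statistics.mean(times), statistics.stdev(times)

# Usage: benchmark(lambda: model(batch)) once model and batch exist.
```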

2 Likes

Sounds good, Thank you :+1:

2x3090 Vs A6000 benchmarks, coming up by next Monday :+1:

4 Likes

Put 'em in an Epyc board: sixteen PCIe 4.0 lanes for each card. And it works for sure (one bloke I know runs EIGHT 2080s on a ROMED8-2T, having bifurcated the last slot).

1 Like

Thanks!! :smiling_face_with_three_hearts:

1 Like

Mhh, I mistakenly assumed that the A5000 was just an A6000 with half the VRAM, like the previous-gen Quadro RTX 6000 and 8000.

It’s not so. My bad; it should have been obvious from their TDPs. Almost 50% less tensor, INT and FP performance compared to the A6000. Given that the A6000 is slightly inferior to the 3090 in real-world benchmarks (but we’ll wait for confirmation from Sanyam), I don’t think an A5000 for the same price as a 3090 is the greatest possible deal. Let alone if you lose money on the deal… One should evaluate carefully, since my 3090 Turbos, with no auxiliary heatsinks on their backplates, don’t overheat their VRAM (<90 °C) if I run them at 260/270 W (and at that power level they are presumably much beefier than a stock A5000, possibly on par with the A6000).

1 Like

Hi Everyone!

Please Meet ProtonSuper :tea:

5 Likes

Super cool! :sunglasses:

1 Like

I think you are doing well going for the A4000. But mind that comparisons based on the number of CUDA cores and sheer amount of memory are more or less worthless. Why?

  1. Any additional card added over PCIe (i.e. without NVLink) means traffic over the bus. While I think that in terms of throughput it won’t make a real difference, latency will increase substantially (read this as: if you use just data parallelism, you will be OK; but with model parallelism, that is, real parallelism, this will NOT be OK).

  2. More importantly, what really matters in DL tasks is memory bandwidth, more than CUDA cores. Do compare the bandwidth of the A4000 and the 3090. Note that we don’t have benchmarks for the A4000/A5000, but we do have them for the 3090 and A6000 (and I really look forward to seeing the ones from @init_27). They show that the two cards are more or less on par, and that despite the 3090 being driver-capped. Its superior memory bandwidth gives it an edge.

So the bottom line is: multiple A4000s are a relatively cheap and power-efficient solution for running big models using data parallelism, as long as you are not in a hurry.
How much slower will they be than a VRAM-equivalent set of 3090s? We will see, but my bet is: substantially.
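On the bandwidth point, the theoretical figure is just bus width times effective data rate. The rates below are the published spec-sheet values as I recall them (19.5 GT/s GDDR6X for the 3090, 16 for the A6000, 14 for the A4000), so double-check them against the datasheets:

```python
def mem_bandwidth_gbps(bus_width_bits, data_rate_gtps):
    """Theoretical memory bandwidth in GB/s:
    bytes transferred per cycle (bus width / 8) times the
    effective per-pin data rate in GT/s."""
    return bus_width_bits / 8 * data_rate_gtps

cards = {
    "RTX 3090 (GDDR6X)": mem_bandwidth_gbps(384, 19.5),  # ~936 GB/s
    "A6000 (GDDR6)":     mem_bandwidth_gbps(384, 16.0),  # ~768 GB/s
    "A4000 (GDDR6)":     mem_bandwidth_gbps(256, 14.0),  # ~448 GB/s
}
for name, bw in cards.items():
    print(f"{name}: {bw:.0f} GB/s")
```

The roughly 2x gap between the 3090 and the A4000 is what drives the "substantially slower" bet above.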

4 Likes

Update on the benchmarks:

The creator is a lazy ML Engineer who has never worked with Docker and is now being forced to eat the green vegetables of learning NGC & Docker, after a lifetime of fast food of unoptimised benchmarks. So please expect them by next week :pray:

Ross had pointed out a bug with PyTorch:

(Copy Pasted message from fastai discord):

if you have a card w/ tensorcores and want it to be fast, do not use the conda/pip release packages, use NGC, https://github.com/pytorch/pytorch/issues/50153 … not sure how this has gone unfixed for so long, but the fast GPU kernels aren’t in the packaged binary releases :\

5 Likes

Hi @init_27 Hope you are well and having a fabulous day!
Thanks for a great post, both informative and humorous. A good video always helps solidify the written theory.
:clap: :clap:

Cheers mrfabulous1 :grinning: :grinning:

3 Likes

Maybe the street prices will come down a bit.

Very interesting, @dvachalek. Thanks for reporting back.

Would you please report which VRAM temps you get under very intensive load? The best way to do this, if you are willing, is to run a miner under Windows and monitor the temps with HWiNFO64.