Recommendations on new 2 x RTX 3090 setup

Is that true for Linux/Ubuntu as well? No Windows at all?

Does it work for you in Linux/Ubuntu as well (without running Linux inside Windows)?


I primarily use Ubuntu and everything works fine. I'm actually not familiar with setting fan/pump speeds, as I've never needed to do that; I just plugged everything in and it worked.

I have tinkered a little with WSL2 and it seemed OK, but I didn't have a compelling enough reason to switch to it over just running Ubuntu natively.

I run dual SSDs, one with Windows and one with Ubuntu, selecting between them with GRUB. So yes, it works perfectly, based on temperature. I set a linear curve for fan speed; however, it never really got higher than 50°C on the CPU and 40-something on the GPUs.

Now in winter it's even less, so the fans never go to 100%. Thinking about it, that might not be the most optimal setup, and I'd need to experiment, but two years down the road: zero problems.

Can you set the pump speed as well (e.g., have the pump run faster or slower based on GPU temperature)? I assume you first boot into Windows to set the fan curve and then reboot into Linux, assuming those fan settings persist in the BIOS?

No. The settings are set in the BIOS; I've never set them in either Linux or Windows. It's a pure BIOS setting from start to finish.

The pump is from EKWB and I think it runs at a static speed, since the difference in cooling comes from fan speeds and radiator size, not the actual flow itself (when it's static). My motherboard is an ASUS X570-Pro.

Yeah, that's what's different on MSI or EVGA Kingpin cards (or any other vendor's cards with an all-in-one liquid cooler built in from the factory, e.g. the MSI GeForce RTX™ 4090 SUPRIM LIQUID X 24G). You can't control those pumps from the BIOS since they don't expose them, I suppose?

The RTX 4090 has support for fp8, so you can have double the model + batch size (and hence the same as an A6000, provided there's no loss of accuracy), and it's probably faster than an RTX A6000. Makes sense?

PyTorch does not support fp8 currently, and I believe it's still in the proposal stage. I wouldn't count on being able to do much with fp8 for a while. I'm also skeptical that going to fp8 will be similar to when we went to fp16, at least for training. 8 bits can only represent 2^8 = 256 values, compared to fp16's 2^16 = 65,536 and fp32's 2^32 = 4,294,967,296. Even today we don't train models with only fp16; we use mixed precision, which uses fp32 for some operations because of fp16's lack of precision.
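The precision argument above is easy to check with a little arithmetic; here's a pure-Python sketch counting raw bit patterns per float width (the usable value count is slightly smaller once NaN/inf encodings are excluded):

```python
# How many distinct bit patterns each floating-point width can encode.
for name, bits in [("fp8", 8), ("fp16", 16), ("fp32", 32)]:
    print(f"{name}: 2**{bits} = {2**bits:,} bit patterns")

# fp16 also has a narrow dynamic range: its smallest positive *normal*
# value is 2**-14, so very small quantities can underflow to zero.
print(f"smallest normal fp16 value: {2**-14:.2e}")  # ≈ 6.10e-05
```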

PyTorch does support quantization that can leverage 8-bit computation (int8 currently), but only for inference, not training.
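For intuition, int8 inference quantization is usually an affine mapping, real_value ≈ scale × (int8_value − zero_point). This is a hypothetical pure-Python illustration of that scheme (not PyTorch's actual API; the `scale` and `zero_point` values are made up for the example):

```python
def quantize(values, scale, zero_point):
    """Map float values to int8 codes, clamping to [-128, 127]."""
    return [max(-128, min(127, round(v / scale) + zero_point)) for v in values]

def dequantize(qvalues, scale, zero_point):
    """Map int8 codes back to approximate floats."""
    return [scale * (q - zero_point) for q in qvalues]

weights = [0.5, -1.3, 0.0, 2.0]
scale, zero_point = 0.02, 0

q = quantize(weights, scale, zero_point)
print(q)  # [25, -65, 0, 100]
# Round-tripping recovers the weights up to quantization error:
print([round(x, 2) for x in dequantize(q, scale, zero_point)])  # [0.5, -1.3, 0.0, 2.0]
```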

Edit: Correction on mixed precision that Ben pointed out, and a suggestion to look at The Best GPUs for Deep Learning in 2023 — An In-depth Analysis, which has more info on FP8 based on some preliminary work.

@matdmiller Thanks for the information. Yeah, that's also true; it might not be the same. (i9-10980XE + ASUS X299 Sage) vs (Threadripper Pro 5955WX + ASRock Creator): which one is better, assuming the cost difference is only $200? I'm more concerned about MKL library support on AMD; does it affect training on images, transformers, or large language models?

Do you think a (Threadripper 5955 + WRX80 Creator motherboard, which is the latest one) would be a better choice than an (i9-10980XE + Sage X299 motherboard, which is an outdated one)? My main concern is whether MKL support is needed for preprocessing images or videos in computer vision deep learning (say, object detection and so on)?

Mixed precision will cast individual layers to FP32 as needed, but otherwise both the forward and backward passes run in FP16, hence the need for the gradient scaler. The optimization step occurs on FP32 weights.
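A minimal sketch of why the gradient scaler exists: small gradients underflow to zero in fp16, but scaling the loss (and hence the gradients) keeps them representable, and they're unscaled back in fp32 before the optimizer step. This uses the stdlib `struct` half-precision `'e'` format to round a Python float to fp16; the 2^16 scale factor is just a typical choice for illustration:

```python
import struct

def to_fp16(x):
    """Round-trip a float through IEEE half precision."""
    return struct.unpack('e', struct.pack('e', x))[0]

grad = 1e-8                    # a tiny gradient, common late in training
print(to_fp16(grad))           # 0.0 -- underflows in fp16

scale = 2.0 ** 16              # loss-scale factor
scaled = to_fp16(grad * scale)
print(scaled)                  # ≈ 0.000655 -- survives in fp16
print(scaled / scale)          # unscaled in fp32 before the optimizer step
```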

Tim Dettmers (of 8-bit Optimizers and 8-bit LLMs) thinks FP8 will be useful for training, although he says Ada is missing a Hopper feature, which might slow those cards down in relative FP8 performance.


Yes and no. Yes, fp8 is a nice thing to have, and much better than int8 (which was basically useless for training).
fp8 will save you a considerable amount of VRAM, but it doesn't exactly "halve" memory occupation, because some operations still have to be done in 16- or 32-bit precision (as happened with fp16 vs fp32).
Also be aware that fp8 is quite unstable with transformers. This has been discussed on Dettmers's blog. Workarounds exist, but they are a bit of a PITA to implement. It's still very experimental stuff.

The best card, if you can afford it, is the A6000 Ada: fp8 and 48GB. It's nowhere to be found, at least in the EU, though. No idea about its street price.

Really? How strange. Not the usual Nvidia behaviour.

Ha. Ha.

Summarizing: as of now, no one (AFAIK) has tested fp8 training on Ada cards. For sure, an RTX 4090 is a much more cost-effective card than the A6000 Ampere, but if money is not a problem, I would still go for the latter.
Apart from "true" 48GB of VRAM, it has a much more manageable 2-slot footprint and it's built to last, with superb build quality and a blower fan that exhausts all the hot air outside the case.
Note also that, for example, for large transformer training, Lambda has found that a 4090 is some 65% faster than a 3090. But then, the A6000 Ampere is itself 34% faster than a 3090 at the same task, due to its unlocked FP32 accumulation mode.
If you consider that the A6000 is a 300W design vs 450W for the 4090, the former is still ahead in terms of performance per watt.
OTOH, as said, a 4090 can be had these days for 1800 EUR/USD (crappiest brands), while the lowest price I can find for the A6000 at the time of writing is still 4700 EUR. It's a BIG difference.
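The performance-per-watt claim checks out with quick arithmetic, treating the quoted speedups as relative throughput (3090 = 1.0) and using TDP as the power figure, which is of course only a rough proxy for actual draw:

```python
# Relative large-transformer training throughput (3090 = 1.0), per the
# figures quoted above, and nominal board power.
rtx_4090 = {"speedup": 1.65, "tdp_w": 450}  # ~65% faster than a 3090
a6000    = {"speedup": 1.34, "tdp_w": 300}  # ~34% faster than a 3090

def perf_per_watt(card):
    return card["speedup"] / card["tdp_w"]

print(f"4090:  {perf_per_watt(rtx_4090):.5f}")  # ≈ 0.00367
print(f"A6000: {perf_per_watt(a6000):.5f}")     # ≈ 0.00447 -- ahead
```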

The RTX A6000 is not that far behind, and also none of these cards supports fp8, as far as I know?

Maybe I introduced some confusion (but Nvidia itself does that as well, with its inconsistent naming conventions).

The Ada Lovelace professional card, presented in September 2022, is officially named the "RTX 6000 Ada Generation". Note that it's indeed prone to being confused with the previous RTX 6000 (Turing generation, the professional version of the Titan RTX).
Since it has 48GB of VRAM and is aesthetically identical to the A6000 Ampere, many (including me) refer to it as the "A6000 Ada".

Such a card is equipped with fp8 compute capability.
As said, no one has tested it as of yet, and the price is still unknown.


Thanks for the clarification. One question: you mentioned somewhere not to go for a Threadripper if working with images (due to the MKL libraries from Intel). But for video or image classification, segmentation, detection, etc. using deep learning or machine learning, MKL is used only during data preprocessing or augmentation, which shouldn't affect training time that much, unless I'm missing something? [I mean, what are the use cases for a Threadripper in deep learning (other than RL or knowledge graphs, maybe)?]

  1. Do you think a (Threadripper 5955 + WRX80 Creator motherboard, which is the latest one) would be a better choice than an (i9-10980XE + Sage X299 motherboard, which is an outdated one)? My main concern is whether MKL support is needed for preprocessing images or videos in computer vision deep learning (say, object detection and so on)?

I said that, but time keeps passing.

If you have to build a workstation-grade system right now, you are almost forced to go for a TR (preferably a Pro).
Common desktop processors do have the computational beef required for AI (and even more), but they don't support more than 128GB of RAM, and that's a big limitation. I don't like the absence of ECC, either.
The present Intel HEDT/Xeon platform, OTOH, is incredibly old (X299 and C422), and the processors aren't a match for their counterparts from AMD.
Yes, they are still better with MKL, but you cannot buy a whole obsolete platform just for the sake of MKL.

Buy a TR Pro, or wait for the Xeon Max platform. It'll be launched within a few months, I think.

Something in between would be a TR (non-Pro) on TRX40, with a maximum of 256GB of unbuffered ECC (no LRDIMMs or persistent memory).