Why you probably want a "Blower" GFX for multi-GPU setups

That’s odd – this works fine for me. I have several Nvidia GPUs with one plugged into a monitor, and I am able to manually adjust the fan speed for all of them both from the Nvidia driver GUI and command line. I am not sure whether this will work if the monitor is plugged into an AMD card.

If you post your /etc/X11/xorg.conf, I can take a look.

1 Like

Sure thing! Thanks.

# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig:  version 415.27

Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0"
    Screen      1  "Screen1" RightOf "Screen0"
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
EndSection

Section "Files"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Mouse0"
    Driver         "mouse"
    Option         "Protocol" "auto"
    Option         "Device" "/dev/psaux"
    Option         "Emulate3Buttons" "no"
    Option         "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Keyboard0"
    Driver         "kbd"
EndSection

Section "Monitor"
    Identifier     "Monitor0"
    VendorName     "Unknown"
    ModelName      "Unknown"
    HorizSync       28.0 - 33.0
    VertRefresh     43.0 - 72.0
    Option         "DPMS"
EndSection

Section "Monitor"
    Identifier     "Monitor1"
    VendorName     "Unknown"
    ModelName      "Unknown"
    HorizSync       28.0 - 33.0
    VertRefresh     43.0 - 72.0
    Option         "DPMS"
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "GeForce GTX 1080 Ti"
    BusID          "PCI:3:0:0"
EndSection

Section "Device"
    Identifier     "Device1"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "GeForce GTX 1080 Ti"
    BusID          "PCI:4:0:0"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "True"
    Option         "Coolbits" "31"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

Section "Screen"
    Identifier     "Screen1"
    Device         "Device1"
    Monitor        "Monitor1"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "True"
    Option         "Coolbits" "31"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

May I ask which control strings you use to address the fans?

I use, for example:

nvidia-settings -a "[gpu:0]/GPUFanControlState=1" -a "[fan:0]/GPUTargetFanSpeed=20"

And it works. But if I use:

nvidia-settings -a "[gpu:1]/GPUFanControlState=1" -a "[fan:0]/GPUTargetFanSpeed=20"

or

nvidia-settings -a "[gpu:1]/GPUFanControlState=1" -a "[fan:1]/GPUTargetFanSpeed=20"

these don’t work.

The monitor is attached to gpu:0
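
In case it helps, the fan objects the driver exposes can be listed with the nvidia-settings query targets (assuming the X server is on display :0); this should show whether a fan:1 object exists at all:

export DISPLAY=:0
nvidia-settings -q gpus   # list the GPUs the driver sees
nvidia-settings -q fans   # list the fan objects the driver exposes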

Observe the GUI for both cards (screenshots of the nvidia-settings panel for each GPU):

As you can see, the second card lacks the fan-control slider. The same goes for the overclocking controls. Conversely, I am able to successfully set the power limit for both of them via nvidia-smi.
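
For reference, the power-limit calls I mean are of this form (the 200 W figure is just a placeholder, not my actual setting):

sudo nvidia-smi -i 0 -pl 200
sudo nvidia-smi -i 1 -pl 200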

I see two differences with my xorg.conf file. In the “Screen” section, mine reads like:

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    Option         "Coolbits" "4"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

Differences are that:

  1. I do not have the Option "AllowEmptyInitialConfiguration" "True" line.
  2. I have Coolbits set to "4" rather than "31". "4" is all you need for fan control; "31" enables overclocking and other features that I would leave out.

I suggest editing your xorg.conf (back it up first!) to remove the Option "AllowEmptyInitialConfiguration" "True" lines,
then change Coolbits to "4". You need to do this for every Screen entry in your config file.

1 Like

The scripts I use to turn the fans (all of them) up and down are in my .bashrc as follows:

alias nvidia-fanup='export DISPLAY=:0; nvidia-settings -a GPUFanControlState=1; nvidia-settings -a GPUTargetFanSpeed=70'

alias nvidia-fandown='export DISPLAY=:0; nvidia-settings -a GPUFanControlState=0'

Edit the GPUTargetFanSpeed value in the nvidia-fanup alias to your desired fan speed. The nvidia-fandown alias returns you to the default (automatic) fan state.
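
If you ever want per-GPU control instead of the global aliases, a rough sketch along these lines should also work in .bashrc (the nvidia_setfans name is just an example, and it assumes one fan per GPU with fan:i paired to gpu:i, which may not hold for every card):

# Rough sketch: set every GPU's fan to the given speed (assumes fan:i pairs with gpu:i)
nvidia_setfans() {
    export DISPLAY=:0
    local speed="${1:-70}"
    local ngpus
    ngpus=$(nvidia-smi --query-gpu=count --format=csv,noheader | head -n 1)
    for ((i = 0; i < ngpus; i++)); do
        nvidia-settings -a "[gpu:$i]/GPUFanControlState=1" \
                        -a "[fan:$i]/GPUTargetFanSpeed=$speed"
    done
}

Call it as, for example, nvidia_setfans 70.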

1 Like

No luck. The second GPU's fan still cannot be controlled manually, neither via the GUI nor via the CLI… I removed "AllowEmptyInitialConfiguration" and set Coolbits to 4.

What could it be? :face_with_raised_eyebrow::thinking:

Furthermore, look at this:

In this screencap, I manually set the fan to 100%. The temperature is 86 °C and still climbing (at that point I stopped gpu-burn…)

Note that the cards are spaced by a good 2 centimetres, and they catch the airflow from the 2 big fans you see in the photo.

Hmm, I’m not sure what the issue is with the fan speed. My last suggestion would be to confirm that your BusIDs are correct – could they have changed when you added/removed the AMD card? You could check by deleting xorg.conf (after backing up) and rerunning:

  1. nvidia-xconfig --enable-all-gpus
  2. nvidia-xconfig --cool-bits=4
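
To cross-check the BusIDs themselves, something like the following should do; keep in mind that nvidia-smi and lspci report the bus number in hexadecimal, while the BusID in xorg.conf ("PCI:bus:device:function") is decimal:

nvidia-smi --query-gpu=index,name,pci.bus_id --format=csv
lspci | grep -i nvidia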

At 100% fan speed, the 1080 Ti blowers should be able to handle this configuration with temps below 85 °C for training. I don't have experience with gpu-burn, so I can't comment on whether your temps are appropriate.

Removing the lower GPU’s backplate segment that overlaps the upper GPU’s intake fan will net you 2-3 more degrees. Removing the PSU shroud could also help the cards breathe. However I think these will be minor benefits – your main problem is that you can’t set the fan speed.

You could drill out the side cover and put a couple of fans blowing directly onto the GPUs. Note that the G12 + Kraken X52 is quite tall; you may want to check its dimensions to make sure it will fit between the GPUs or below the bottom one.

They are correct: indeed, if I attach the monitor to the other 1080 Ti, I can manually set its fan, but I lose control over the other's.

It is not much different from training an unfrozen network. I just reach these temps in a few minutes, whereas when training a NN it takes some 10 minutes… I actually hit 91 °C while training an unfrozen ResNet-101.

I concur :frowning:

My side panel is tempered glass, but I could buy a slot bracket to hold 1-2 additional fans. I doubt, however, that this would give me more than 4-5 °C… Consider that the FE versions are completely encased, so such fans won't blow over any heatsink.

As for the liquid cooling system, I'll use the G12 with a couple of Corsair H55s, which are compact (120x120 mm). The cards are three slots apart overall, so the G12 should fit.

  1. Not sure about the model of PC case you have, but on mine, a Corsair Carbide 200R, the side panels are reversible, as the case is symmetrical when it comes to panel attachments/screws.
    So you may be able to switch your glass panel from the "open side" to the "closed side", then drill some holes in the plain cover, though as you said it probably won't help much.

  2. Another option, probably cheaper than investing in liquid cooling from the start, is to get a better case focused on "max airflow"; typically those cases have up to 8 slots for case fans.
    Mine (approx. USD 70) has 4 extra slots, 2 on the side and 2 on the top, so there is a lot of potential for "negative pressure" fans (the ones that pull air out, which are a lot more efficient than the "positive" ones at generating airflow, by the way, as it's much easier to suck air out of a container than to blow air into it).

  3. Another option, combined with option #2, is to get a better/larger motherboard so your two GFX cards are not positioned so close to each other, as in your picture, and let the airflow do a better job. Of course, there might be new bandwidth issues regarding PCIe 16/8/4 limits, but will they have a larger impact than thermal throttling?

I really need to emphasize that “Negative Pressure” is a lot more efficient than “Positive Pressure”, when it comes to airflow management and fans (in a previous life, I worked in Air Conditioning and Climate Control for housing/small shops :sunglasses:).

So if the two fans on the front panel of your case, which are 99% likely to be “Positive” (ie. they try to inject air into your case), were instead located on the top panel of your case in “Negative” mode (ie. sucking the air out), they would be a LOT more efficient for the same power consumption.

As a "real-life" comparison, remember how easy it is to suck air out of a plastic container (under-pressure = negative) and compare that to the energy you spend to inflate a balloon (over-pressure = positive) :yum:

1 Like

@EricPB, I believe that you have a point here, but with some caveats:

  1. A founder’s edition card is, basically, an air tunnel. Any part of the card which can dissipate heat is not exposed to external airflow, so additional fans around the cards cannot do much, apart from cooling the backplate just a bit.

  2. Founder's edition cards (or any other blower design), by virtue of point 1, are specifically engineered to be stacked tightly, one every two slots. Mine do have one empty slot between them.

  3. Tim Dettmers tested a lot of cases specifically to see how much a case can affect blower-card temperatures. He found that between the worst case and the best one there was only a couple of degrees Celsius of difference (for the GPUs), and that should not surprise us, again for the reasons given in point 1.

  4. The problem with negative pressure is that the case will start to suck in a lot of unfiltered air through every opening. This will result in a large accumulation of dust inside the case itself. Moreover, since the blower cards take air from inside the case and expel it outside, maybe it would be beneficial to maintain a slightly positive pressure inside the case (not sure, though).

  5. Last but not least, a max-airflow case (say the Obsidian 750 or the Carbide 400) costs about 150 euros, while two liquid coolers and two G12s cost 60+60+22+22 euros, so the two options are roughly aligned in terms of cost.

Advantages:

  • The cooling will be better than any air solution, and quieter too.

  • I will probably be able to overclock both cards to around 2 GHz and reach 10 TFLOPS (each) in FP32.

Disadvantages:

  • A water pump can fail. If so, I do not know what happens. My worst fear is the possibility of starting a fire.

  • There could be liquid spillovers, with unpredictable consequences.

More generally, I do not know what the MTBF is for such cheap AIOs.

1 Like

If you plug a 2nd monitor into the 2nd GPU, can you change the fan speed?

If so, consider buying a dummy DisplayPort plug to trick the system into thinking a monitor is plugged in. While you will lose a little GPU RAM, this is a simple solution.
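
To see how much memory the X server actually claims on each card, something like this should show it:

nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv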

1 Like

Yes, but I noticed I lose computational power, too. Look at the experiments performed by @EricPB (2060 vs 1080): he noticed it too :confused:

The AMD 5450 was intended to prevent this.