What GPU do you have? (poll)

@maw501

That motherboard is capable of driving multiple GPUs. I would need more info on your specific build to answer your question with a good degree of confidence. If you post a detailed list of components (e.g. a PCPartPicker list) then I could evaluate it against what I know. I’ve only ever built 2 machines so I’m by no means an expert, but I have learned a lot from building my deep learning rig. I would put your build into PCPartPicker regardless, as it helps catch some issues with incompatible parts.

Some things I would look at are:

  1. You need a minimum of 8 PCIe lanes per card. You can check how many PCIe lanes your CPU has by googling your CPU, clicking on the ark.intel.com link and looking under “Max # of PCI Express Lanes”. Most medium to high end CPUs have at least 28 lanes so you should be fine. If you want to get every last ounce of performance you’d want 16 lanes available per card, which means at least 40 lanes on your CPU. I have seen discussions on how much of a difference 8x vs 16x lanes make for deep learning, but I haven’t seen any numbers showing it makes a huge difference. Your motherboard chipset and M.2 will probably take 4 lanes each, so keep that in mind. The back of your motherboard manual has block diagrams showing how your PCI Express lanes are divvied up. (https://dlcdnets.asus.com/pub/ASUS/mb/Socket2066/WS_X299_PRO/Manual/E14486_WS_X299_PRO_Series_UM_V2_WEB.pdf)
  2. Look at how your graphics cards are cooled and how big/tall they are. Most higher performance consumer grade GPUs take up 2 slots, but some take up more than that. Some motherboards have larger spacing between their GPU slots; yours has a pretty common 1 slot in between, so if your top card is larger than 2 slots, you won’t be able to fit another one in slot 2. You do have another lower PCIe x16 slot, but I’m not sure if your motherboard would like having a second GPU there instead of slot 2. I don’t know how much it actually matters, but it’s not what most motherboard manuals recommend. The other thing you need to look out for is cooling on your GPUs (there’s a small temperature-monitoring sketch after this list). For example, I have 1 water cooled 1080ti and 2 air cooled 1080ti FE cards. The water cooled one has a fan on the bottom but I’ve never actually seen it run. The 1080ti FE cards have a fan at the bottom right which will be running at high speed while you are training. I originally had my 1080ti FE cards right next to each other and the top one was getting VERY hot because it didn’t have enough spacing by the fan to get good airflow. Having 2 cards back to back like that may work for some people, but my card was hitting over 90C and I believe it was throttling. I now have a 1050 card between my 1080ti FE cards. It is a short card and doesn’t block the fan on the 1080ti above it. I only use my 1050 for driving my monitor so I don’t chew up valuable memory on my primary 1080ti cards that I use for DL. The 1050 never gets taxed so it doesn’t matter that its fan port is blocked.
  3. Before moving my cards to their current configuration, I had a 1080ti that sat over my M.2 slot and I was getting high temperature warnings on the M.2 drive. Your motherboard has an M.2 heatsink so it may not be a problem for you. I moved my 1050 into a slot where it wouldn’t cover my M.2 drive and the warning went away. Depending on which M.2 slot you use, it will be covered if you have a GPU in slot 2.
  4. Is your power supply sufficient to handle 2 cards, and does it have enough power cords to power 2 GPUs? I have a 1,000W PSU because I built my rig during the crypto boom and it was the largest one I could get without waiting on a backorder quoted at weeks/months of lead time. I wanted at least 1,200W but couldn’t get one. I think my PSU is borderline large enough; this is something you should check.
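
If you want to keep an eye on GPU temperatures while training (to catch throttling like I described above), a simple option is to poll nvidia-smi. Here’s a minimal Python sketch; the query fields are standard nvidia-smi options, and the 5-second interval is just an example:

```python
# Minimal sketch: poll nvidia-smi for per-GPU temperature, fan speed, power
# draw and utilization during a training run, to spot thermal throttling.
# Assumes the NVIDIA driver (and therefore nvidia-smi) is installed.
import subprocess
import time

QUERY = "index,name,temperature.gpu,fan.speed,power.draw,utilization.gpu"

def gpu_stats():
    out = subprocess.check_output(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader"],
        text=True,
    )
    return [line.strip() for line in out.splitlines() if line.strip()]

while True:
    for line in gpu_stats():
        print(line)
    print("-" * 60)
    time.sleep(5)  # poll every 5 seconds (example interval)
```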

Something to note - my water cooled GPU in slot 1 blocks some of the ports on my motherboard that are right above it. This means I can’t have all of the jumpers in my case connected, so not all of the ports on my case work. This was annoying but thankfully not a show stopper for me. I also have to remove the GPU in slot 1 if I want to add/remove RAM because the RAM clips won’t open while the GPU is installed. This is also annoying but not a show stopper.

I loved building my DL rig and love having it. I spent a lot of time researching everything to try and make sure it would work when all of the parts arrived. I started off using AWS P2 instances; having my own rig with 1080tis is substantially faster and I no longer have $100-300/mo AWS bills.

Hopefully you and others find this information useful!

4 Likes

Hi @matdmiller,

Firstly, this is a wonderful response - thank you so much for taking the time to reply in this manner, it’s really helpful. I’m not sure I can do your post justice given you’ve mentioned many things that at the moment are way over my head, so perhaps I can just ask you a few follow up questions before diving into any more details.

What was your experience before building your own machines, and do you think it’s reasonable for someone who has never done this to do it on their own by self-teaching? I don’t know a lot of the specifics of the things you’ve mentioned. I luckily have an old PC I could practise on by taking it apart and re-building it, but clearly the big risk is that I mess something up with my current set-up, or worse, break something. I’d prefer as realistic/honest an appraisal as you can give.

Quick point on temperatures: this is basically the reason I’m asking. I want to add a second GPU but my current GPU already hits 84C (the max), so I think I’m going to need to upgrade my case and cooling system. It was an oversight on my behalf - the PC is only 6 months old and I just didn’t (naively) give this part much thought. :disappointed:

Regarding my spec, it’s from pcspecialist and these are perhaps the relevant bits:

Case CORSAIR CARBIDE SERIES™ 200R COMPACT GAMING CASE
Processor (CPU) Intel Core™ i7 Eight Core Processor i7-7820X (3.6GHz) 11MB Cache
Motherboard ASUS WS X299 PRO: ATX, USB 3.1, SATA 6 GB/s
Memory (RAM) 64GB Corsair VENGEANCE DDR4 3000MHz (4 x 16GB)
Graphics Card 11GB NVIDIA GEFORCE GTX 1080 Ti
Power Supply CORSAIR 1000W RMx SERIES™ MODULAR 80 PLUS® GOLD, ULTRA QUIET
Processor Cooling INTEL SOCKET 2011/2066 STANDARD CPU COOLER
Thermal Paste STANDARD THERMAL PASTE FOR SUFFICIENT COOLING

The case is the main limitation I think, as it’s quite small, which I suspect is causing the temperature issues. The CPU appears to have 28 PCI Express lanes and I did get a 1000W power supply with one eye on another GPU at some point. When I go to pcspecialist to upgrade my order (they sadly don’t do case upgrades) and add a 2080 ti, I don’t get a power supply compatibility warning with the 1000W unit (it estimates I’d need 886W including a 20% buffer). Though it does say “please note that you cannot enable Nvidia® SLi or AMD® Crossfire™” if you have two different GPUs - not sure what this means.
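
For what it’s worth, that 886W figure looks consistent with simply adding up approximate component power figures and a 20% buffer. A rough sanity check in Python (the per-component wattages below are approximate spec-sheet values, not numbers from pcspecialist):

```python
# Rough PSU sizing sanity check: sum approximate component power figures and
# add a 20% buffer. All wattages are approximate spec-sheet values.
components = {
    "i7-7820X CPU": 140,                 # Intel TDP
    "GTX 1080 Ti": 250,                  # NVIDIA board power
    "RTX 2080 Ti": 250,                  # NVIDIA board power
    "motherboard/RAM/drives/fans": 100,  # rough allowance
}

total = sum(components.values())
with_buffer = total * 1.2  # 20% headroom

print(f"Estimated draw: {total} W, with 20% buffer: {with_buffer:.0f} W")
# -> Estimated draw: 740 W, with 20% buffer: 888 W (close to the 886 W quote)
```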

So given pcspecialist don’t do case upgrades, I need to either do it myself or find a PC shop that I’m happy knows enough about my set-up and cooling to carry out the upgrade - ideally not costing the earth.

Would love to hear your thoughts,

Mark

Here are the results of the poll:
https://andrestorrubia.typeform.com/report/iw1Rhl/kxJG0FnuRRfkC8Xr

4 Likes

I think what you’re wanting is:

  1. Solve cooling problem / transfer the “guts” of your current computer to a new case to fix cooling
  2. Add a second GPU
  3. Can you do this yourself

Let me know if this is not the case.

Your current case certainly isn’t the best one I’ve seen for airflow, but I’d be a little surprised if you couldn’t get it working to acceptable levels. How many case fans do you have and which way is each of them blowing? Linus Tech Tips on YouTube has several videos on case fan testing; I would start by watching those. I would have expected a PC builder to set this up right, but maybe they didn’t. Are there more case fan mounting slots where you could add more fans? I couldn’t find what is included by default on your current case.

As for mounting multiple GPUs, your motherboard can handle them. Ideally you want space by where your GPU fans are to allow for proper airflow. I have seen plenty of builds where they have blocked the GPU fans, but when I did that with my GPUs, I was not happy with the temperature results. Which 1080ti do you have? There are many variants from different manufacturers, so I need more information to help answer this.

As for the question: can you do this upgrade yourself? There are a ton of great online resources about building your own machine. YouTube would be a good place to start; I personally like Linus Tech Tips and he has several build-a-PC guides. Since your PC is already built, you’ll have to take it apart first, which should make it much easier to understand how it goes back together. I found it pretty straightforward to build my own PC after watching YouTube videos. If you’re not comfortable doing it yourself, then I would expect there is probably a local PC shop that could do the work for you at a reasonable cost. A lot of PC gamers build their own PCs, so if you have a gamer friend who has built their own computer you could ask them to help you out.

I probably dedicated about 20-30 hours of my time to my box, from researching what parts to buy, to learning how to do the build, to actually building it and getting all of the software set up. Building a PC is not something you’ll be able to do in an afternoon if you’ve never done it before. I started off with a basic understanding of what each component’s function was and how they worked together. My total experience with building PCs prior to building my own rig for fast.ai was watching a friend build one over 10 years ago. I believe it is certainly something you can do yourself as long as you are committed to spending the time learning how to do it properly.

A few notes if you decide on doing this yourself:

  1. Pay attention to cable management. It does not need to be perfect by any means but if you put some thought and effort into it, it will help with cooling and maintainability.
  2. Pay attention to mounting your CPU to your motherboard and your cooler to your CPU, if you end up having to take those apart. I don’t think you should have to in order to get your motherboard out of your case, but if you do, that is probably the most delicate thing you’ll have to do. Once you’ve watched enough guides on how to do it, it’s pretty straightforward. If you don’t do it right or don’t handle the parts correctly, it wouldn’t be hard to permanently damage them.
  3. Sometimes you have to push a lot harder than you’d expect to get connectors, RAM and video cards plugged in. If it doesn’t go in easily, triple check that you’re doing it right and then just push harder. Most things are designed so they can only be installed one way. For example, there is a notch in your RAM that won’t let you install it backwards - that’s what I’m talking about when I say “triple check” you’re doing it right.
  4. Note the order in which the guides tell you to install things. This will make your life easier.
  5. Read the instructions. You probably don’t have them readily available as your PC was built by someone else, but you should be able to find them online pretty easily from the manufacturers with a little bit of Googling. Some people prefer to jump into things without reading instructions, but I would not recommend that when building a computer for the first time.

Don’t worry about the fact that your GPUs are different and not SLI compatible - SLI is only needed for gaming. If you had 2x 2080tis you could set up NVLink between them, which I’ve read helps with training on multiple GPUs simultaneously, but you will be just fine without that.
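
To be concrete about what “just fine without it” means: frameworks split work across GPUs over plain PCIe. Here’s a minimal PyTorch sketch of multi-GPU training with nn.DataParallel (the model and random data are just placeholders, not a real workload):

```python
# Minimal sketch: train on multiple GPUs over plain PCIe with nn.DataParallel.
# No SLI or NVLink bridge is required. Model and data are toy placeholders.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)  # splits each batch across visible GPUs
model = model.cuda()

opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(64, 512).cuda()
y = torch.randint(0, 10, (64,)).cuda()

for _ in range(10):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
```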

Here are some helpful links that I would suggest you read:




http://www.fast.ai/2017/11/16/what-you-need/

Hopefully that helps!

2 Likes

My organization is planning to build a deep learning rig as it’s more cost efficient compared to builds already available in the market like 1 and 2. I have suggested the Gold/Silver/Bronze buckets to them and need to know all of your views. After looking at the comments I am thinking of adding a 2080ti to my gold plan, but I’m a little bit skeptical about compatibility issues between the components. @init_27, @antorsae, @matdmiller - would like to know your views.

If anyone can post the PCPartPicker link of a multi GPU setup that they have built out, that would be helpful. I am bound by a budget constraint of $5,000-$6,000.

This is the PCPartPicker link: https://pcpartpicker.com/list/fJkTtg

Some comments on your lists:

  • Jeremy has suggested using Intel CPUs because of deep learning library optimizations for Intel CPUs.
  • A 1,600W power supply is overkill for 2 GPUs; 1,000W is enough.
  • I’ve talked a lot about GPU airflow in other posts and your build does not address the issue of blocked fans. Please read through those posts. FE cards get a lot hotter if their fans are blocked.
  • I would get a full tower case as it will be easier to work in and probably no more expensive. Make sure it comes with plenty of fans, or at least lets you add more if it doesn’t come with many.
  • Make sure if you are using water cooling / radiators that your case can fit them.

Here is a link to my build: https://pcpartpicker.com/list/N6mM6s

This is what I would do differently:

  • I would have gone for an i9 CPU and a corresponding i9-compatible motherboard, as I am bottlenecked at times with my current CPU when working with image data on multiple GPUs.
  • The case I picked works well functionally, but it is gigantic and I’m sure there are other more compact cases that would be better. It works well, it just takes up a lot of room.
  • 3 GPUs instead of 4. I haven’t run into a ton of cases where I use my 3rd 1080ti, with the exception of some Kaggle competitions. If you are going to use this with a monitor I would get a low powered GPU to drive it; if you are going to use this as a server, then it’s not necessary.
  • A bigger NVMe drive. I have hard drives for my cold storage, but it takes some maintenance to move data around and is something else for me to do. If I had a bigger NVMe drive I wouldn’t have to do this as much. If you aren’t working with big image datasets then your drive size will likely not matter nearly as much.
  • Only 1 HDD. I was originally running my machine on Unraid; I’ve decided to move away from that and don’t really need the second HDD now.
  • I would definitely get a 2080ti, but it was not available when I built my machine.
4 Likes

Can you point me to where Jeremy recommended Intel CPUs? This may be the case when training on the CPU, but if you’re using a GPU for training I’d see it as a moot point.

From personal experience I can vouch for the Threadripper (I have the 16-core/32-thread one): it becomes very handy when doing a lot of on-the-fly augmentations. Couple that with 64 PCIe lanes and a 128 GB RAM max.

If you’re not going the Threadripper route, many Intel mobos as well as AMD ones (provided you have a processor ending in -G) support on-board graphics, in which case you can use the onboard card for the display. Personally I run mine with 2 x 2080 Tis, and when I’m not using the desktop I do sudo service lightdm stop, which kills the graphical desktop and frees all GPU memory for deep learning. But even if I don’t do this, I can still watch 4K movies on that computer while it’s training on both GPUs. :joy:

1 Like

Thanks. My only comment is that if your workload is multi-thread friendly, you’d likely find multiple cores/threads provide much greater gains than library optimizations would (for DL or scientific programming I’d go with more cores rather than X% faster single-core performance).

1 Like

We’ve often found CPU performance the restriction on model training time, due to data augmentation and jpeg decoding. With tensor cores this is quite a big issue.
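
One way to see this on your own data is to time a pass over your images with different numbers of DataLoader workers and watch how throughput scales with CPU cores. A rough PyTorch/torchvision sketch (the image folder path is a placeholder):

```python
# Rough sketch: measure how CPU-side jpeg decoding and augmentation limit
# throughput by timing one pass over an image folder with different numbers
# of DataLoader worker processes. The dataset path is a placeholder.
import time
import torch
from torchvision import datasets, transforms

tfms = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
ds = datasets.ImageFolder("/path/to/images", transform=tfms)  # placeholder

for workers in (2, 4, 8):
    dl = torch.utils.data.DataLoader(ds, batch_size=64, num_workers=workers)
    start = time.time()
    for xb, yb in dl:
        pass  # decoding + augmentation only; no GPU work
    print(f"{workers} workers: {time.time() - start:.1f}s per pass")
```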

3 Likes

Hi @matdmiller - thank you again for a great reply, much appreciated.

FYI - I have the Zotac Geforce GTX 1080 ti, like this with two fans on it. Looking inside my case I only appear to have two case fans (one at the front which is intake I think, and one exhaust at the back), but there are attachment points for two at the side, two at the top and one on the bottom. However it’s already looking a little crowded in there, so I’m not sure how much difference more fans would make?

To try to keep this simple (and not take much more of your time) I think we can boil it to the following:

  • Do you think adding a second GPU and more case fans could work for this case/set-up? I know it gets a little complicated in terms of wanting positive pressure etc… when trying to get the best airflow. This is something I know little about.

Depending on your response to this I will go away and do as much research as I can before making a call on doing it myself or taking it to a pc shop.

Thanks again and thank you for the links you provided,

Mark

I agree that multi-core would in many cases outperform library optimizations alone, however with a high core count i9 you can have both. I suggested an i9 vs. another Intel CPU for being able to drive 2x 1080tis + a 2080ti like he said he wanted in his “Gold” option. I’ve personally run into bottlenecking with my current i7 CPU and I would have gone with an i9 if I knew then what I know now. My i7 is far from top of the line as far as i7s go, but it was the best price-to-performance one with 40 lanes that I could find at the time. CPU performance was not emphasized nearly as much in the fastai forums when I was building my PC, but it is very important, as Jeremy said, for fp16, which would come into play if he got the 2080ti alongside the other 1080tis. Threadripper is certainly less expensive than an i9, so it’s a balance of what you want to do. I don’t have benchmarks on which is better, or by how much, in deep learning workloads; I’m just going off of what Jeremy recommended.

Turning off your display manager is certainly an option to free up GPU RAM. I find the display manager in Ubuntu to be finicky if I mess with it, so I try and leave it alone as much as possible. This is partly due to the GPU I use for my display not being in slot 1. The GPU I use for my display is sitting at 1GB of RAM used at the moment, driving my 4k display with some browser windows open. That is a substantial amount of memory to be taken up if it is needed for training. You can almost always decrease batch size and it will work, but if you have another means of driving your display it is 1 less thing you have to worry about (CUDA out of memory errors). In addition, I mentioned that it would only benefit him if he was planning on using the display while training, i.e. not as a server. I don’t think turning off the display is an option if you wanted to use Jupyter notebooks on that display. If you were only running training via scripts it wouldn’t really matter because you could launch them from the terminal.
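
If you do keep a separate display GPU, the easy way to make sure training never touches it is to hide it from the framework before CUDA initializes. A small sketch (the device indices are examples - check nvidia-smi to see which index your display card actually has):

```python
# Sketch: keep training off the GPU that drives the display so its memory
# stays free. CUDA_VISIBLE_DEVICES must be set before CUDA is initialized.
# The indices are examples; check nvidia-smi for your actual layout.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"  # e.g. hide GPU 2 (the display card)

import torch
print(torch.cuda.device_count())  # only the training GPUs are visible now
print([torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())])
```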

An integrated GPU is an option, but both your CPU and motherboard need to have the capability. None of the high core count i9s have integrated graphics. I didn’t think Threadripper had integrated GPUs either, but I have not really kept up to speed on Threadripper.

2 Likes

Mark,

Adding case fans, as long as they physically fit and are installed in a direction that makes sense, will almost certainly help. They are cheap and pretty easy to install, so it may be a worthwhile experiment to run even though it may not solve your issue. It is cheaper and easier than your alternative. It’s not possible for me to know for sure whether it will definitely solve the issue.

As for your current GPU - it has a different style of fans from mine, with different airflow characteristics. That GPU will stay cooler if you don’t have a card immediately against it, but because I just don’t have any experience with that style of GPU cooling, I can’t tell you whether you will be OK adding another GPU directly against it or not. If I had to make an educated guess, I would think your current GPU will get too hot with another GPU directly against it while running deep learning workloads. That is probably the most common style of GPU fan, though, so maybe someone with a similar style GPU could answer that question.

Getting a new case with enough properly installed fans and a new motherboard with larger spacing between GPUs would solve your heat issue. Your setup is definitely not optimal for multi-GPU with respect to thermal management, but I really don’t know whether it will be good enough once you add case fans and your additional GPU.

Thanks,
Mat

Anyone using water cooled GPUs? My 1080 ti runs at 82C. Any ideas on bringing the temp down?

Thanks @matdmiller.

Looking a bit more inside the case, your earlier comment strikes home - the first GPU does indeed occupy two slots, so I’d have no option but to put another GPU into slot 3, which seems to have a bit of a gap (I don’t want to buy a new motherboard). So it looks like I need to do some research into whether skipping slot 2 is an issue or not.

Thanks for all your help, I’ll report back with what I decide.

Mark

1 Like

I would try adjusting the fan curve first
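
For example, on Linux you can usually force a higher fixed fan speed through nvidia-settings while you experiment with a curve, provided manual fan control (“Coolbits”) is enabled in your X configuration. A hedged sketch - the 80% target and device indices are just examples:

```python
# Sketch: manually raise the GPU fan speed on Linux via nvidia-settings.
# Assumes an X session with manual fan control enabled (the "Coolbits"
# option in xorg.conf). The 80% target and indices are examples only.
import subprocess

def set_fan_speed(gpu=0, fan=0, percent=80):
    subprocess.run([
        "nvidia-settings",
        "-a", f"[gpu:{gpu}]/GPUFanControlState=1",      # take manual control
        "-a", f"[fan:{fan}]/GPUTargetFanSpeed={percent}",
    ], check=True)

set_fan_speed(percent=80)
```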

2 Likes

Hi @matdmiller,

Quick further question if I may (I’m just realising how much info you’ve actually given me now I’ve started googling :slightly_smiling_face:): what does “PCI Express 3.0 x16 slot (x4 link)” mean? I think this is my expansion slot 3 (where I’d hope to put the second GPU) but I’m a little confused about what the ‘(x4 link)’ bit means.

Many thanks again,

Mark

A blower type GPU would be better for a multi-GPU setup if water cooling is not available.

1 Like

It means that the slot is physically sized so x16 devices, such as a graphics card, can plug in, but behind the scenes it only has the bandwidth of an x4 link. There are x2, x4, x8 and x16 slot sizes and link sizes. You can run an x2, x4, x8 or x16 link over an x16 slot size, but you cannot run an x16 link over an x4 slot size. Just because your slot size is x16 doesn’t mean it has x16 of bandwidth available. From everything I’ve read you need a minimum of x8 bandwidth to be able to run a GPU for deep learning, meaning x4 link bandwidth is not going to work. On some motherboards, some of the slots have a fixed maximum bandwidth, and on others the bandwidth per PCIe slot varies based on what else you have plugged in. Most motherboard manuals show you this information. If you install PCIe cards as shown in your motherboard manual, you can generally expect it to work as described in the manual. I am sure there are some motherboards that will work with configurations not explicitly specified in the manual, but the only way to know for sure is testing, experience or having a deeper understanding of the architecture than the average PC builder. Also, just because it “works” does not mean it will perform optimally or to your expectations.
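
If you ever want to check what a card has actually negotiated (as opposed to the physical slot size), nvidia-smi can report the current and maximum link width. A small sketch - note the current width can drop while the card is idle due to power saving, so check it under load:

```python
# Sketch: report the PCIe link width each GPU has negotiated vs. its maximum,
# to tell an x16 slot running at x4 from a real x16 link. The query fields
# are standard nvidia-smi options; requires the NVIDIA driver.
import subprocess

out = subprocess.check_output(
    ["nvidia-smi",
     "--query-gpu=index,name,pcie.link.width.current,pcie.link.width.max",
     "--format=csv,noheader"],
    text=True,
)
for line in out.strip().splitlines():
    print(line)  # e.g. "0, GeForce GTX 1080 Ti, 16, 16"
```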

2 Likes

Thanks @matdmiller - that basically kills the 2 GPU idea for now then. :disappointed:

I suppose a quick win would be to use a second GPU to drive my monitor (I have a crappy 2GB GTX 650 ti from my old PC that ought to be okay I think) as well as trying a few extra case fans. You mentioned that you drive your monitor from a GPU not in slot one - is this straightforward enough to do?