Making your own server

Looks like a good build to me. Personally I’d get DDR4-2400 instead of 3000 and save a few bucks.

I’m not sure if the CPU cooler comes with thermal paste; if not, you should get some. The thermal paste goes between the CPU and the cooler.

Ok, thanks. I switched the RAM out for 2x8GB of Patriot Viper Elite DDR4-2133. The CPU cooler didn’t mention coming with thermal paste, so I’ll probably just buy this.

Hi everyone,

I’ve been training some custom models using TensorFlow on my laptop, and even with the built-in NVIDIA 960M it takes 10 to 24 hours to train on a moderately sized dataset. To speed things up I decided to build my own DL rig; here’s the pcpartpicker:

https://pcpartpicker.com/list/N84Kjc

The parts were picked following Slavv’s guide: https://blog.slavv.com/the-1700-great-deep-learning-box-assembly-setup-and-benchmarks-148c5ebe6415

Unlike Slav, I plan on using 1 GPU (GTX 1080 Ti) until I can justify the cost of buying and building a multiple-GPU system; even then, if I need more than 1 GPU I think I will sell my old rig and buy a new one. Is it a good idea to use the i5-7500, which only has 16 PCIe lanes? Would this be a bottleneck for the 1080 Ti and the NVMe M.2 SSD? (Assuming they share bandwidth.)

Sorry if this was already answered in this thread. I read it for 3 hours but couldn’t get 2/3 of the way through.

Thanks in advance.

Dat

I don’t think PCIe bandwidth is worth worrying about if you’re using only 1 or 2 GPUs. You won’t be using the full bandwidth while training, since the GPU cannot do its convolution computations that fast anyway.

Slav did mention (with some citations) that people were experiencing bandwidth bottlenecks; here is an excerpt from Slav’s blog:

Edit: As a few people have pointed out: “probably the biggest gotcha that is unique to DL/multi-GPU is to pay attention to the PCIe lanes supported by the CPU/motherboard” (by Andrej Karpathy). We want to have each GPU have 16 PCIe lanes so it eats data as fast as possible (16 GB/s for PCIe 3.0). This means that for two cards we need 32 PCIe lanes. However, the CPU I have picked has only 16 lanes. So 2 GPUs would run in 2x8 mode (instead of 2x16). This might be a bottleneck, leading to less than ideal utilization of the graphics cards. Thus a CPU with 40 lanes is recommended.
A good solution would be an Intel Xeon processor like the E5–1620 v4 ($300). Or if you want to splurge go for a higher end processor like the desktop i7–6850K ($590).

Following that logic, if you were to run 3 or 4 GPUs you would not see linear speed-ups on a single machine, since you would be limited by PCIe bandwidth. Even with 40 lanes (Intel Xeon: https://ark.intel.com/products/92991/Intel-Xeon-Processor-E5-1620-v4-10M-Cache-3_50-GHz), four cards would run at 1x16 + 3x8. So ideally, 40 lanes is good for a 2-GPU system.
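
To make the lane arithmetic a bit more concrete, here’s a tiny Python sketch (assuming roughly 1 GB/s of usable bandwidth per PCIe 3.0 lane, i.e. the ~16 GB/s per x16 slot quoted above) of how much bandwidth each card gets under the usual slot splits:

```python
# Rough sketch of the lane math above, assuming ~1 GB/s of usable
# bandwidth per PCIe 3.0 lane (i.e. the quoted ~16 GB/s for an x16 slot).
GB_PER_LANE = 1.0

# Typical slot splits for a 16-lane vs. a 40-lane CPU, as discussed above.
configs = {
    "16 lanes, 1 GPU":  [16],
    "16 lanes, 2 GPUs": [8, 8],
    "40 lanes, 2 GPUs": [16, 16],
    "40 lanes, 4 GPUs": [16, 8, 8, 8],
}

for name, lanes in configs.items():
    per_gpu = ", ".join(f"~{n * GB_PER_LANE:.0f} GB/s" for n in lanes)
    print(f"{name}: {per_gpu}")
```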

For the 16-PCIe-lane CPU I’m considering buying, since I’m only going to run one 1080 Ti on it, my only concern is whether those 16 lanes are shared with the NVMe drive and whether that would be a bottleneck. I guess I’ve got to do some more research.

EDIT: I think I found the answer: http://www.tomshardware.com/answers/id-2943191/nvme-ssd-affect-gpu.html
This means my NVMe SSD won’t be a bottleneck for my single GPU on a 16-PCIe-lane CPU. However, if I plan on using multiple GPUs I would need a CPU with more than 16 PCIe lanes, otherwise the slots drop to 2x8 (for 2 GPUs) or 1x8 + 2x4 (for 3 cards). Reference -> https://www.pugetsystems.com/labs/articles/Z270-H270-Q270-Q250-B250---What-is-the-Difference-876/

Hi everyone,

Just thought I’d share an update in this thread since posting the specs of my DL rig in early May 2017.


Roughly: a 2015 i5-4690K (4 cores) Intel CPU + air cooler, 16GB DDR3 RAM, a 500GB Samsung SSD (not NVMe) + 3TB HDD, a GTX 1080 Ti, all in a $60 Corsair case with a 650W PSU.
Dual-boot Windows 10 and Ubuntu 16.04, with a 50GB SSD swap file (to complement the ‘low’ 16GB of RAM; a must-have IMHO).

Since then, I’ve completed Part 1 and most of Part 2 (I focused on sections I found relevant to my business/job prospect, and skipped the rest for later use).

More importantly, about 3 months ago (mid-June 2017), I started doing Kaggle competitions “seriously” because (1) @jeremy highly recommends it for fast/practical learning and (2) it’s a metric used by a growing number of companies/recruiters if you don’t have a PhD from MIT/Stanford.
You can see my Kaggle results here: approx. Top 25% on average (i.e. better than the Top 50% that Jeremy mentions in his lessons), plus a Bronze Medal.
https://www.kaggle.com/ericperbos/competitions

I’d like to make one point very clear, so you don’t waste money on your rig:

- Having a better rig would NOT have resulted in better Kaggle results.

An AMD Ryzen Threadripper 1950X with 16 cores and 32 threads ($999) and a GTX Titan Xp ($1,200) would have made little difference, not to mention an even more overpowered rig with multiple GPUs (and their pain-in-the-arse setup; check Tim Dettmers’ blog).
If you can afford it, good on you but…

Once you have an NVIDIA GPU with 8GB of VRAM, a decent CPU with 4 cores, and an SSD to handle the OS/data/swap, you can handle any Kaggle competition and reach the Top 25%.

If you can’t, don’t blame the hardware; look in the mirror :sunglasses:

Eric PB

PS: if you think Part 2 is more demanding on DL hardware, you can check the full menu here

Some quick back-of-the-napkin math:

One epoch of training on MobileNet using Keras and TensorFlow takes 2 hours on my machine (one GTX 1080 Ti). That is 1.2 million images in 7200 seconds.

Each image is rescaled (on the CPU) to 224x224 pixels, so that is 224 x 224 x 3 x 1.2M bytes that get transferred to the GPU. Divided by 7200 seconds gives an average of ~24 Megabytes/second.

This means the GPU can only process 24 MB/sec worth of data while training this kind of neural net. My machine isn’t limited by the PCIe bandwidth, but by the amount of computation the GPU can handle.

An x16 PCIe 3.0 connection can transfer ~16 GB/sec (in theory). That means the PCIe bus can move data more than 600 times faster than the GPU can consume it when training this particular neural net.
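
If you want to redo that napkin math yourself, here it is as a few lines of Python (the numbers are just the ones quoted above, not fresh measurements):

```python
# Back-of-the-napkin math from above: how much image data per second does
# the GPU actually consume during one MobileNet epoch?
images_per_epoch = 1_200_000
epoch_seconds = 2 * 60 * 60                  # one epoch takes ~2 hours
bytes_per_image = 224 * 224 * 3              # rescaled RGB image, 1 byte per channel

gpu_rate = images_per_epoch * bytes_per_image / epoch_seconds   # bytes/sec
pcie_x16 = 16e9                              # ~16 GB/s for a PCIe 3.0 x16 slot

print(f"GPU consumes ~{gpu_rate / 2**20:.0f} MB/s of image data")  # ~24 MB/s
print(f"PCIe x16 headroom: ~{pcie_x16 / gpu_rate:.0f}x")           # ~640x, same ballpark
```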

The maximum host-to-device bandwidth of my 1080 Ti is about 12.5 GB/sec. That is just copying data from the CPU to GPU (or back). If that’s all you’re doing, or you’re just running compute kernels that can approach this kind of bandwidth, yeah then just 16 PCIe lanes won’t be enough if you have more than one GPU.
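
(If you want to check that host-to-device number on your own card, here’s a quick sketch; it assumes PyTorch with CUDA is installed, but any framework that can time pinned-memory copies will do.)

```python
# Quick-and-dirty host-to-device bandwidth check (assumes PyTorch + CUDA).
import time
import torch

size_mb = 256
buf = torch.empty(size_mb * 2**20, dtype=torch.uint8).pin_memory()  # pinned host buffer

repeats = 20
torch.cuda.synchronize()
start = time.time()
for _ in range(repeats):
    _ = buf.to("cuda", non_blocking=True)    # copy host -> device
torch.cuda.synchronize()
elapsed = time.time() - start

print(f"~{repeats * size_mb / 1024 / elapsed:.1f} GB/s host-to-device")
```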

But once you start training a serious neural network, your GPU will spend more time doing its computations than transferring data.

EDIT: Just for another data point, I looked at some benchmarks. Training VGG16 on the 1080 Ti takes 128.14 ms for a batch of 16 x 3 x 224 x 224 bytes. That translates to a throughput of about 18 MB/second. Not even close to what you need to fill up the PCIe bus. :wink:
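
Same napkin math for that benchmark, if you want to verify it:

```python
# Throughput implied by the VGG16 benchmark above.
batch_bytes = 16 * 3 * 224 * 224      # one batch of 16 images, 1 byte per value
batch_seconds = 0.12814               # 128.14 ms per batch on the 1080 Ti

print(f"~{batch_bytes / batch_seconds / 2**20:.0f} MB/s")   # ~18 MB/s
```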

I was setting up the AWS instance and the very last step (running the script from http://www.platform.ai/files/setup_p2.sh) did not work: no access granted. Who or what grants me access so I can get the instance up and running?

Thanks for the breakdown. Interesting that you are running a 1080 Ti and your GPU is the bottleneck. I already ordered all of the hardware and I will start building this weekend. I’ll report back with some benchmarks once I start making my way through the course.

At least I know I’d need a completely new system if I ever plan on using multiple GPUs and want to use them to their full potential.

I am trying to get parts for a box that starts off with one 1080 Ti, with the option to add a second one later if that seems helpful.

Apparently, the motherboard in my build won’t allow that, however.

It turns out the MSI Z270-A PRO has only one PCIe slot wired at x16, not two. It can hold an extra GPU in a slot wired at x4, but performance will be constrained. Do not buy this board if you plan on using multiple GPUs!

Anyone know which motherboard I should get, and what other parts should change to allow this flexibility?

PCPartPicker part list: https://pcpartpicker.com/list/JJt8Gf

If you want multiple GPUs, you are probably better off going with an Intel i7-6850K. Here is what I just finished building:

https://pcpartpicker.com/list/Lgsz3F

I built it about a week ago and am very happy. I don’t yet have multiple GPUs, but if I ever get another, the board has two x16 PCIe slots.

I’m thinking about getting a multiple-GPU box, ideally something that could fit into a small form factor case and stay relatively quiet. I’m thinking about two GeForce GTX 1080 Ti Minis, paired with an Intel i7 that has 40 PCIe lanes.

I’m not sure if I can find a case/mini-ITX solution that will be able to support the i7. The Intel i7-6850K that Tyler recommended looks great. I just need to see if I can find a motherboard and power supply that would work in an SFF PC.

Does anyone have any experience building a dual GPU small form factor DL rig?

I got this motherboard: https://pcpartpicker.com/product/2Phj4D/asus-strix-z270-e-gaming-atx-lga1151-motherboard-strix-z270-e-gaming

You won’t need to change anything in your parts list, except you don’t need the WiFi adapter since the Strix Z270-E comes with WiFi built in.

To answer my own question. I found this commercial build of a small form factor deep learning rig here.

I’ve actually decided not to go this direction, but to build a full tower system that I can expand and add more cards to over time. But I thought I’d share this link in case others pursue this direction of system design.

You can even get a neat little carrying case for it!

Hi all, long-time lurker (thank you!), first time poster (sorry!) here. :slight_smile:

I’m in the process of putting together a small, speedy DL rig, and have a setup designed that I feel somewhat content with at the moment. The goals are:

  • Sub-$2k price point
  • Small-ish (ideally mATX form factor, if that is even possible)
  • Quiet-ish (prob going to sit in my office or my home, so less distraction the better)
  • Expandable (PSU + Motherboard) to 2 GPUs if needed later, fits at least one 1080 Ti-based card for now
  • Expandable (PSU + Motherboard) to 64GB of RAM if needed later

Here’s what I’ve come up with: https://pcpartpicker.com/list/sxxHCy

I’ve measured clearances (GPU size, CPU cooler height, PSU fit), and it looks like they all check out. I figured before pulling the trigger on these purchases I’d check with all you folks more expert than me to see if I’m missing any critical component or have overlooked any critical flaw. Would love to hear your feedback – thanks in advance for any advice!!!

p.s. once this is built I’ll plan to publish a more detailed blog post on the rationale and build, since I haven’t seen anything to date at this size and price point (plz feel free to DM me if you have seen something similar so I can link and/or not be redundant with others!)

Since you’re getting an aftermarket CPU cooler, you’ll need thermal paste. Also, you mentioned wanting to expand to 2 GPUs in the future, but your CPU supports up to 16 PCIe lanes, which could cause a bottleneck. You can read about that here. Other than that, it looks good to me.

As I’ve pointed out before, the 16 PCIe lane bottleneck isn’t really something to worry about – especially when your goal is to stay below $2k (with one GPU).

I would make sure there’s enough cooling going on, especially if you’re going to stick everything in a smallish case.

Looking to put together a rig in the next few weeks and pretty conflicted over the advice I’ve seen here & other places.

I too would love to keep an upgrade path open — with PyTorch it seems like adding GPUs is super easy — but I can’t figure out how to actually go about that.
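
(For context on the “adding GPUs is super easy” bit: in PyTorch that’s roughly just wrapping the model, something like the sketch below. This assumes a CUDA-capable setup and uses a stand-in model, so treat it as illustration rather than a recipe.)

```python
# Minimal sketch of multi-GPU data parallelism in PyTorch: wrap the model in
# DataParallel and each batch gets split across all visible GPUs.
import torch
import torch.nn as nn

model = nn.Linear(1024, 10)          # stand-in for a real model
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)   # uses all visible GPUs
model = model.cuda()

x = torch.randn(64, 1024).cuda()
out = model(x)                       # batch is scattered across GPUs, outputs gathered
print(out.shape)
```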

What CPUs are there that have > 32 PCIe lanes? I’ve seen:

  • Xeons
  • i7 6850K
  • ???

I don’t fully understand the tradeoffs between Xeons & i7s.

Coffee Lake looks as if every processor is limited to 16 PCIe lanes — correct me if I’m misreading.

Other than that, I’m thinking about the other common chips - the 7700K, 7500, etc. - but I’d like there to be an upgrade path when I’m spending $2k+ on a build…

If you want 2 GPUs max then a fancy gaming PC is just fine. If you want the option of > 2 GPUs you’re looking at server-level computers.

The “i7” brand is used for a lot of different things: the 7700K is what you’d use in a gaming PC, while the 6850K is for more high-end stuff. Xeons are non-consumer server CPUs with loads of cores.

The 6850K seems like a good choice for a box with up to 4 GPUs.

While the Coffee Lake processors have only 16 CPU lanes, you really get up to 40 lanes as a system once you add in the 24 provided by the platform controller hub.