Making your own server

Hi everyone,

Just thought I could share an update in this thread, since I posted the specs of my DL rig in early May 2017.

Roughly: a 2015 Intel i5-4690K CPU (4 cores) + air cooler, 16 GB DDR3 RAM, a 500 GB Samsung SSD (not NVMe) + 3 TB HD, a GTX 1080 Ti, all in a $60 Corsair box with a 650 W PSU.
Dual-boot Win 10 and Ubuntu 16.04, with a 50 GB swap file on the SSD (to complement the ‘low’ 16 GB RAM, a must-have imho).

Since then, I’ve completed Part 1 and most of Part 2 (I focused on the sections I found relevant to my business/job prospects, and skipped the rest for later use).

More importantly, about 3 months ago (mid-June 2017), I started doing Kaggle competitions “seriously” because (1) @jeremy highly recommends it for fast/practical learning and (2) it’s a metric used by a growing number of companies/recruiters if you don’t have a PhD from MIT/Stanford.
You can see my Kaggle results here: approx. Top 25% on average (i.e. better than the Top 50% that Jeremy mentions in his lessons), plus a Bronze Medal.

I’d like to make one point very clear, so you don’t waste money on your rig in a useless fashion:

- Having a better rig would NOT have resulted in better Kaggle results.

An AMD Ryzen ThreadRipper 1950X with 16 cores and 32 threads ($999) and a GTX Titan Xp ($1200) would have made little difference, not to mention an even more overpowered rig with multi-GPU (and its pain-in-the-arse setup, check Tim Dettmers’ blog).
If you can afford it, good on you but…

Once you have an NVIDIA GPU with 8 GB of VRAM, a decent CPU with 4 cores and an SSD to handle the OS/data/swap, you can handle any Kaggle competition to reach Top 25%.

If you can’t, don’t blame the hardware and look in the mirror :sunglasses:

Eric PB

PS: if you think Part 2 is more demanding on DL hardware, you can check the full menu here


Some quick back-of-the-napkin math:

One epoch of training on MobileNet using Keras and TensorFlow takes 2 hours on my machine (one GTX 1080 Ti). That is 1.2 million images in 7200 seconds.

Each image is rescaled (on the CPU) to 224x224 pixels, so that is 224 x 224 x 3 x 1.2M bytes that get transferred to the GPU. Divided by 7200 seconds gives an average of ~24 Megabytes/second.

This means the GPU can only process 24 MB/sec worth of data while training this kind of neural net. My machine isn’t limited by the PCIe bandwidth, but by the amount of computation the GPU can handle.

An x16 PCIe 3.0 connection can transfer ~16 GB/sec (in theory). That means the PCIe bus can move data 600 times faster than the GPU can handle when training this particular neural net.
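The arithmetic above can be reproduced in a few lines of Python. This is only a sketch using the figures quoted in the post (1.2M images, 224 x 224 x 3 bytes each, 7200 seconds, ~16 GB/s theoretical for PCIe 3.0 x16). The variable names are made up for illustration, and it assumes one byte per channel value, with labels and any other transfers ignored.

```python
# Back-of-the-napkin check, using only the numbers quoted in the post.
# Assumption: images are sent to the GPU as uint8, i.e. 1 byte per channel value.
IMAGES_PER_EPOCH = 1_200_000
BYTES_PER_IMAGE = 224 * 224 * 3
EPOCH_SECONDS = 2 * 60 * 60          # 2 hours per epoch
PCIE_X16_BYTES_PER_SEC = 16e9        # ~16 GB/s, theoretical

bytes_per_epoch = IMAGES_PER_EPOCH * BYTES_PER_IMAGE
gpu_feed_rate = bytes_per_epoch / EPOCH_SECONDS        # bytes/s the GPU consumes
headroom = PCIE_X16_BYTES_PER_SEC / gpu_feed_rate      # how much faster the bus is

print(f"data per epoch: {bytes_per_epoch / 1e9:.1f} GB")
print(f"feed rate:      {gpu_feed_rate / 1e6:.1f} MB/s "
      f"({gpu_feed_rate / 2**20:.1f} MiB/s)")
print(f"PCIe headroom:  ~{headroom:.0f}x")
```

The feed rate comes out at roughly 25 MB/s (about 24 MiB/s), and the headroom factor lands in the 600s, in line with the estimate above.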

The maximum host-to-device bandwidth of my 1080 Ti is about 12.5 GB/sec. That is just copying data from the CPU to GPU (or back). If that’s all you’re doing, or you’re just running compute kernels that can approach this kind of bandwidth, yeah then just 16 PCIe lanes won’t be enough if you have more than one GPU.

But once you start training a serious neural network, your GPU will spend more time doing its computations than transferring data.

EDIT: Just for another data point, I looked at some benchmarks. Training VGG16 on the 1080 Ti takes 128.14 ms for a batch of 16 x 3 x 224 x 224 bytes. That translates to a throughput of about 18 MB per second. Not even close to what you need to fill up the PCIe bus. :wink:
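For anyone who wants to check the VGG16 figure, here is the same kind of estimate as a short sketch. It uses only the batch size and timing quoted above. The float32 line is an extra assumption that is not in the benchmark, included only to show that 4 bytes per value would not change the conclusion.

```python
# Throughput implied by the VGG16 benchmark quoted in the EDIT.
# Assumption (as in the post): 1 byte per input value.
BATCH_SIZE = 16
BATCH_BYTES = BATCH_SIZE * 3 * 224 * 224
BATCH_SECONDS = 128.14e-3            # 128.14 ms per training batch

images_per_sec = BATCH_SIZE / BATCH_SECONDS
vgg16_feed_rate = BATCH_BYTES / BATCH_SECONDS          # bytes/s

# Hypothetical variant: inputs transferred as float32 (4 bytes per value).
float32_feed_rate = 4 * vgg16_feed_rate

print(f"images/s:        {images_per_sec:.0f}")
print(f"uint8 feed rate: {vgg16_feed_rate / 1e6:.1f} MB/s")
print(f"float32 variant: {float32_feed_rate / 1e6:.1f} MB/s")
```

Either way the result is tens of MB/s, still orders of magnitude below the ~12.5 GB/s host-to-device bandwidth mentioned above.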


I was setting up the AWS instance and the very last step (running the script from: http://www.platform.ai/files/setup_p2.sh) did not work. No access granted. Who or what allows me to get access and get the instance up and running?

Thanks for the breakdown. Interesting that you are running a 1080 Ti and your GPU is the bottleneck. I already ordered all of the hardware and I will start building this weekend. I’ll report back with some benchmarks once I start making my way through the course.

At least I know I’ll need a completely new system if I ever want to use multiple GPUs to their full potential.

I am trying to get parts for a box that starts off with one 1080 Ti, with room to add a second one later if it feels helpful.

Apparently, however, the motherboard in my build won’t allow that.

It turns out the MSI Z270-A PRO has one PCIe x16 slot, not two. It can hold an extra GPU in an x4 slot, but performance will be constrained. Do not buy this if you plan on using multiple GPUs!

Anyone know which motherboard I should get, and what other parts should change to allow this flexibility?

PCPartPicker part list: https://pcpartpicker.com/list/JJt8Gf

If you want multiple GPUs, you are probably better off going with an Intel i7 6850K. Here is what I just finished building:

https://pcpartpicker.com/list/Lgsz3F

I built it about a week ago and am very happy. I don’t yet have multiple GPUs, but if I ever did get another, it has 2 x16 PCIe slots.

I’m thinking about getting a multiple-GPU box, ideally something that fits into a small form factor case and is relatively quiet. I’m thinking about two GeForce GTX 1080 Ti Minis, paired with an Intel i7 that has 40 PCIe lanes.

I’m not sure if I can find a case/mini-ITX solution that will be able to support the i7. The Intel i7 6850K that Tyler recommended looks great. I just need to see if I can find a motherboard and power supply that would work in an SFF PC.

Does anyone have any experience building a dual GPU small form factor DL rig?

I got this motherboard: https://pcpartpicker.com/product/2Phj4D/asus-strix-z270-e-gaming-atx-lga1151-motherboard-strix-z270-e-gaming

You won’t need to change anything in your parts list, except you don’t need the WiFi thing since the Strix Z270-E comes with WiFi.

To answer my own question: I found this commercial build of a small form factor deep learning rig here.

I’ve actually decided not to go this direction, but to build a full tower system that I can expand and add more cards to over time. But I thought I’d share this link in case others pursue this direction of system design.

You can even get a neat little carrying case for it!

Hi all, long-time lurker (thank you!), first time poster (sorry!) here. :slight_smile:

I’m in the process of putting together a small, speedy DL rig, and have a setup designed that I feel somewhat content with at the moment. The goals are:

  • Sub-$2k price point
  • Small-ish (ideally mATX form factor, if that is even possible)
  • Quiet-ish (prob going to sit in my office or my home, so the less distraction the better)
  • Expandable (PSU + Motherboard) to 2 GPUs if needed later, fits at least one 1080 Ti-based card for now
  • Expandable (PSU + Motherboard) to 64GB of RAM if needed later

Here’s what I’ve come up with: https://pcpartpicker.com/list/sxxHCy

I’ve measured clearances (GPU size, CPU cooler height, PSU fit), and it looks like they all check out. I figured before pulling the trigger on these purchases I’d check with all you folks more expert than me to see if I’m missing any critical component or have overlooked any critical flaw. Would love to hear your feedback – thanks in advance for any advice!!!

p.s. once this is built I’ll plan to publish a more detailed blog post on the rationale and build, since I haven’t seen anything to date at this size and price point (plz feel free to DM me if you have seen something similar so I can link and/or not be redundant with others!)

Since you’re getting an aftermarket CPU cooler, you’ll need thermal paste. Also, you mentioned wanting to expand to 2 GPUs in the future, but your CPU supports up to 16 PCIe lanes, which could cause a bottleneck. You can read about that here. Other than that, it looks good to me.


As I’ve pointed out before, the 16 PCIe lane bottleneck isn’t really something to worry about – especially when your goal is to stay below $2k (with one GPU).
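To put a rough number on that: with 16 CPU lanes shared by two GPUs, each card would typically run at x8. The sketch below assumes PCIe 3.0 (8 GT/s per lane with 128b/130b encoding, so a bit under 1 GB/s per lane in each direction) and reuses the ~24 MB/s training feed rate estimated earlier in this thread. The function name is just for illustration, and real-world bandwidth will be lower than these theoretical figures.

```python
def pcie3_bandwidth_bytes_per_sec(lanes: int) -> float:
    """Theoretical one-direction PCIe 3.0 bandwidth for a given lane count."""
    transfers_per_sec = 8e9           # 8 GT/s per lane
    encoding_efficiency = 128 / 130   # 128b/130b line encoding overhead
    return lanes * transfers_per_sec * encoding_efficiency / 8  # bits -> bytes

# ~24 MB/s, taken from the MobileNet estimate earlier in the thread.
TRAINING_FEED_RATE = 24e6

for lanes in (16, 8, 4):
    bw = pcie3_bandwidth_bytes_per_sec(lanes)
    print(f"x{lanes:<2} ~{bw / 1e9:.1f} GB/s, "
          f"~{bw / TRAINING_FEED_RATE:.0f}x the training feed rate")
```

Even at x8 the link is a few hundred times faster than that workload needs, which is why halving the lanes per GPU matters so little for this kind of training.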

I would make sure there’s enough cooling going on, especially if you’re going to stick everything in a smallish case.


Looking to put together a rig in the next few weeks and pretty conflicted over the advice I’ve seen here & other places.

I too would love to keep an upgrade path open — with PyTorch it seems like adding GPUs is super easy — but I can’t figure out how to actually go about that.

What CPUs are there that have > 32 PCIe lanes? I’ve seen:

  • Xeons
  • i7 6850K
  • ???

I don’t fully understand the tradeoffs between Xeons & i7s.

Coffee Lake looks as if every processor is limited to 16 PCIe lanes — correct me if I’m misreading.

Other than that, I’m thinking about the other common i7s (7700K, 7500, etc.), but I’d like there to be an upgrade path when I’m spending $2k+ on a build…

If you want 2 GPUs max then a fancy gaming PC is just fine. If you want the option of > 2 GPUs you’re looking at server-level computers.

The “i7” brand is used for a lot of different things: the 7700K is what you’d use on a gaming PC, the 6850K is for more high-end stuff. Xeons are a non-consumer server CPU with loads of cores.

The 6850K seems like a good choice for a box with up to 4 GPUs.

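While the Coffee Lake processors have only 16 lanes, you really get up to 40 lanes as a system when you add in the 24 you get from the platform controller hub. Keep in mind, though, that the chipset lanes all share the DMI link back to the CPU (roughly equivalent to PCIe 3.0 x4), so they are no substitute for CPU lanes when it comes to feeding GPUs.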

Hi all,

I have been a long-time lurker on this thread and have finally set up my own rig! Thanks to all of you for your ideas and guidance.

I ended up going with the CyberPowerPC SLC3600C from Costco that was suggested by @ai88 in this thread.

This machine was available for ~$2000 in September. Plus you can get 3% back and a 4-yr warranty if you use your Costco Visa. It has great specs for the price point, and you can also add a second 1080 Ti GPU down the road. I think this would be a great way to get started.

It was pretty easy to set up and looks great too! I have blogged about it here. Performance is good: it is about 2x faster at training on the MNIST/CIFAR-10 datasets, compared to using Crestle.


Nice! The specs look similar to what you can put together on your own for that price, so if you don’t want to go through the hassle of building your own computer, this looks like a good one to buy.

I’m currently building a server myself, and after a lot of research (this thread was pretty useful) have decided on a 6850K and a 1080 Ti, with the plan of adding 3 extra GPUs at some point down the road (hence the 40-lane CPU).

Any recommendation on which mobo would be most cost-effective and would support x16/x16 at first, and later on x8/x8/x8/x8 + NVMe? So far the MSI X99A Gaming Pro seems to fit best in terms of cost-effectiveness. The X99 Deluxe seems awesome, but I hope I can buy something cheaper that meets my needs.
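A quick lane-budget check for those configurations, as a sketch. It assumes the 40 CPU lanes of the 6850K and an NVMe drive on a x4 link wired to the CPU, which is board-dependent (on some boards the M.2 slot hangs off the chipset instead). Whether a given board can actually split its slots this way still has to be checked in its manual. The names below are illustrative only.

```python
CPU_LANES = 40    # i7 6850K
NVME_LANES = 4    # assumption: M.2 NVMe drive on a x4 link from the CPU

# Lane width per GPU slot for each configuration under consideration.
configs = {
    "2 GPUs at x16/x16": [16, 16],
    "4 GPUs at x8/x8/x8/x8": [8, 8, 8, 8],
    "4 GPUs at x16 each": [16, 16, 16, 16],   # for comparison only
}

def lanes_needed(gpu_links, with_nvme=True):
    """Total CPU lanes used by the GPU links plus (optionally) the NVMe drive."""
    return sum(gpu_links) + (NVME_LANES if with_nvme else 0)

for name, links in configs.items():
    need = lanes_needed(links)
    verdict = "fits" if need <= CPU_LANES else "does not fit"
    print(f"{name} + NVMe: {need} of {CPU_LANES} lanes, {verdict}")
```

Both the x16/x16 and the x8/x8/x8/x8 layouts need 36 lanes including the NVMe drive, so they stay within the 40-lane budget, while four cards at full x16 would not.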

Also looking for recommendations on cost-effectiveness of different coolers, cases, and PSUs.

It seems the X99 is what most people go for (at least from what I saw when doing my own research).

As for a case, I really like the Corsair Carbide Air 540. It’s not necessarily the cheapest case but it has lots of room, which makes it easy to install things and helps with cooling.

Hey. I went with a build similar to the one you’re describing. Here are the parts I used:

https://pcpartpicker.com/list/Lgsz3F

I have been using this machine for the past 2 months and it is amazing. I dual-boot Windows and Ubuntu and use it for gaming, deep learning, and crypto mining. It handles everything well.
