Making your own server

One advantage over cloud solutions like AWS: if you want to train on ImageNet, for example, and the person you're renting compute time from already has the ImageNet data on their drive, you don't have to download it yourself.

There's always the potential for abuse, so you'd probably want to restrict what this kind of user could do on your computer (and your home network, etc.).

The issues would be (1) privacy (the host drive may hold private data) and (2) security (e.g. remote code execution via pickle).

If anything, data and code would have to be sandboxed from the host and from other projects.
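On the pickle point, here is a minimal sketch of why unpickling untrusted data amounts to remote code execution (the `Payload` class and the echo command are purely illustrative, not from any real attack):

```python
import os
import pickle

# A pickle payload can name any callable to run at load time via __reduce__.
class Payload:
    def __reduce__(self):
        # Illustrative only: a renter could substitute any shell command here.
        return (os.system, ("echo 'arbitrary code ran on the host'",))

blob = pickle.dumps(Payload())

# The moment the host unpickles the renter's "model" or "data", the command runs.
pickle.loads(blob)
```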

Hi @taewoo,
I am in the same boat as you and would like to discuss partnering in this venture, if you are open to it.

@sjt DM me, or email me: my username @ gmail.com.

I just completed a build centered around an ASUS X99-E 10G WS: a nice X99 mobo with 8 PCIe slots (40 lanes), so it can fit 3 GPU cards no problem. No problem booting a 6850K, though I did ultimately upgrade the UEFI BIOS (easy). The CPU is probably overkill, but the price on the 6850K is very attractive now versus a year ago.

Volta PCIe cards may be coming eventually, but right now the 1080 Ti 11GB is the probable sweet spot versus the Titan Xp. I decided to get one 1080 Ti now and later upgrade to one or two more of whatever the Volta equivalent of the Titan Xp is, depending on core count and how much memory (16GB, please?). I can sell the 1080 Ti and probably recoup half or better of the original purchase price, so it will cost me about $300 for the year or so I'll have it. I felt that was a better option than a Titan Xp, especially since I'm paying for it myself.

It's probably not worth waiting for the Coffee Lake CPUs, as DL is mostly GPU dependent and early motherboards may be buggy; that's why I chose X99 instead of X299. Reliability and compatibility are helpful here: one less thing to worry about.

P.S. This thread is excellent. Thanks to all posters for helping me with my build - I consulted here often.

I created a build that’s kind of based on a few pcpartpicker lists I’ve seen around. I’m a total noob at pc building, though, so I definitely could’ve made a mistake. Can someone look over it, and tell me if there are any issues I should fix? Thanks!

https://pcpartpicker.com/user/supermdguy/saved/PNMpbv

Looks like a good build to me. Personally I’d get DDR4-2400 instead of 3000 and save a few bucks.

I’m not sure if the CPU cooler comes with thermal paste; if not, you should get some. The thermal paste goes between the CPU and the cooler.


OK, thanks. I switched the RAM out for 2x8GB of Patriot Viper Elite DDR4-2133. The CPU cooler didn't mention coming with thermal paste, so I'll probably just buy some separately.

Hi everyone,

I've been training some custom models using TensorFlow on my laptop; even using the built-in NVIDIA 960M, it takes 10 to 24 hours to train on a moderate-sized dataset. To speed things up I decided to build my own DL rig. Here's the pcpartpicker:

https://pcpartpicker.com/list/N84Kjc

The parts were picked following Slavv’s guide: https://blog.slavv.com/the-1700-great-deep-learning-box-assembly-setup-and-benchmarks-148c5ebe6415

Unlike Slav, I plan on using one GPU (a GTX 1080 Ti) until I can justify the cost of buying and building a multi-GPU system; even then, if I need more than one GPU, I think I will sell my old rig and buy a new one. Is it a good idea to use the i5-7500, which only has 16 PCIe lanes? Would this be a bottleneck for the 1080 Ti and the NVMe M.2 SSD (assuming they share bandwidth)?

Sorry if this was already answered in this thread; I read it for 3 hours but couldn't get two-thirds of the way through.

Thanks in advance.

Dat

I don’t think PCIe bandwidth is worth worrying about if you’re using only 1 or 2 GPUs. You won’t be using the full bandwidth while training, since the GPU cannot do its convolution computations that fast anyway.

Slav did mention (with some citations) that people were experiencing bandwidth bottlenecks. Here is an excerpt from Slav's blog:

Edit: As a few people have pointed out: "probably the biggest gotcha that is unique to DL/multi-GPU is to pay attention to the PCIe lanes supported by the CPU/motherboard" (by Andrej Karpathy). We want each GPU to have 16 PCIe lanes so it eats data as fast as possible (16 GB/s for PCIe 3.0). This means that for two cards we need 32 PCIe lanes. However, the CPU I have picked has only 16 lanes. So 2 GPUs would run in 2x8 mode (instead of 2x16). This might be a bottleneck, leading to less than ideal utilization of the graphics cards. Thus a CPU with 40 lanes is recommended.
A good solution would be an Intel Xeon processor like the E5-1620 v4 ($300). Or if you want to splurge, go for a higher-end processor like the desktop i7-6850K ($590).

Following that logic, if you were to run 3 or 4 GPUs, you would not see linear improvements in speed on a single machine, since you would be limited by PCIe bandwidth. Even with 40 lanes (Intel Xeon: https://ark.intel.com/products/92991/Intel-Xeon-Processor-E5-1620-v4-10M-Cache-3_50-GHz), four GPUs would run at 1x16 + 3x8. So ideally, 40 lanes is good for a 2-GPU system.
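To make the lane math concrete, here is a quick sketch (the ~985 MB/s per PCIe 3.0 lane figure is an assumed approximation of the usable per-lane rate; real-world throughput varies):

```python
# Back-of-the-envelope PCIe 3.0 bandwidth per GPU for common lane splits.
# Assumes ~985 MB/s usable per PCIe 3.0 lane (so x16 is roughly 15.8 GB/s).
GB_PER_LANE = 0.985

configs = {
    "1 GPU on a 16-lane CPU (x16)": [16],
    "2 GPUs on a 16-lane CPU (x8/x8)": [8, 8],
    "2 GPUs on a 40-lane CPU (x16/x16)": [16, 16],
    "4 GPUs on a 40-lane CPU (x16/x8/x8/x8)": [16, 8, 8, 8],
}

for name, lanes in configs.items():
    bandwidths = [round(n * GB_PER_LANE, 1) for n in lanes]
    print(f"{name}: {bandwidths} GB/s per GPU")
```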

For the 16-PCIe-lane CPU that I'm considering buying, since I'm only going to run one 1080 Ti on it, my only concern is whether the 16 lanes are shared with the NVMe drive and whether that would be a bottleneck. I guess I've got to do some more research.

EDIT: I think I found the answer: http://www.tomshardware.com/answers/id-2943191/nvme-ssd-affect-gpu.html
This means my NVMe SSD won't be a bottleneck for my single GPU on a 16-PCIe-lane CPU. However, if I plan on using multiple GPUs, then I would need a CPU with more than 16 PCIe lanes; otherwise it'll drop to 2x8 (for 2 GPUs) or 1x8 + 2x4 (for 3 cards). Reference: https://www.pugetsystems.com/labs/articles/Z270-H270-Q270-Q250-B250---What-is-the-Difference-876/

Hi everyone,

Just thought I could share an update in this thread, since I posted the specs of my DL rig in early May 2017.


Roughly: a 2015 Intel i5-4690K (4 cores) with an air cooler, 16GB DDR3 RAM, a 500GB Samsung SSD (not NVMe) plus a 3TB HDD, and a GTX 1080 Ti, all in a $60 Corsair case with a 650W PSU.
Dual-boot Windows 10 and Ubuntu 16.04, with a 50GB swap file on the SSD (to complement the 'low' 16GB RAM; a must-have IMHO).

Since then, I've completed Part 1 and most of Part 2 (I focused on the sections I found relevant to my business/job prospects, and skipped the rest for later).

More importantly, about 3 months ago (mid-June 2017), I started doing Kaggle competitions “seriously” because (1) @jeremy highly recommends it for fast/practical learning and (2) it’s a metric used by a growing number of companies/recruiters if you don’t have a PhD from MIT/Stanford.
You can see my Kaggle results here: approximately Top 25% on average (i.e. better than the Top 50% that Jeremy mentions in his lessons), plus a Bronze Medal.
https://www.kaggle.com/ericperbos/competitions

I'd like to make one point very clear, so you don't needlessly waste money on your rig:

- Having a better rig would NOT have resulted in better Kaggle results.

An AMD Ryzen Threadripper 1950X with 16 cores and 32 threads ($999) and a GTX Titan Xp ($1,200) would have made little difference, not to mention an even more overpowered rig with multiple GPUs (and its pain-in-the-arse setup; check Tim Dettmers' blog).
If you can afford it, good on you but…

Once you have an NVIDIA GPU with 8GB of VRAM, a decent CPU with 4 cores, and an SSD to handle the OS/data/swap, you can handle any Kaggle competition and reach the Top 25%.

If you can't, don't blame the hardware: look in the mirror :sunglasses:

Eric PB

PS: if you think Part 2 is more demanding on DL hardware, you can check the full menu here.


Some quick back-of-the-napkin math:

One epoch of training MobileNet using Keras and TensorFlow takes 2 hours on my machine (one GTX 1080 Ti). That is 1.2 million images in 7200 seconds.

Each image is rescaled (on the CPU) to 224x224 pixels, so 224 x 224 x 3 ≈ 150KB per image, or 224 x 224 x 3 x 1.2M ≈ 180GB in total, gets transferred to the GPU. Divided by 7200 seconds, that gives an average of ~24 megabytes/second.

This means the GPU can only process 24 MB/sec worth of data while training this kind of neural net. My machine isn’t limited by the PCIe bandwidth, but by the amount of computation the GPU can handle.

An x16 PCIe 3.0 connection can transfer ~16 GB/sec (in theory). That means the PCIe bus can move data 600 times faster than the GPU can handle when training this particular neural net.

The maximum host-to-device bandwidth of my 1080 Ti is about 12.5 GB/sec. That is just copying data from the CPU to the GPU (or back). If that's all you're doing, or you're running compute kernels that can approach that kind of bandwidth, then 16 PCIe lanes won't be enough if you have more than one GPU.

But once you start training a serious neural network, your GPU will spend more time doing its computations than transferring data.

EDIT: Just for another data point, I looked at some benchmarks. Training VGG16 on the 1080 Ti takes 128.14 ms for a batch of 16 x 3 x 224 x 224 bytes. Per second, that translates to a throughput of ~18 MB: not even close to what you need to fill up the PCIe bus. :wink:
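For anyone who wants to redo this arithmetic, here is a small sketch of the numbers above (it assumes each image is transferred as raw uint8, 3 x 224 x 224 bytes, which matches the back-of-the-napkin figures):

```python
# Back-of-the-envelope input-throughput check for the figures above.
# Assumption: each rescaled image is sent to the GPU as raw uint8,
# i.e. 3 * 224 * 224 bytes (~150 KB).

bytes_per_image = 3 * 224 * 224

# MobileNet: 1.2M images per epoch in ~7200 seconds
mobilenet_mb_s = 1.2e6 * bytes_per_image / 7200 / 2**20
print(f"MobileNet input throughput: ~{mobilenet_mb_s:.0f} MB/s")  # ~24 MB/s

# VGG16: one 16-image batch every 128.14 ms
vgg16_mb_s = 16 * bytes_per_image / 0.12814 / 2**20
print(f"VGG16 input throughput:     ~{vgg16_mb_s:.0f} MB/s")      # ~18 MB/s

# A theoretical x16 PCIe 3.0 link moves ~16 GB/s, i.e. hundreds of times
# more than either net consumes while training.
print(f"x16 headroom vs MobileNet: ~{16 * 1024 / mobilenet_mb_s:.0f}x")
```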


I was setting up the AWS instance, and the very last step (running the script from http://www.platform.ai/files/setup_p2.sh) did not work: no access was granted. Who or what allows me to get access and get the instance up and running?

Thanks for the breakdown. Interesting that you are running a 1080 Ti and your GPU is still the bottleneck. I have already ordered all of the hardware and will start building this weekend. I'll report back with some benchmarks once I start making my way through the course.

At least I know I will need a completely new system if I plan on using multiple GPUs and want to use them to their maximum potential.

I am trying to get parts for a box that starts off with one 1080 Ti, with the option to add a second one later if it seems helpful.

Apparently, the motherboard in my build won't allow that, however:

It turns out the MSI Z270-A PRO has one x16 (electrical) PCIe slot, not two. It can hold an extra GPU in an x4 (electrical) slot, but performance will be constrained. Do not buy this board if you plan on using multiple GPUs!

Anyone know which motherboard I should get, and what other parts should change to allow this flexibility?

PCPartPicker part list: https://pcpartpicker.com/list/JJt8Gf

If you want multiple GPUs, you are probably better off going with an Intel 6850K. Here is what I just finished building:

https://pcpartpicker.com/list/Lgsz3F

I built it about a week ago and am very happy. I don't yet have multiple GPUs, but if I ever get another, the board has two x16 PCIe slots.

I'm thinking about getting a multi-GPU box, ideally something that can fit into a small form factor case and run relatively quietly. I'm thinking about two GeForce GTX 1080 Ti Minis, paired with an Intel i7 that has 40 PCIe lanes.

I'm not sure if I can find a case/mini-ITX solution that will support that i7. The Intel i7-6850K that Tyler recommended looks great; I just need to see if I can find a motherboard and power supply that would work in an SFF PC.

Does anyone have any experience building a dual GPU small form factor DL rig?

I got this motherboard: https://pcpartpicker.com/product/2Phj4D/asus-strix-z270-e-gaming-atx-lga1151-motherboard-strix-z270-e-gaming

You won't need to change anything in your parts list, except that you don't need the WiFi adapter, since the Strix Z270-E comes with WiFi built in.

To answer my own question: I found this commercial build of a small form factor deep learning rig here.

I’ve actually decided not to go this direction, but to build a full tower system that I can expand and add more cards to over time. But I thought I’d share this link in case others pursue this direction of system design.

You can even get a neat little carrying case for it!