Making your own server

Get a Z270; the Z170 is not a good choice for Kaby Lake. Make sure you get a good PSU: Google “PSU Tier List”, pick the most recent post, and try to get something in the top three tiers, as high as you can afford. This is not a part to treat as an afterthought; it is very critical.

If you go with 32GB, get 16GB sticks if you can; that leaves you room to go to 64GB later. RAM prices are climbing, though, so if it makes more sense to get 4x 8GB, do it. I wouldn’t waste time with a liquid cooler; it isn’t really necessary, and closed-loop units are not that much better unless you go with a full custom loop.

A good air cooler (Noctua) will be pretty close to the closed-loop liquid systems. You can even go with a cheap EVO cooler for around $25. You don’t need much unless you’re doing extreme overclocking; you can reach the overclock 90% of people run with a $25 cooler (if you get a good one). This is what I use, and I overclock my CPU: https://www.amazon.com/exec/obidos/ASIN/B005O65JXI/lexesto-20/ref=nosim/

For the SSD, I would recommend sticking with the Samsung EVO series; it is the best option whether you go SATA or NVMe.

For the NVIDIA card, I would recommend the MSI Gaming X edition or an EVGA card (not ideal but popular). I would avoid the Founders Edition cards, as their cooling is inadequate, especially for the extended 100% loads you see with ML.

I would use Linux for ML, not Windows. Dual-booting to Windows is fine if that is your plan, but Windows becomes too much of a problem once you get past Part 1.

1 Like

Why did you install Python 3.4? As far as I understood, we need Python 2.

I am trying to use an Azure NC6 machine and the speed is pretty slow (600+ seconds for the lesson 1 notebook training).
Any suggestions?
I am using Python 3.6 and Keras 2 as described here.
I’m using TensorFlow as the backend, with this keras.json:

{
    "floatx": "float32",
    "epsilon": 1e-07,
    "backend": "tensorflow",
    "image_data_format": "channels_last"
}

Update: I moved to an NC12 with two GPUs and it is still very slow. When I run nvidia-smi, it looks like only one GPU is working.
I opened a dedicated thread for this problem here.
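A quick way to check what TensorFlow actually sees (a minimal sketch using TF’s device_lib, which works on the TF 1.x backend I’m running):

# list the devices the TensorFlow backend can see
from tensorflow.python.client import device_lib

devices = device_lib.list_local_devices()
print([d.name for d in devices if d.device_type == 'GPU'])

Note that by default Keras runs one model on one GPU; as far as I understand, using both GPUs for a single model needs something like keras.utils.multi_gpu_model (Keras 2.0.9+) or explicit device placement.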

Many thanks to everyone posting their configs here!
This is really almost a one-stop shop for building your own server.

Here is the config that worked for me on Ubuntu 16.04 (a slight modification of Jeremy’s script):

# This script is designed to work with ubuntu 16.04 LTS
# Key issues
# - open-cv
# - keras version


# ensure system is updated and has basic build tools
sudo apt-get update
sudo apt-get --assume-yes upgrade
sudo apt-get --assume-yes install tmux build-essential gcc g++ make binutils
sudo apt-get --assume-yes install software-properties-common

# download and install GPU drivers
wget "http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.44-1_amd64.deb" -O "cuda-repo-ubuntu1604_8.0.44-1_amd64.deb"

sudo dpkg -i cuda-repo-ubuntu1604_8.0.44-1_amd64.deb
sudo apt-get update
sudo apt-get -y install cuda
sudo modprobe nvidia
nvidia-smi

# install Anaconda for current user
mkdir downloads
cd downloads
wget "https://repo.continuum.io/archive/Anaconda2-4.2.0-Linux-x86_64.sh" -O "Anaconda2-4.2.0-Linux-x86_64.sh"
bash "Anaconda2-4.2.0-Linux-x86_64.sh" -b

echo "export PATH=\"$HOME/anaconda2/bin:\$PATH\"" >> ~/.bashrc
export PATH="$HOME/anaconda2/bin:$PATH"
conda install -y bcolz
conda upgrade -y --all

# install and configure theano
pip install theano
echo "[global]
device = gpu
floatX = float32
[cuda]
root = /usr/local/cuda" > ~/.theanorc

# install and configure keras
pip install keras==1.2.2
mkdir ~/.keras
echo '{
    "image_dim_ordering": "th",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "theano"
}' > ~/.keras/keras.json

# install cudnn libraries
wget "http://platform.ai/files/cudnn.tgz" -O "cudnn.tgz"
tar -zxf cudnn.tgz
cd cuda
sudo cp lib64/* /usr/local/cuda/lib64/
sudo cp include/* /usr/local/cuda/include/

# configure jupyter and prompt for password
jupyter notebook --generate-config
jupass=`python -c "from notebook.auth import passwd; print(passwd())"`
echo "c.NotebookApp.password = u'"$jupass"'" >> $HOME/.jupyter/jupyter_notebook_config.py
echo "c.NotebookApp.ip = '*'
c.NotebookApp.open_browser = False" >> $HOME/.jupyter/jupyter_notebook_config.py

# clone the fast.ai course repo and prompt to start notebook
cd ~
git clone https://github.com/fastai/courses.git
echo "\"jupyter notebook\" will start Jupyter on port 8888"
echo "If you get an error instead, try restarting your session so your $PATH is updated"

Also, an amazing blog post about GPUs for deep learning, and its TL;DR message.

Also, if you happen to speak Russian, here is my PC config.

Hello all! Thanks for the wonderful posts. I’m thrilled to see all the interest and wish I had found these forums before I built Skynet…

Full water cooling
2x 980 Ti 6GB @ 1500 MHz
Xeon on an X99 board

After seeing these Pascal numbers I might need to make an upgrade!

1 Like

Hello everyone!

Just sharing my own GTX 1080 Ti results for @jeremy’s “mnist.ipynb” notebook from Lesson 4 (BatchNorm + Dropout + Data Augmentation), which he was running on a last-gen GTX TITAN X.

In the last 12-epoch run (cell #85), I get 9-10 sec per epoch vs. his 13-14 sec, so roughly a 30% speed gain.

My rig is a former gaming PC from mid-2015:

  • Gigabyte Z97X motherboard with 16GB DDR3-1866 (reused from a 2012 rig ^!^ )
  • Intel i5-4690K 3.5GHz, not overclocked
  • Corsair CX 650M PSU
  • Samsung 850 EVO SSD 500GB + WD Purple 3TB
  • Asus GTX 1080 Ti Founders Edition
  • Corsair Carbide 100R Silent Edition
  • Dual-boot Win10 + Ubuntu 16.04 (for Fast.ai Part 1)
    Cost of hardware in 2015 + assembly: 1000 euros
    GTX 1080 Ti: 700 euros

Obviously the RAM is supposed to be sub-par and the CPU is nothing close to an i7-7700K (which shouldn’t be overclocked, BTW, according to Intel now: ouch for the K :wink:), but it still delivers pretty well IMHO.

Note: it’s crucial to drop Theano’s old CUDA backend and switch to the gpuarray one if you still get the pink warning when loading the notebook (see the .theanorc sketch below).
That alone cut epoch time by more than half (from about 24 sec down to 10 sec).
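For reference, this is roughly what the switch looks like in ~/.theanorc (a minimal sketch, assuming Theano 0.9+ with the pygpu/libgpuarray package installed; cuda0 selects the first GPU):

[global]
device = cuda0
floatX = float32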

Last thought: I prefer the Founders Edition or so-called Aero (blower) version of the GTX 1080 Ti because its airflow is expelled out of the case through its own rear panel, which means less work for my case’s exhaust fan.

At full load (100% GPU and 70% CPU) over several 200 sec x 12 epoch runs, I read 87°C on the GTX 1080 Ti and 67°C on the i5-4690K in Psensor.

Anyone planning multiple GPUs in one case may want to look into this: if you have gaming GPUs with their triple 92mm fans running at full speed inside the box for hours, you’ll need some serious airflow to exhaust the heat, and a single 140mm case fan won’t cut it.

Cheers,

EPB
PS: if I were to rebuild this rig today, I would spend only 100€ extra on a motherboard capable of 128GB RAM vs. a 32GB max, and a stronger Corsair PSU (850W-class, capable of driving dual GPUs).

1 Like

Hi Dave, those numbers are derived from gaming benchmarks, which is totally different from deep learning. Here, PCIe bandwidth is perhaps the most important thing along with VRAM. PCIe 3.0 x8 by itself would already bottleneck your 1080, but that’s not a big issue. When you drop down to x4, though, your bandwidth is halved again to under 4 GB/s, only a quarter of a full x16 slot. Assuming you write optimal code (with data loading, augmentation, and pre-processing running simultaneously with your training), this could potentially reduce your training speed by 40% or more.
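For reference, the bandwidth numbers above come from simple per-lane arithmetic (a quick sketch; PCIe 3.0 delivers roughly 0.985 GB/s per lane after encoding overhead):

# rough PCIe 3.0 throughput per link width (~0.985 GB/s per lane after 128b/130b encoding)
PER_LANE_GB_S = 0.985
for lanes in (16, 8, 4):
    print("x%-2d ~ %.1f GB/s" % (lanes, lanes * PER_LANE_GB_S))
# prints: x16 ~ 15.8 GB/s, x8 ~ 7.9 GB/s, x4 ~ 3.9 GB/s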

Secondly, since you have an Intel 7700K (max 16 PCIe lanes), you can have only the following configurations: 1x16, 2x8, or 1x8 + 2x4. So if you get a motherboard with three PCIe 3.0 slots, make sure they support x16, x8, and x4 modes (some boards support only x16 and x4). And again, as I said earlier, it’s not advisable to run your 1080 Ti at x8: that gives only ~7.9 GB/s, less than the card can make use of. Worse, you’d also be running your 1070s at a mere ~3.94 GB/s (x4), half of what they would get at x8.

My advice: if you really need the extra GPUs, get a higher-end CPU (40 lanes, 28 lanes, etc.) and a motherboard that supports them.

1 Like

The Z170-A PRO doesn’t support SLI, right? I know we don’t need SLI, but in general SLI support means two PCIe 3.0 slots capable of running at x8 (~7.9 GB/s each), the bare minimum for bandwidth-hungry cards like the GTX 1080. I saw that although it has two x16 PCIe 3.0 slots, you can only run them at either x16 or x4, and since you have a consumer i7, you have at most 16 lanes. So if you put in multiple cards, you would want an x8/x8 configuration, but on this board they will run at x4/x4, if I’m not wrong…

I didn’t find the time to go through your post, so I’m sorry for the repetition if you had already mentioned this in the post itself. Cheers.

EDIT: I just read the mobo part: “I decided to start with one graphics card (single GPU), but I made sure the MSI board had an additional x16 PCIe slot so I could add another card in the future.” You can do that, but the bandwidth of both cards will be reduced to ~3.94 GB/s (x4/x4). I think you should return it and get a Z170A Gaming M5 or some variant.
PS: I saw this mobo was 20% cheaper and went to see the difference, and this is it.

EDIT 2: If I’m correct, maybe you could edit your blog to reflect this so that other people know about it too?

1 Like

Yes, you’re exactly correct. I realized this only after writing the post. I’m now running two NVIDIA cards (a 1080 Ti and a 1070), and I will likely upgrade the board soon. Sorry!

1 Like

This is what I did:
If space allows, add a new disk and install Ubuntu there; then shrink the Windows partition on the original disk and put a swap partition in the freed space (a sketch of the swap step is below).
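The swap step itself, once the partition exists in the freed space (a sketch; /dev/sda3 is a placeholder for whatever partition you create, e.g. with GParted):

sudo mkswap /dev/sda3                                        # format the new partition as swap
sudo swapon /dev/sda3                                        # enable it immediately
echo '/dev/sda3 none swap sw 0 0' | sudo tee -a /etc/fstab   # enable it at every boot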

Has anyone considered a build like this?
Mobo: Asus X99-A II
CPU: i7-6800K
GPUs: up to 3x GTX 1070

This is basically a scaled-down, cost-effective 2017 sweet-spot, DIY-inspired version of the NVIDIA DevBox.

I think for a similar price it would probably get more usable TFLOPS from its GPUs than a comparable build with a Z270 mobo, i7-7700K CPU, and GTX 1080 Ti. Here is an interesting Quora answer from a senior developer at NVIDIA:
https://www.quora.com/What-is-better-for-deep-learning-two-1070-or-one-1080ti

2 Likes

Alexandre, that’s pretty much my setup: https://pcpartpicker.com/user/nemik/saved/#view=HwsxrH, but I only have one 1070, not three.

However, I’m not sure my X99 motherboard could fit three GTX 1070s. Two, no problem, but three might be a tough fit. I got the X99 and 6800K because they’re similar in price to a Z270/7700K but give six cores and more PCIe lanes, which seemed worth it for expandability later.

The 1070 is plenty fast for my purposes, but it would be nice to have one more GPU for experimenting while another is training long-term. If I were to buy another one, I’d probably get the 1080 Ti so I could run CNNs with larger batch sizes thanks to the extra VRAM.

Good luck with whatever you buy.

I tried to follow your brilliant hints and spec’d out my own rig on paper piece by piece, only to see it would cost the same €1.8k-2k as an equivalent laptop. So I went for a Predator 15 with a GTX 1070 8GB from Amazon, plus their three-year warranty extension (Protect) for a total of five years of coverage. Undervolting the core by -0.120V solves the CPU overheating/throttling reported by other buyers, and I’m set for offline personal use until the end of 2020.

I just finished building my DL machine and installed Ubuntu 16.04.2.
I have a 1080 Ti and have been reading about how to install the NVIDIA drivers and CUDA.
I read here
that CUDA 8.0 comes with a driver version (375.26) that doesn’t support the GTX 1080 Ti. As a result, installing CUDA from apt-get doesn’t work, since it pulls in that driver version.

Is installing CUDA and the NVIDIA drivers for a GTX 1080 Ti on Ubuntu different from a 1080 or 1070?
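For what it’s worth, the workaround I’ve seen suggested (untested on my end; the exact package names from the graphics-drivers PPA and the CUDA repo may differ) is to install a newer driver first, then the toolkit without its bundled driver:

# install a 378+ driver from the graphics-drivers PPA
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install nvidia-381

# then install only the toolkit (assumes the CUDA apt repo from the script above is already added),
# which avoids pulling in the bundled 375.26 driver
sudo apt-get install cuda-toolkit-8-0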

1 Like

Finished my ML build; thanks, everyone on this thread, for the guidance. I’m a bit disappointed by the performance, so if you’re about to start a build, take note.

From the start I decided I would either go with a highly extensible X99 system (with 40 PCIe lanes to drive up to 4 GPUs) OR build a ‘disposable’ system that wouldn’t hurt to upgrade from in six months to a year. I went with option 2.

Parts (all sourced from Craigslist / Kijiji):

  • GeForce Titan X (Maxwell) - $750
  • A78M-A motherboard + AMD A8-7600 - $100
  • 8 GB RAM (Kingston HyperX Fury) - $40
  • 600W Thermaltake power supply - $40
  • Some Corsair tower case - $20
  • Spare 128 GB SSD - free
    Total: $950 CAD / ~$700 USD

I can run the Lesson 1 cats/dogs first fit in 400 seconds - beats AWS, but I had hoped for <300s.

The constraints I used for determining which parts to buy:

  • MoBo: At least one PCIe-3.0 slot (x16)
  • Processor: At least 16 PCIe 3.0 lanes (I found many CPUs that only go up to PCIe 2.0)
  • GPU: Maximum GPU RAM, CUDA-compatible

What I learned (take note if you’re about to build):

  • TFLOPS on your GPU matter: if I were doing it again, I think I would prioritize the Pascal architecture over sheer GPU RAM.
  • I haven’t benchmarked how long preprocessing takes on my AMD build vs. my i7 laptop, but I suspect I undervalued the importance of a fast Intel processor.
  • Mixing and matching RAM is not simple: you need to make sure the timings match and the sticks support the same clock speeds. A setting that works for two different sticks of RAM may be significantly slower than what either of them supports on its own.
  • Ultimately, I’m happy the mobo/CPU/RAM were cheap, so it won’t be a huge loss to upgrade the system. I stand by my avoid-the-middle-road approach.

Finally, some resources that were invaluable while getting all the drivers and whatnot installed:

5 Likes

CPU performance is important, as the CPU controls the GPU and, more importantly, keeps it fed with data. If you’re using fit_generator in Keras 2 on Python 3, you can specify multiple workers so data augmentation runs in parallel with training. This gave me a significant performance boost; assuming your AMD processor has multiple cores, it might help you too. A sketch is below.
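A minimal sketch of what I mean, assuming `model` is an already-compiled Keras 2 model and 'data/train' is a placeholder for your training folder:

from keras.preprocessing.image import ImageDataGenerator

# augmentation runs on the CPU; several workers keep the GPU fed
gen = ImageDataGenerator(rotation_range=10, horizontal_flip=True)
batches = gen.flow_from_directory('data/train', target_size=(224, 224), batch_size=64)

model.fit_generator(batches,
                    steps_per_epoch=batches.samples // batches.batch_size,
                    epochs=3,
                    workers=4)   # number of parallel augmentation workers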

1 Like

I’m getting an issue with cuDNN on my brand-new local Win 10 install, with this error: “RuntimeError: You enabled cuDNN, but we aren’t able to use it: Can not compile with cuDNN.” Details and my .theanorc file are here: https://github.com/Theano/Theano/issues/5348#issuecomment-302396718. I got the advice to upgrade Theano to the latest version, but I haven’t gotten into that yet. Any tips? Thanks in advance.

I’m not a Windows user, but based on https://github.com/Theano/Theano/issues/5768 I’d guess two likely causes:

  1. You don’t have a C++ compiler installed, or
  2. The cuDNN header files are not in the include path used by the compiler.
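If it’s the second one, Theano can be told where the cuDNN headers and libraries live via the [dnn] section of .theanorc (a sketch; the paths are placeholders for wherever you unpacked cuDNN):

[dnn]
include_path = C:\cuda\cudnn\include
library_path = C:\cuda\cudnn\lib\x64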

I built an Ubuntu desktop with two GTX 1080 cards, and then I realized that if I run two models at the same time (State Farm and cervical cancer, for example), both models run on one GPU while the other GPU just sits there. Is this because Theano does not support multiple GPUs? Is there a workaround? I think that in TensorFlow I can use CUDA_VISIBLE_DEVICES to switch between GPUs. Is it possible to do the same in Theano? Something along the lines of the sketch below is what I’m hoping for.
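What I have in mind (untested on my side; CUDA_VISIBLE_DEVICES works at the process level, Theano also accepts a device flag, and train.py is just a placeholder script name):

# start each training session in its own process, pinned to one GPU
CUDA_VISIBLE_DEVICES=0 jupyter notebook --port 8888   # State Farm notebook sees only GPU 0
CUDA_VISIBLE_DEVICES=1 jupyter notebook --port 8889   # cervical cancer notebook sees only GPU 1

# alternatively, pick the GPU through Theano itself
THEANO_FLAGS=device=gpu1 python train.py              # old CUDA backend; use device=cuda1 for gpuarray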

Thank you!