Making your own server

Yes, on my desktop, port-forwarded to 8889 on my laptop.

Why are you doing port forwarding? I’d expect http://ipaddress:8888 to work on your laptop (don’t use localhost).
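
If you are not sure whether the notebook port is reachable from the laptop at all, a quick check like the one below can help. This is only a sketch using Python's standard `socket` module; the function name `port_is_open` and the example address in the comment are made up for illustration, so substitute your desktop's real IP address.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and opens a TCP connection
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # refused, unreachable, or timed out
        return False


# Example (hypothetical address of the desktop running Jupyter):
#   port_is_open("192.168.1.50", 8888)
```

If this returns False from the laptop, the usual suspects are a firewall on the desktop or Jupyter listening only on localhost.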

Thank you @kzuiderveld, @Surya501 and @onebitbrain for your feedback. I may try this machine and test it out.

Get a Z270; the Z170 is not a good choice for Kaby Lake. Make sure you get a good PSU: Google “PSU Tier List”, pick the most recent post, and try to get something in the top 3 brackets, as high as you can afford. This is not a black-box part; it is critical.

If you go with 32GB, get 16GB sticks if you can; that leaves you room to go to 64GB if you need to. RAM prices are climbing, though, so if it makes more sense to get 4x 8GB, then do it. I wouldn’t waste time with a liquid cooler; it isn’t really necessary, and they are not all that much better unless you go with a full liquid system.

A good fan (Noctua) will be pretty close to the closed-loop liquid coolers. You can even go with a cheap Evo cooler for about $25. You don’t need much unless you go for extreme overclocking; a good $25 cooler will get you the same overclock 90% of people use. This is what I use, and I overclock my CPU.

For the SSD, I would recommend sticking with the Samsung EVO series; it is the best option whether you go with a SATA SSD or NVMe.

For the Nvidia card I would recommend the MSI Gaming X edition or EVGA (not ideal but popular). I would avoid the Founders Edition cards, as their cooling is inadequate, especially for the extended 100% loads that you see with ML.

I would use Linux for ML and not Windows. Dual booting to Windows is fine if that is your plan, but Windows is too much of a problem once you get past Part 1.


Why did you install Python 3.4? As far as I understood, we need Python 2.

I am trying to use an Azure NC6 machine and the speed is pretty slow (600+ seconds for the lesson 1 notebook training).
Any suggestions?
I am using Python 3.6 and Keras 2 as in here.
I am using TensorFlow as the backend, with keras.json containing:

    "floatx": "float32",
    "epsilon": 1e-07,
    "backend": "tensorflow",
    "image_data_format": "channels_last"

Update: I moved to an NC12 with two GPUs and it is still very slow. When I run nvidia-smi, it seems like only one GPU is working.
I opened a dedicated thread for this problem here.
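
To see at a glance which GPUs are actually busy, it can help to parse the CSV output of `nvidia-smi` rather than eyeballing the table. The sketch below assumes the `--query-gpu` and `--format=csv,noheader,nounits` options of `nvidia-smi`; the helper names and the sample numbers are invented for illustration, not taken from a real NC12 run. As far as I know, a single Keras model on the TensorFlow backend trains on one GPU unless you explicitly parallelize it, so one idle GPU is not necessarily a driver problem.

```python
import subprocess

# Query used to get one CSV line per GPU (no header, no units).
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def read_gpu_csv():
    """Run nvidia-smi and return its raw CSV text (requires an Nvidia driver)."""
    return subprocess.check_output(QUERY).decode()


def parse_gpu_csv(text):
    """Turn the CSV text into a list of dicts, one per GPU."""
    rows = []
    for line in text.strip().splitlines():
        index, name, util, mem = [field.strip() for field in line.split(",")]
        rows.append({
            "index": int(index),
            "name": name,
            "util_pct": int(util),
            "mem_used_mib": int(mem),
        })
    return rows


def idle_gpus(rows, threshold_pct=5):
    """Return the indices of GPUs whose utilization is below the threshold."""
    return [row["index"] for row in rows if row["util_pct"] < threshold_pct]


# Illustrative sample only (made-up values for a two-GPU machine):
SAMPLE = "0, Tesla K80, 97, 10954\n1, Tesla K80, 0, 2\n"
print(idle_gpus(parse_gpu_csv(SAMPLE)))
```

On a real machine you would call `parse_gpu_csv(read_gpu_csv())` while training is running.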

Many thanks to everyone posting their configs here!
This is really the best, almost one-stop place for building your server.

Here is the config that worked for me on Ubuntu 16.04 (a slight modification of Jeremy’s code):

# This script is designed to work with ubuntu 16.04 LTS
# Key issues
# - open-cv
# - keras version

# ensure system is updated and has basic build tools
sudo apt-get update
sudo apt-get --assume-yes upgrade
sudo apt-get --assume-yes install tmux build-essential gcc g++ make binutils
sudo apt-get --assume-yes install software-properties-common

# download and install GPU drivers
wget "" -O "cuda-repo-ubuntu1604_8.0.44-1_amd64.deb"

sudo dpkg -i cuda-repo-ubuntu1604_8.0.44-1_amd64.deb
sudo apt-get update
sudo apt-get -y install cuda
sudo modprobe nvidia

# install Anaconda for current user
mkdir -p downloads
cd downloads
wget "" -O ""
bash "" -b

echo "export PATH=\"$HOME/anaconda2/bin:\$PATH\"" >> ~/.bashrc
export PATH="$HOME/anaconda2/bin:$PATH"
conda install -y bcolz
conda upgrade -y --all

# install and configure theano
pip install theano
echo "[global]
device = gpu
floatX = float32
root = /usr/local/cuda" > ~/.theanorc

# install and configure keras
pip install keras==1.2.2
mkdir -p ~/.keras
echo '{
    "image_dim_ordering": "th",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "theano"
}' > ~/.keras/keras.json

# install cudnn libraries
wget "" -O "cudnn.tgz"
tar -zxf cudnn.tgz
cd cuda
sudo cp lib64/* /usr/local/cuda/lib64/
sudo cp include/* /usr/local/cuda/include/

# configure jupyter and prompt for password
jupyter notebook --generate-config
jupass=$(python -c "from notebook.auth import passwd; print(passwd())")
echo "c.NotebookApp.password = u'"$jupass"'" >> $HOME/.jupyter/
echo "c.NotebookApp.ip = '*'
c.NotebookApp.open_browser = False" >> $HOME/.jupyter/

# clone the course repo and prompt to start notebook
cd ~
git clone
echo "\"jupyter notebook\" will start Jupyter on port 8888"
echo "If you get an error instead, try restarting your session so your \$PATH is updated"
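
One thing to watch: the script above pins Keras 1.2.2 and writes a Keras 1 style keras.json (`image_dim_ordering`), while the Azure post earlier in this thread uses the Keras 2 key (`image_data_format`). If you move between the two, the mapping is roughly as sketched below. The function name `to_keras2_config` is made up for illustration; only the key names and their values come from Keras itself.

```python
import json

# Keras 1 "image_dim_ordering" values and their Keras 2 "image_data_format" equivalents.
KERAS1_TO_KERAS2 = {"th": "channels_first", "tf": "channels_last"}


def to_keras2_config(cfg):
    """Return a copy of a keras.json dict using the Keras 2 key name."""
    out = dict(cfg)
    ordering = out.pop("image_dim_ordering", None)
    if ordering is not None and "image_data_format" not in out:
        out["image_data_format"] = KERAS1_TO_KERAS2[ordering]
    return out


# The config written by the script above (Keras 1.2.2, Theano backend).
keras1_cfg = json.loads("""{
    "image_dim_ordering": "th",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "theano"
}""")

print(json.dumps(to_keras2_config(keras1_cfg), indent=4, sort_keys=True))
```

Note that the course notebooks in this thread assume Theano-style ordering, so changing the backend is a bigger step than renaming the key.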

Also an amazing blog post about GPUs for deep learning and its TLDR message

Also if you happen to speak Russian, here is my PC config

Hello all! Thanks for the wonderful posts. I’m thrilled to see all the interest and wish I had found these forums before I built Skynet…

Full water cooling
2x 980 Ti 6GB -> 1500 MHz
Xeon X99

After seeing these Pascal numbers I might need to upgrade!


Hello everyone !

Just sharing my own GTX 1080 Ti results for @jeremy’s “mnist.ipynb” notebook from Lesson 4 (BatchNorm + Dropout + Data Augmentation), which he ran on a last-gen GTX TITAN X.

In the last 12-epoch run (cell #85), I get 9-10 sec per epoch vs. his 13-14 sec, so roughly 30% less time per epoch.
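
For anyone comparing their own numbers, here is the arithmetic behind that figure as a small sketch. It uses the midpoints of the two ranges quoted above (13.5 sec and 9.5 sec); the function name is just for illustration. Note that "time saved" and "speedup" are two different ways of stating the same gain.

```python
def compare_epoch_times(old_s, new_s):
    """Compare two per-epoch times given in seconds."""
    return {
        # fraction of the old epoch time that is no longer needed
        "time_saved_pct": 100.0 * (1.0 - new_s / old_s),
        # how many times faster the new card gets through an epoch
        "speedup": old_s / new_s,
    }


titan_x = (13 + 14) / 2      # midpoint of 13-14 sec per epoch
gtx_1080_ti = (9 + 10) / 2   # midpoint of 9-10 sec per epoch

result = compare_epoch_times(titan_x, gtx_1080_ti)
print("time saved: %.1f%%, speedup: %.2fx" % (result["time_saved_pct"], result["speedup"]))
```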

My rig is a former gaming PC from mid-2015:

  • Gigabyte Z97X motherboard with 16GB DDR3 1866 MHz (reused from a 2012 rig ^!^ )
  • Intel i5-4690K 3.5 GHz, not overclocked
  • Corsair CX 650M PSU
  • Samsung SSD 850 EVO series 500GB + WD Purple 3TB
  • Asus GTX 1080 Ti Founders Edition
  • Corsair Carbide 100R Silent Edition
  • Dual-boot Win10 + Ubuntu 16.04 (for part 1)

Cost of hardware in 2015 + assembly: 1000 euros
GTX 1080 Ti: 700 euros

Obviously the RAM is supposedly sub-par and the CPU is nothing close to an i7-7700K (which shouldn’t be overclocked BTW, dixit Intel now; ouch for the K :wink:), but it still delivers pretty well imho.

Note: if you still get the pink warning when loading the notebook, it’s crucial to drop Theano’s old CUDA backend and switch to the gpuarray backend.
That alone cut epoch time by more than half (from about 24 sec down to 10 sec).
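
If you are not sure which backend your config selects, a check along these lines may help. It is a sketch under the assumption that your `.theanorc` uses the usual `[global]` section: as far as I know, device names starting with `gpu` select the old CUDA backend, while names like `cuda` or `cuda0` select gpuarray. The helper names and the two sample configs are made up for illustration.

```python
import configparser


def theano_device(theanorc_text):
    """Return the device named in the [global] section ("cpu" if none is set)."""
    parser = configparser.ConfigParser()
    parser.read_string(theanorc_text)
    return parser.get("global", "device", fallback="cpu")


def uses_old_cuda_backend(device):
    """Assumption: "gpu", "gpu0", ... mean the old backend; "cuda*" means gpuarray."""
    return device.startswith("gpu")


OLD_STYLE = "[global]\ndevice = gpu\nfloatX = float32\n"
NEW_STYLE = "[global]\ndevice = cuda0\nfloatX = float32\n"

for text in (OLD_STYLE, NEW_STYLE):
    device = theano_device(text)
    print(device, "-> old backend" if uses_old_cuda_backend(device) else "-> gpuarray or cpu")
```

To check your real file, pass `open(os.path.expanduser("~/.theanorc")).read()` to `theano_device`.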

Last thought: I prefer the Founders Edition or so-called Aero version of the GTX 1080 Ti because it expels its hot air out of the case through its own rear panel, which means less work for my case’s exhaust fan.

At full load (100% GPU and 70% CPU) over several runs of 12 epochs at 200 sec each, I read 87°C on the GTX 1080 Ti and 67°C on the i5-4690K in Psensor.

Anyone planning multiple GPUs in one case may want to look into this: if you have gaming GPUs with their triple 92mm fans running at full speed inside the box for hours, you’ll need some serious airflow to exhaust the heat, and a single 140mm case fan won’t cut it.


PS: if I were to rebuild this rig today, I would only spend 100€ extra, on a motherboard supporting 128GB of RAM (vs. 32GB max) and a stronger Corsair PSU, like the CX 850W, capable of powering dual GPUs.


Hi dave, those numbers are derived from gaming benchmarks, which are totally different from deep learning. Here, bandwidth is perhaps the most important thing, along with VRAM. PCIe 3.0 x8 would itself bottleneck your 1080, but that’s not a big issue. When you drop down to x4, though, your bandwidth is halved again, to < 4 GB/s. That’s a 60% reduction in your bandwidth. Assuming you write optimal code (with data loading, augmentation and pre-processing running concurrently with your training), this could reduce your speeds by at least 40%, if not the full 60%.

Secondly, since you have an Intel 7700K (max 16 PCIe lanes), you can only have the following configurations: 1x16, 2x8, or 1x8 + 2x4. So if you get a motherboard with 3 PCIe 3.0 slots, make sure they support x16, x8 and x4 modes (some motherboards support only x16 and x4). And again, it’s not advisable to run your 1080 Ti at x8, as I said earlier: x8 supports only ~8 GB/s, while your 1080 Ti can use up to 11. On top of that, you’d be running your 1070s (8 GB/s) at a mere 4 GB/s (3.94), which is half their capacity.

My advice: if you really need the extra GPUs, get a higher-end CPU (40 lanes, 28 lanes, etc.) and a motherboard that supports those.
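
For reference, the 3.94 figure above falls out of the PCIe 3.0 spec: 8 GT/s per lane with 128b/130b encoding, divided by 8 bits per byte. The sketch below just reproduces that arithmetic and the three lane splits listed for a 16-lane CPU; it says nothing about how much bandwidth a given card actually needs during training. The function and constant names are my own.

```python
PCIE3_GT_PER_S = 8.0        # PCIe 3.0 raw transfer rate per lane, in GT/s
PCIE3_ENCODING = 128 / 130  # 128b/130b line encoding overhead
CPU_LANES = 16              # lanes on a consumer CPU such as the 7700K


def pcie3_bandwidth_gb_s(lanes):
    """Theoretical one-direction PCIe 3.0 bandwidth in GB/s for a lane count."""
    return lanes * PCIE3_GT_PER_S * PCIE3_ENCODING / 8


# The lane splits mentioned above for a 16-lane CPU.
CONFIGS = {"1x16": [16], "2x8": [8, 8], "1x8 + 2x4": [8, 4, 4]}

for name, slots in CONFIGS.items():
    assert sum(slots) <= CPU_LANES
    per_slot = ", ".join("x%d = %.2f GB/s" % (n, pcie3_bandwidth_gb_s(n)) for n in slots)
    print("%-10s %s" % (name, per_slot))
```

These are theoretical peaks; real transfers will come in somewhat lower.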


The Z170-A PRO doesn’t support SLI, right? I know we don’t need SLI, but in general what SLI support implies is 2 PCIe 3.0 slots capable of running at x8 (8 GB/s bandwidth), the bare minimum for cards like the GTX 1080 (10 GB/s). I saw that although it has 2 x16 PCIe 3.0 slots, you can only run them at either x16 or x4, and since you have a consumer i7 processor, you have at most 16 lanes. So if you put in multiple cards, you would have to run them in an x8/x8 configuration, and on this board they will run at x4/x4, if I’m not wrong…

I didn’t find the time to go through your post, so I’m sorry for the repetition if you already mentioned this in the post itself. Cheers.

EDIT: I just read the mobo part. “I decided to start with one graphics card (single GPU), but I made sure the MSI board had an additional x16e PCIe slot so I could add another card in the future.” You can do that, but the bandwidth of both cards will be reduced to 3.94 GB/s (x4/x4). I think you should return it and get a Z170A Gaming M5 or some variant.
PS: I saw this mobo was 20% cheaper and went to see what the difference was, and this is it.

EDIT2: If I’m correct, maybe you can edit your blog to reflect that information so that other people know too?


Yes you’re exactly correct. I realized this only after writing the post. I’m now running 2 Nvidia cards (1080 Ti and 1070) and I will likely upgrade the board soon. Sorry!


This is what I did:
If space allows, add a new disk and install Ubuntu there; then shrink the Windows partition on the original disk and put the swap partition there.

Has anyone considered a build like this:
Mobo: Asus X99-A II
CPU: i7 6800K
GPUs: up to 3x GTX 1070

This is basically a scaled-down, cost-effective, DIY-inspired version of the Nvidia DevBox, at the 2017 sweet spot.

I think for a similar price it would probably get more working TFLOPS from its GPUs than a comparable build with a Z270 mobo, i7 7700K CPU and GTX 1080 Ti. Here is an interesting Quora answer from a senior developer at Nvidia:


Alexandre, that’s pretty much my setup, but I only have one 1070, not 3.

However, I’m not sure my X99 motherboard could fit 3 GTX 1070s. Two, no problem, but 3 might be tough to fit. I got the X99 and 6800K because they’re similar in price to the Z270/7700K but come with 6 cores and more PCIe lanes, which seemed worth it to make the system more expandable later.

The 1070 is plenty fast for my purposes, but it would be nice to have one more GPU for experimenting while another is training long-term. And if I were to buy another one, I’d probably get the 1080 Ti so I could train CNNs with larger batch sizes thanks to the extra VRAM.

Good luck with whatever you buy.

I tried to follow your brilliant hints and built my own rig on paper, piece by piece, only to see it would cost the same €1.8k-2k as an equivalent laptop. So I went for a Predator 15 with a GTX 1070 8GB from Amazon, plus their 3-year warranty extension named Protect, for a total of 5 years of warranty. Undervolting the core by 0.120 V solves the CPU overheating/throttling reported by other buyers, and I’m set for offline personal usage until the end of 2020.

I just finished building my DL machine and installed Ubuntu 16.04.2.
I have a 1080 Ti and have been reading about how to install the Nvidia drivers and CUDA.
I read here
that CUDA 8.0 comes with a driver version (375.26) that doesn’t support the GTX 1080 Ti. As a result, installing CUDA from apt-get doesn’t work, since it installs this driver version.

Is installing CUDA and Nvidia drivers for the GTX 1080 Ti on Ubuntu different from the 1080 or 1070?


Finished my ML build; thanks to everyone on this thread for the guidance. I’m a bit disappointed by the performance, so if you’re about to start a build, take note.

From the start I decided I would either go with a highly extensible X99 system (with 40 PCIe lanes, to use up to 4 GPUs) OR build a ‘disposable’ system that wouldn’t hurt to upgrade from in half a year to a year’s time. I decided to go with option 2.

Parts (all sourced from Craigslist / Kijiji):

  • GeForce Titan X (Maxwell) - $750
  • A78M-A + AMD A8-7600 - $100
  • 8GB RAM (Kingston HyperX Fury) - $40
  • 600W Thermaltake power supply - $40
  • Some Corsair tower case - $20
  • Spare 128GB SSD - free

Total: $950 (CAD) / $700 US

I can run the Lesson 1 cats/dogs first fit in 400 seconds - beats AWS, but I had hoped for <300s.

The constraints I used for determining which parts to buy:

  • Mobo: at least one PCIe 3.0 slot (x16)
  • Processor: at least 16 PCIe lanes (rev. 3.0; I found many that are only 2.0 compliant)
  • GPU: maximum GPU RAM, CUDA-compatible

What I learned (take note if you’re about to build):

  • TFLOPS on your GPU matter. If I were doing it again, I think I would prioritize the Pascal architecture over sheer GPU RAM.
  • I haven’t benchmarked how long preprocessing takes on my AMD build vs. my i7 laptop, but I suspect that I undervalued the importance of a fast Intel processor.
  • Mixing and matching RAM is not simple. You need to make sure all the timings match up and the sticks support the same clock speeds. A setting that works for two different sticks of RAM may perform significantly worse than what either of them supports.
  • Ultimately, I’m happy that the mobo/CPU/RAM were cheap and it won’t be a huge loss to upgrade the system. I stand by my avoid-the-middle-road approach.

Finally, some resources that were invaluable while getting all the drivers and whatnot installed:


CPU performance is important, as the CPU is used to control the GPU and, more importantly, to feed the GPU. Threads are available in both Python 2 and Python 3; if you’re using fit_generator with Keras 2 on Python 3, you can specify multiple worker threads for data augmentation. This gave me a significant performance boost. Assuming you have multiple cores in your AMD processor, it might help.
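
To show the idea without needing Keras installed, here is a standard-library sketch of what multiple worker threads buy you: batches are prepared in the background and queued while the consumer (the GPU, in real training) is busy. This is not Keras' implementation; `threaded_batches` and `make_batch` are hypothetical names. In Keras 2 the equivalent knob is the `workers` argument of `fit_generator`.

```python
import queue
import threading


def threaded_batches(make_batch, n_batches, workers=4, max_queue_size=10):
    """Yield (index, batch) pairs, with batches built by background threads.

    make_batch(i) stands in for loading and augmenting batch number i.
    Batches may arrive out of order, since the threads run concurrently.
    """
    tasks = queue.Queue()
    for i in range(n_batches):
        tasks.put(i)
    # Bounded queue: workers pause when they are too far ahead of the consumer.
    results = queue.Queue(maxsize=max_queue_size)

    def worker():
        while True:
            try:
                i = tasks.get_nowait()
            except queue.Empty:
                return  # no work left for this thread
            results.put((i, make_batch(i)))

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(workers)]
    for t in threads:
        t.start()
    for _ in range(n_batches):
        yield results.get()


# Toy usage: "augmentation" here is just squaring the batch index.
for index, batch in threaded_batches(lambda i: i * i, 5, workers=2):
    print(index, batch)
```

Because of the GIL, threads mostly help when the per-batch work spends its time in I/O or in libraries that release the GIL (image decoding, NumPy); for pure-Python preprocessing, separate processes are usually the better fit.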
