Just sharing my own GTX 1080 Ti results for @jeremy’s “mnist.ipynb” notebook from Lesson 4 (BatchNorm + Dropout + Data Augmentation), which he was running on a last-gen GTX TITAN X.
In the last 12-epoch run (cell #85), I get 9-10 sec per epoch vs. his 13-14 sec, so roughly 30% less time per epoch.
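For anyone who wants to check that figure, here is a minimal sketch of the arithmetic. It assumes the midpoints of the two quoted ranges (13.5 sec and 9.5 sec), and the function names are made up for illustration. Note that "time saved" and "throughput gain" are two different numbers for the same measurement.

```python
def time_reduction(baseline_s, new_s):
    """Fraction of epoch time saved relative to the baseline card."""
    return 1.0 - new_s / baseline_s


def throughput_gain(baseline_s, new_s):
    """How many times more epochs fit into the same wall-clock time."""
    return baseline_s / new_s


# Assumed midpoints of the ranges quoted above:
# 13-14 sec (TITAN X) vs. 9-10 sec (1080 Ti).
titan_x_s, gtx_1080ti_s = 13.5, 9.5

print(f"time saved per epoch: {time_reduction(titan_x_s, gtx_1080ti_s):.0%}")   # 30%
print(f"throughput gain: {throughput_gain(titan_x_s, gtx_1080ti_s):.2f}x")      # 1.42x
```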
My rig is a former gaming PC from mid-2015:
Gigabyte Z97X motherboard with 16GB DDR3 1866MHz (reused from a 2012 rig ^!^ )
Intel i5-4690K 3.5GHz, not overclocked
Corsair CX 650M PSU
Samsung SSD 850 EVO Series 500GB + WD Purple 3TB
Asus GTX 1080 Ti Founders Edition
Corsair Carbide 100R Silent Edition
Dual-Boot Win10 + Ubuntu 16.04 (for Fast.ai part 1)
Cost of hardware in 2015 + assembly: 1000 euros
GTX 1080Ti: 700 euros
Obviously the RAM is sub-par and the CPU is nowhere near an i7-7700K (which shouldn’t be overclocked, BTW, according to Intel now; ouch for the K), but it still delivers pretty well imho.
Note: if you still get the pink warning when loading the notebook, it’s crucial to drop the old CUDA backend and switch to the gpuarray backend.
That alone cut my epoch time by more than half (from about 24 sec down to 10 sec).
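A minimal sketch of one way to select the backend from Python, assuming Theano 0.9-era naming where a `cuda*` device name picks the gpuarray backend and a `gpu*` name picks the old CUDA backend. The helper name `theano_flags` is made up for illustration. The same settings can go in `.theanorc` instead, and the flags have to be set before Theano is imported.

```python
import os


def theano_flags(device="cuda0", floatx="float32"):
    """Build a THEANO_FLAGS string.

    Assumption (Theano 0.9-era naming): 'cuda0' selects the gpuarray
    backend, whereas 'gpu0' would select the old CUDA backend.
    """
    return "device={},floatX={}".format(device, floatx)


# Must be set before `import theano`, because Theano reads it at import time.
os.environ["THEANO_FLAGS"] = theano_flags()
print(os.environ["THEANO_FLAGS"])

# import theano  # uncomment on a machine with Theano and pygpu installed
```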
Last thought: I prefer the Founders Edition (or so-called Aero version) of the GTX 1080 Ti because it expels its hot air out of the case through its own rear panel, which means less work for my case’s exhaust fan.
At full load (100% GPU and 70% CPU) over several runs of 12 epochs at 200 sec each, Psensor showed 87°C on the GTX 1080 Ti and 67°C on the i5-4690K.
Anyone planning multiple GPUs in one case may want to look into this: if you have gaming GPUs with their triple 92mm fans running at full speed inside the box for hours, you’ll need some serious airflow to get the heat out, and a single 140mm case fan won’t cut it.
PS: if I were to rebuild this rig today, I would only spend 100€ extra, on a motherboard capable of 128GB RAM vs. 32GB max and a stronger Corsair PSU like the CX 850W, capable of powering dual GPUs.
Hi dave, those numbers are derived from gaming benchmarks, which are totally different from deep learning. Here, bandwidth is perhaps the most important thing, along with VRAM. PCIe 3.0 x8 by itself would already bottleneck your 1080, but that’s not a big issue. When you drop down to x4, though, your bandwidth is halved again, to <4 GB/s. That’s a 60% reduction in your bandwidth (relative to the roughly 10 GB/s a GTX 1080 can use). Assuming you write optimal code (with data loading, augmentation and pre-processing running simultaneously with your training), this could potentially reduce your speeds by at least 40%, if not the full 60%.
Secondly, since you have an Intel 7700K (max 16 PCIe lanes), you can only have the following configurations: 1x16, 2x8, or 1x8+2x4. So in case you get a motherboard which has 3 PCIe 3.0 slots, make sure they support x16, x8 and x4 modes (some motherboards support only x16 and x4). And again, it’s not advisable to run your 1080 Ti at x8: as I said earlier, x8 supports only ~8 GB/s while your 1080 Ti can use up to 11. What’s more, you’ll also be running your 1070s (8 GB/s) at a mere 4 GB/s (3.94), which is half their capacity.
My advice: If you really need the extra GPUs, get a higher-end CPU (40 lanes, 28 lanes etc.) and a motherboard which supports those.
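The x4/x8/x16 figures quoted in this post can be reproduced with a minimal sketch, assuming the theoretical PCIe 3.0 rate of 8 GT/s per lane with 128b/130b encoding, in one direction. The function names are made up for illustration. The lane-budget check is deliberately simplistic: it only counts CPU lanes and ignores any extra lanes a chipset may provide.

```python
def pcie3_bandwidth_gbs(lanes):
    """Theoretical one-direction PCIe 3.0 bandwidth in GB/s for a link width."""
    gigatransfers_per_s = 8.0        # 8 GT/s per lane
    encoding_efficiency = 128 / 130  # 128b/130b line coding
    bits_per_byte = 8
    return lanes * gigatransfers_per_s * encoding_efficiency / bits_per_byte


def fits_lane_budget(slot_widths, cpu_lanes=16):
    """True if the requested slot widths fit within the CPU's PCIe lanes."""
    return sum(slot_widths) <= cpu_lanes


for lanes in (16, 8, 4):
    print(f"x{lanes}: {pcie3_bandwidth_gbs(lanes):.2f} GB/s")
# x16: 15.75 GB/s
# x8: 7.88 GB/s
# x4: 3.94 GB/s

print(fits_lane_budget([8, 4, 4]))  # True  (1x8 + 2x4 on a 16-lane CPU)
print(fits_lane_budget([16, 8]))    # False (would need 24 lanes)
```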
The Z170-A PRO doesn’t support SLI, right? I know we don’t need SLI, but in general what SLI support means is 2 PCIe 3.0 slots capable of running at x8 [8 GB/s bandwidth] (the bare minimum bandwidth for cards like the GTX 1080, which can use 10 GB/s). I saw that although it has 2 x16 PCIe 3.0 slots, you can only run them at either x16 or x4, and since you have an i7 consumer processor, you have a max of 16 lanes. So if you put in multiple cards, you would want to run them in an x8,x8 configuration, but this board will run them at x4,x4 if I’m not wrong…
I didn’t find the time to go through your post, so I’m sorry for the repetition if you had already mentioned this in the post itself. Cheers.
EDIT: I just read the mobo part. “I decided to start with one graphics card (single GPU), but I made sure the MSI board had an additional x16e PCIe slot so I could add another card in the future.” You can do that, but the bandwidth of both cards will be reduced to 3.94 GB/s (x4,x4). I think you should return it and get a Z170A Gaming M5 or some variant.
PS - I saw this mobo was cheaper by 20% and went to see the difference, and this is it.
EDIT2: If I’m correct, maybe you can edit your blog to reflect that information so that other people also come to know?
However, I’m not sure my X99 motherboard could fit 3 GTX 1070s. Two is no problem, but 3 might be tough to fit. I got the X99 and 6800K because it’s similar in price to a Z270/7700K but has 6 cores and more PCIe lanes, which seemed worth it to make the system more expandable later.
The 1070 is plenty fast for my purposes, but it would be nice to have one more GPU for experimenting while another is training long-term. And if I were to buy another one, I’d probably get the 1080 Ti so I could train CNNs with larger batch sizes thanks to the extra VRAM.
I tried to follow your brilliant hints and built my own rig on paper, piece by piece, only to find it would cost the same €1.8k-2k as an equivalent laptop. So I went for a Predator 15 with a GTX 1070 8GB from Amazon, plus their 3-year warranty extension named Protect, for a total of 5 years of warranty. Undervolting the core by 0.120V solves the CPU overheating / throttling reported by other buyers, and I’m set for offline personal usage until the end of 2020.
I just finished building my DL machine and installed Ubuntu 16.04.2.
I have a 1080 Ti and have been reading about how to install the NVIDIA drivers and CUDA.
I read here
that CUDA 8.0 comes with a driver version (375.26) that doesn’t support the GTX 1080 Ti. As a result, installing CUDA from apt-get doesn’t work, since it installs this driver version.
Is installing CUDA and the NVIDIA drivers for the GTX 1080 Ti on Ubuntu different from doing so for the 1080 or 1070?
Finished my ML build, thanks everyone on this thread for the guidance. I’m a bit disappointed by the performance, so if you’re about to start a build - take note.
From the start I decided I would either go with a highly-extensible X99 system (with 40 PCIe lanes to use up to 4 GPUs) OR build a ‘disposable’ system that wouldn’t hurt to upgrade from in half a year to a year’s time. I decided to go with option 2.
I can run the Lesson 1 cats/dogs first fit in 400 seconds - beats AWS, but I had hoped for <300s.
The constraints I used for determining which parts to buy:
MoBo: At least one PCIe-3.0 slot (x16)
Processor: At least 16 PCIe lanes (rev. 3.0, I found many that are only up to 2.0 compliant)
GPU: Maximum GPU RAM, CUDA-compatible
What I learned (take note if you’re about to build):
TFLOPS on your GPU matter - If I were doing it again, I think I would prioritize the Pascal architecture over sheer GPU RAM available.
I haven’t benchmarked how long preprocessing takes on my AMD build vs. my i7 laptop - but I suspect that I undervalued the importance of a fast Intel processor.
Mixing and matching RAM is not simple - you need to make sure all the timings match up and the sticks support the same clock speeds. A setting that works for two different sticks of RAM may be significantly lower-performing than what either of them supports on its own.
Ultimately, I’m happy that the mobo/cpu/ram were cheap and it won’t be a huge loss to upgrade the system. I stand by my avoid-the-middle-road approach.
Finally, some resources that were invaluable while getting all the drivers and whatnot installed:
Your PATH will be reset after reboot! To make it permanent, edit your profile (no sudo needed, it is your own file): nano ~/.profile
then add these two lines to the bottom of the file:
export PATH=/usr/local/cuda-8.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:$LD_LIBRARY_PATH
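To confirm after a reboot that those two exports actually took effect, a minimal sketch like the following can help. It assumes the `/usr/local/cuda-8.0` install location used in the exports above, and `cuda_env_report` is a made-up helper name, not part of any CUDA tooling.

```python
import os
import shutil


def cuda_env_report(cuda_home="/usr/local/cuda-8.0", environ=None):
    """Report whether the CUDA bin/lib64 directories are in the search paths."""
    environ = os.environ if environ is None else environ
    path_dirs = environ.get("PATH", "").split(os.pathsep)
    lib_dirs = environ.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    return {
        "bin_on_path": os.path.join(cuda_home, "bin") in path_dirs,
        "lib64_on_ld_library_path": os.path.join(cuda_home, "lib64") in lib_dirs,
        # Full path of the nvcc compiler if it can be found, otherwise None.
        "nvcc": shutil.which("nvcc", path=environ.get("PATH", "")),
    }


print(cuda_env_report())
```

If `bin_on_path` is False in a fresh terminal, the profile was probably not re-read; logging out and back in is usually needed for `~/.profile` changes to apply.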
Make sure to use cuDNN 5.1. When using the latest cuDNN, Theano would import without any problems, but I’d get strange, cryptic runtime errors during training. Using cuDNN 5.1 solved this for me.
CPU performance is important, as the CPU is used to control the GPU and, more importantly, to feed it. Strictly speaking Python 2 has threads as well, but if you’re using fit_generator in Keras2/Py3, you can specify the use of multiple worker threads for data augmentation. This gave me a significant performance boost - assuming you have multiple cores in your AMD processor, it might help.
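The idea of feeding the GPU from a background thread can be sketched with the standard library alone. This is not how Keras implements it. It is a minimal illustration under the assumption that batch preparation spends most of its time in I/O or in native code (NumPy, PIL) that releases the GIL. `prefetch` and `fake_batches` are made-up names. In Keras 2 itself, the corresponding knob is the `workers` argument of `fit_generator`.

```python
import queue
import threading


def prefetch(generator, buffer_size=4):
    """Yield items from `generator`, prepared ahead of time by a background thread."""
    sentinel = object()
    buf = queue.Queue(maxsize=buffer_size)

    def producer():
        try:
            for item in generator:
                buf.put(item)  # blocks while the buffer is full
        finally:
            buf.put(sentinel)  # always signal the end, even after an error

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buf.get()
        if item is sentinel:
            return
        yield item


def fake_batches(n):
    """Hypothetical stand-in for a loader that augments and yields batches."""
    for i in range(n):
        yield [i, i * 2]


print(list(prefetch(fake_batches(3))))  # [[0, 0], [1, 2], [2, 4]]
```

With a single producer thread the batch order is preserved. The sketch does not handle a consumer that stops early, in which case the producer stays blocked until the process exits.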
I’m getting an issue with cuDNN on my brand-new local Win 10 install, with this error: “RuntimeError: You enabled cuDNN, but we aren’t able to use it: Can not compile with cuDNN.” Details and my .theanorc file are here: https://github.com/Theano/Theano/issues/5348#issuecomment-302396718 - I got the advice to upgrade Theano to the latest version but haven’t gone down that road yet. Any tips? Thanks in advance.
I built an Ubuntu desktop with two GTX 1080 cards and then realized that if I run two models at the same time (State Farm and cervical cancer, for example), both models run on one GPU and the other GPU just sits there. Is this because Theano does not support multiple GPUs? Is there a workaround? I think that in TensorFlow I can use ‘CUDA_VISIBLE_DEVICES’ to switch between GPUs. Is it possible to do the same in Theano?
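One possible approach, sketched under assumptions: `CUDA_VISIBLE_DEVICES` is read by the CUDA runtime itself, so it should apply to a Theano process just as it does to TensorFlow. Each process then sees only the GPU it was given, numbered as device 0. The helper name `env_for_gpu` is made up, and the device string assumes the gpuarray backend (`cuda0`). With the old backend it would be `gpu0`.

```python
import os


def env_for_gpu(gpu_index, base_env=None):
    """Return an environment dict that pins a child process to one GPU."""
    env = dict(os.environ if base_env is None else base_env)
    # Hide every GPU except the chosen one from the CUDA runtime.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    # The single visible GPU is always index 0 inside the process.
    # Note: this overwrites any THEANO_FLAGS already present in the environment.
    env["THEANO_FLAGS"] = "device=cuda0,floatX=float32"
    return env


for gpu in (0, 1):
    env = env_for_gpu(gpu)
    print(gpu, env["CUDA_VISIBLE_DEVICES"], env["THEANO_FLAGS"])

# Hypothetical usage, one training script per GPU:
# import subprocess
# subprocess.Popen(["python", "train_model_a.py"], env=env_for_gpu(0))
# subprocess.Popen(["python", "train_model_b.py"], env=env_for_gpu(1))
```

An alternative that skips `CUDA_VISIBLE_DEVICES` is to give each process a different device name directly, for example `THEANO_FLAGS=device=cuda1` for the second one.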
Thanks for this. After more tweaking without result, I suspect it is my Anaconda Python 3.6 conflicting with CUDA 8 (which officially works with >=3.3 to <3.6), but I do not want to downgrade. I also have the latest pygpu backend installed already, and that’s not working either: it cannot be imported.
Have you considered a dual-boot of Win10 and Ubuntu 16.04?
It’s pretty easy to install and will give you access to a wider range of “stable” machine learning tools.
Plus you’ll have to switch to Ubuntu for Part 2 down the road.
Pro-tip: make sure to keep a Win10 installation DVD or USB for the day you decide to uninstall your Ubuntu partition, as you’ll need it to repair/restore your boot manager.
I never got CUDA 8 working with Python 3.6. I totally hear you about not wanting to downgrade (I am also stubborn like this), but in this case it might be easier to just relent and use 3.5 (you can set the latest Anaconda to use 3.5; no need to downgrade the Anaconda distribution).
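A quick way to see which side of that line an interpreter is on, as a minimal sketch. The supported range (>=3.3 and <3.6) is simply the one quoted earlier in this thread, not something verified here, and `in_supported_range` is a made-up helper. A separate 3.5 environment can be created in the existing Anaconda install with `conda create -n py35 python=3.5`, where `py35` is an arbitrary name.

```python
import sys


def in_supported_range(version=None, minimum=(3, 3), below=(3, 6)):
    """True if the (major, minor) version satisfies minimum <= version < below."""
    version = sys.version_info[:2] if version is None else tuple(version[:2])
    return minimum <= version < below


print("running Python %d.%d" % sys.version_info[:2])
print("inside the range quoted in this thread:", in_supported_range())
```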
I also agree that pragmatically, maybe using Linux is easiest right now… but again some of us are stubborn and want to promote the use of the OS that we use daily. If no one puts the time in to smooth out the bumps then it will never get better. It’s also quite the pain to quit all my tasks and leave the rest of my software behind when I reboot into Linux.