Making your own server


(layla.tadjpour) #243

It shows for this board:
PCI Express 3.0 x16: 2 x PCIe 3.0 x16 slots (support x16/x4 mode)
PCI Express x1 : 4 x PCIe 3.0 x1 slots

Does that mean I have 32 PCIe lanes?

I don’t know how to fix the url link. The Permalink is this: https://pcpartpicker.com/list/9GrHKZ


(Rothrock) #244

@layla.tadjpour My bad. I was only scrolling back up to where somebody quoted you. The quote didn’t get updated. Got it now.

That isn’t how I interpret those numbers. That board has two PCIe 3.0 x16 slots. If you have one card, put it in slot 1 and it will use the full x16 speed. If you have a second card, you put it in slot 2 and it will run at x4 speed. I think it is dradientgescent who has done some testing on the speeds and found that x8 cuts off maybe a couple of percent in performance, but I don’t remember what he said about x4.

However, the CPU you’ve selected only supports 16 lanes, in these configurations: up to 1x16, 2x8, or 1x8+2x4, according to this: https://ark.intel.com/products/97123/Intel-Core-i5-7500-Processor-6M-Cache-up-to-3_80-GHz

For one card this set up is probably fine, but if you really plan to get another card – or just want to keep your options open – I would suggest a different combination of mobo and cpu.
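
To put rough numbers on x16 vs x8 vs x4, here is a minimal Python sketch. It assumes the standard PCIe 3.0 figures (8 GT/s per lane, 128b/130b encoding) and only counts CPU lanes, ignoring any lanes a chipset might add. The function names are made up for illustration, and the result is theoretical link bandwidth, not a prediction of training speed.

```python
# PCIe 3.0 signals at 8 GT/s per lane and uses 128b/130b line encoding.
GIGATRANSFERS_PER_S = 8.0
ENCODING_EFFICIENCY = 128 / 130


def pcie3_bandwidth_gb_s(lanes: int) -> float:
    """Approximate one-direction bandwidth in GB/s for a PCIe 3.0 link."""
    bits_per_s = lanes * GIGATRANSFERS_PER_S * 1e9 * ENCODING_EFFICIENCY
    return bits_per_s / 8 / 1e9  # bits -> bytes -> GB


def fits_cpu_lanes(slot_widths, cpu_lanes=16):
    """True if the requested link widths fit in the CPU's lane budget."""
    return sum(slot_widths) <= cpu_lanes


if __name__ == "__main__":
    for lanes in (16, 8, 4):
        print(f"x{lanes}: {pcie3_bandwidth_gb_s(lanes):.2f} GB/s")
    # Two cards at x16 each would need 32 CPU lanes; a 16-lane CPU
    # can only offer them 2x8.
    print("2 x16 fits 16 lanes:", fits_cpu_lanes([16, 16]))
    print("2 x8 fits 16 lanes:", fits_cpu_lanes([8, 8]))
```

Each halving of the link width halves the theoretical bandwidth, which is a much bigger drop than the couple of percent mentioned above for x8. How much of it shows up in practice depends on how often the GPU is waiting on transfers.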


(layla.tadjpour) #245

I see. About the CPU: I checked many other CPUs in this price range (i5 and i7) on the ark.intel.com site, and all have 16 lanes maximum. What CPU do you suggest? The one you got for yourself?


(Leon Letto) #246

My First post :slight_smile:
I have done the first couple of lessons using AWS and got so frustrated I decided to build a box.
I decided to build using a Ryzen 1700 so at least we would know if it works. Here are the parts:
Ryzen 1700 CPU
EVGA 1070 GPU
ASUS PRIME 370-Pro Motherboard (3xPCI Express x16 slots)
32GB DDR4 now going to 64 next week.
25GB SSD boot drive for / and /var
500GB PCIe SSD for /Home
2x1.5TB Hard Drive mirror (not enabled yet but installed)
Rosewill 850w Modular PS
Running Ubuntu 16.04.

So far no errors and I have the first fit time sitting at 310 seconds using
device = cuda
cnmem = 0.95

cuDNN is version 5103

My GPU % is sitting at 100% for the whole time which is good since it was bouncing from 90-100% before I upgraded libgpuarray.
My CPU is at 8% (Python using 177% mostly via 8 threads )

Not sure why I am not down in the high 200s, but I suspect it’s because I am using a conservative GPU. I plan on buying a 1080Ti soon but I am waiting to see what people here have good results with first :slight_smile:

Anyway, all of the feedback here has been awesome and helped me a lot. I am quite happy and can now continue the course knowing I am able to do everything I need locally.

Thank you all for contributing to this great knowledgebase.


#247

@stepheni I had similar problems. It was suggested (@kzuiderveld) to install with the run-file type of distribution, which I would have tried had I known how, and which packages to remove to make sure everything is removed. The alternative is to reinstall ubuntu. Anyway, I settled on the 8.0.61-1 deb package and cudnn 5.1.3.

cudnn 6.0.20 worked fine with 8.0.61-1, the nvidia sample apps and nvidia-smi, so that part is good. It was theano where the problem lies. We updated theano to 0.9.0, which worked for @kzuiderveld but not for me. The difference was the nvidia driver: 378.13 there, whereas I was using 375.39. See post 219… 228 ish.

I thought it better to get something done with the setup that works for me and see what else breaks and wait for the next version of cuda cud.


#248

@leonletto cnmem is a parameter of the old theano GPU back end. If you are using libgpuarray, which is the new back end, I believe cnmem has no effect. I’ll check on my system and edit this post.
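
As far as I understand the Theano configuration docs, the old back end (device=gpu) reads its memory pool size from cnmem in the [lib] section, and the libgpuarray back end (device=cuda) reads preallocate from the [gpuarray] section instead. The Python sketch below just writes out the two variants of a .theanorc for comparison. The helper name theanorc_text is hypothetical, and the 0.95 fraction is the value quoted earlier in the thread, not a recommendation.

```python
import configparser
import io


def theanorc_text(backend: str, fraction: float) -> str:
    """Return .theanorc contents for the 'old' or 'new' Theano GPU back end."""
    cfg = configparser.ConfigParser()
    if backend == "old":
        # Deprecated CUDA back end: device=gpu, memory pool set via lib.cnmem.
        cfg["global"] = {"device": "gpu"}
        cfg["lib"] = {"cnmem": str(fraction)}
    elif backend == "new":
        # libgpuarray back end: device=cuda, memory set via gpuarray.preallocate.
        cfg["global"] = {"device": "cuda"}
        cfg["gpuarray"] = {"preallocate": str(fraction)}
    else:
        raise ValueError("backend must be 'old' or 'new'")
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(theanorc_text("old", 0.95))
    print(theanorc_text("new", 0.95))
```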


(Stephen) #249

Roger - Well, I have 378.13 installed and it has been running. 6.0.20 runs fine again on the samples; it’s theano playing up. I did a fresh install yesterday. You can remove CUDA and cuDNN and slide both versions of cuDNN in and out with no issues, except 6.0.20 isn’t playing ball with theano during compile. I will look at what @kzuiderveld has done to see if it’s anything obvious I missed. It’s a pain as I can’t unleash the money I spent on this toy yet… cheers


#250

@stepheni just to note, I have the latest dev version of theano and all supported versions of the other python libraries as per

http://deeplearning.net/software/theano/install_ubuntu.html

and I had the same issues as yourself. I need to break it again (swap cudnn 5.1.3 for 6.0.20) to get the compiler error.
However, in general the epoch times I get are better than those displayed in the freshly downloaded jupyter notebooks. At the moment I am working through lesson 6, where the original shows 5 seconds and mine are 4. The problem is that we have no idea what the original was run against.


(Stephen) #251

Roger - quick update thus far. Ripped out 378.13 and replaced it with 381; no change in performance under 5.1.

6.0.20 is still playing up. Ripped out all versions of theano: the dev, the pip and the conda.

conda list | grep -i theano

No change installing each one and ripping them out one by one. BTW none ever took device=cuda; it always crapped itself whenever I tried, which makes me suspicious that the compile errors may be gcc or g++ version related. I may rip and replace the whole OS. I have done this so many times I can have it loaded in no time. I just need a bigger USB 3.0 stick to hold all the files so I don’t need to download again; that kills me. More later…


#252

@stepheni

I used device=cuda0. I have two GPUs: a small 1GB one for display in slot 1 (PCI 3 x16) and my cuda GPU in slot 2 (PCI x16). The main thing is to run some of the NVIDIA samples; when I did, I got a PASS for bandwidth and deviceQuery. These and nvidia-smi number the devices differently, swapping 0 for 1 and 1 for 0. Anyway, that’s news about a new driver. I have the run files on my box and you can use them to get information, so when I ran the 378.13 run file with --latest it came back with 375.39 still.

The other thing I changed from the install_setup script was the version of anaconda; maybe there are linking problems there. I went from 4.2.0 to 4.3.1, but then I downgraded stuff to suit the link in my previous post.


#253

@stepheni

Did you install pygpu with theano? You then get libgpuarray as a dependency, which is required for the new theano back end. It allows the recognition of device=cuda, whereas the deprecated back end uses device=gpu. The new back end requires the aforementioned but also the theano dev version.

EDIT: Looking again at the 0.9 dev documentation, theano 0.9.0 was released on 20-3-2017, so it gives the impression there is no newer release available, dev or otherwise. And the specified cudnn is 5.1.3. It may be the case that TensorFlow requires cudnn 6.0.20.


#254

I plan to get just one geforce 1080 to start. But I thought I may want a motherboard that can hold up to 4 gpus, in case I want to expand that way in the future, when the hardware prices drop. I have the kaby lake I7-7700k cpu. Do you think the asrock z270 supercarrier mobo (with 4 gpu slots) can work? I’m not sure how much performance penalty there would be given that it’s one PCIe x16 for one gpu, but would drop to four x4’s for four gpus.


(Leon Letto) #255

@RogerS49 I thought so since it didn’t show those cnmem status messages anymore after I upgraded.

I also tried some tweaks today and it looks like anything I do to the CPU makes no difference. Exactly the same speed. I also upgraded the Ubuntu kernel and that made no difference.

EDIT: Upgrading to latest CUDA. Seems mine was a bit older.
current: cuda-repo-ubuntu1604_8.0.44-1_amd64.deb
new: cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64-deb

Is there a way to overclock your video card from the ubuntu command line?

I’ll live with it if there is nothing else to do :slight_smile:


(Leon Letto) #256

Upgraded Cuda to cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64-deb and that worked with just the .deb.
Upgraded cuDNN to version 6020 from here https://developer.nvidia.com/rdp/cudnn-download
I tried the .deb version but that did not work, so I downloaded the tar.gz version, extracted it and copied the files over the top of the older ones (after backing up the .h file). I used the instructions here: https://stackoverflow.com/questions/38137828/how-do-i-update-cudnn-to-a-newer-version.
sudo cp /usr/local/cuda-8.0/include/cudnn.h /usr/local/cuda-8.0/include/cudnn.h.bak
sudo cp include/cudnn.h /usr/local/cuda-8.0/include
sudo cp lib64/* /usr/local/cuda-8.0/targets/x86_64-linux/lib

Rebooted, tried again and my times dropped from 310 seconds to 217 seconds :smile:

Also, when I was running the older version of the cuda drivers, my Ryzen system was acting weird and halting for 30-60 seconds for no apparent reason. That is still the case so I may be in for a re-install from scratch anyway. Not today though…
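
A note on the version numbers in this thread: the 5103 and 6020 that get reported are the packed form of 5.1.3 and 6.0.20 (major * 1000 + minor * 100 + patch). The Python sketch below decodes that number and also reads the version out of the text of a cudnn.h. It assumes the header carries plain CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL defines, as the cuDNN 5 and 6 headers do to my knowledge. The function names are hypothetical.

```python
import re


def decode_cudnn_version(version: int):
    """Split a packed cuDNN version such as 5103 into (major, minor, patch)."""
    major, rest = divmod(version, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def version_from_header(header_text: str):
    """Read (major, minor, patch) from the contents of a cudnn.h file."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(
            rf"^\s*#define\s+{name}\s+(\d+)", header_text, re.MULTILINE
        )
        if match is None:
            raise ValueError(f"{name} not found in header text")
        parts.append(int(match.group(1)))
    return tuple(parts)


if __name__ == "__main__":
    print(decode_cudnn_version(5103))
    print(decode_cudnn_version(6020))
```

To check which header actually ended up in place after copying, something like version_from_header(open("/usr/local/cuda-8.0/include/cudnn.h").read()) should do, assuming that is where the header was copied.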


(Stephen) #257

Roger, I did all that. I used [conda install theano pygpu] (or similar, not at PC).

I am just looking at leon letto’s mods below and trying to understand if that will have an impact. I have been using the cuDNN tar from the beginning, unpacking it and copying over lib64 and include, but not individual files. One thing I have noticed is that the spooling up of Vgg with the latest 381 drivers and 6.0.20 is real quick, but it can’t compile, so there’s evidence it’s fast but I can’t use it as yet. Will try today and see what happens. I still may have to blow out the whole OS and start again.


#258

@leonletto
Great, that’s a new version of the deb file. I have cuda-repo-ubuntu1604_8.0.61-1_amd64.deb installed, which has driver 375.39, so next I will install that version and hopefully my 6.0.20 cudnn woes will go away.
So I have updated the cuda deb package and now my driver is version 375.51. The first notable change is that the 1080 ti name is now recognised correctly; before it was called ‘Graphics Device’.

@stepheni

I have been using the tar versions

the command I found useful was

sudo cp -dR lib64/* /usr/local/cuda-X.0/lib64/

I removed the versions I was replacing first

sudo rm -i /usr/local/cuda-8.0/lib64/libcudnn*

But since 6.0.20 worked within the nvidia environment, I still feel it is just not supported by Theano, so I will stick with 5.1.3 for now.

Post any stack traces you may have as text so it’s possible to isolate the includes / libs. Thanks.

Being new to ubuntu I issued the

sudo apt-get update
sudo apt-get --assume-yes upgrade

commands after installing the new deb, then rebooted. Not sure if that’s good form or not!!!
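
When swapping cuDNN versions by hand like this, it can help to see exactly which libcudnn files and symlinks are left in the lib directory (the -d in cp -dR keeps symlinks as symlinks rather than copying their targets). A minimal Python sketch follows; the function name is hypothetical and the demo runs against a throwaway temp directory, not a real CUDA install.

```python
import os
import tempfile


def list_cudnn_libs(lib_dir: str):
    """Return (name, symlink_target_or_None) for each libcudnn* entry."""
    entries = []
    for name in sorted(os.listdir(lib_dir)):
        if not name.startswith("libcudnn"):
            continue
        path = os.path.join(lib_dir, name)
        target = os.readlink(path) if os.path.islink(path) else None
        entries.append((name, target))
    return entries


if __name__ == "__main__":
    # Demo layout in a temp directory: one real file plus one symlink to it.
    with tempfile.TemporaryDirectory() as demo_dir:
        real = os.path.join(demo_dir, "libcudnn.so.6.0.20")
        with open(real, "w") as handle:
            handle.write("placeholder")
        os.symlink("libcudnn.so.6.0.20", os.path.join(demo_dir, "libcudnn.so.6"))
        for name, target in list_cudnn_libs(demo_dir):
            print(name, "->", target if target else "(regular file)")
```

On a real box you would point it at the directory you copied into, for example list_cudnn_libs("/usr/local/cuda-8.0/lib64"). Leftover files from two cuDNN versions side by side would show up immediately.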


(Stephen) #259

Leon - I would love to know if there was anything else you did with this install/upgrade of CUDA 8.0.61 and cuDNN 6.0.20. I have tried everything to get cuDNN 6.0.20 working on CUDA 8.0.61-1 and it still won’t run Vgg, while cuDNN 5.1 runs with no issues at all on 8.0.61-1. cuDNN 6.0.20 appears to load OK, but it spills out C files when run on Vgg and hits some kind of compile error.

I rebuilt my machine from scratch and used the CUDA 8.0.61 run files, the deb versions of CUDA 8.0.61-1, drivers 375, 378.13 and 381.09, developer versions of theano, conda versions of theano, and pip versions of theano 0.9.0. Based on what I have done, I just can’t believe yours works on Ubuntu 16.04 LTS and mine does not when it appears we have done the same thing. There must be something different: some pointer someplace, or a missing link.

I wouldn’t mind looking at copies of your .theanorc file, your .bashrc exports and anything else you may have done or can think of. I am at a dead end: I can’t open up my 1080ti with cuDNN on 5.1, it runs 258 at best on Vgg. I have ripped out CUDA and cuDNN maybe a dozen times in every combination and still 6.0.20 won’t run Vgg, and yet some on the forum claim it does work. If you can think of anything, it would be of great help. Cheers -Stephen


#260

@stepheni
Please post your recent compiler error as text


(Christopher) #261

If it is only 2 GPUs, then a z270 will be perfect: it can run 1 NVMe drive and 2 GPUs at full speed (if they are Titan X or 1080Ti you will have a small bottleneck on the bus).


(Leon Letto) #262

@stepheni The only thing you didn’t mention was libgpuarray. I followed the instructions here http://deeplearning.net/software/theano/install_ubuntu.html when I installed, and that led me here to compile and install it: http://deeplearning.net/software/libgpuarray/installation.html

During my troubles I also updated the Ubuntu Kernel to 4.10.9 using these instructions https://www.servethehome.com/amd-ryzen-with-ubuntu-here-is-what-you-have-to-do-to-fix-constant-crashes/ and changing it to the latest versions. Here is my update script.
#!/bin/bash
cd /tmp
wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.10.9/linux-headers-4.10.9-041009_4.10.9-041009.201704080516_all.deb
wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.10.9/linux-headers-4.10.9-041009-generic_4.10.9-041009.201704080516_amd64.deb
wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.10.9/linux-image-4.10.9-041009-generic_4.10.9-041009.201704080516_amd64.deb
echo Everything is downloaded. Time to install.
sudo dpkg -i linux-headers-4.10*.deb linux-image-4.10*.deb
echo Type sudo reboot to restart your system with the new kernel.

I hope some of this helps.

Leon
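
After a kernel update like the one above, it is easy to reboot into the old kernel without noticing. Here is a small Python sketch that checks the running kernel against the 4.10.9 installed by that script. The helper names are hypothetical, and 4.10.9 is only the version from this post, not a stated minimum for Ryzen.

```python
import platform
import re


def kernel_version_tuple(release: str):
    """Parse a release string like '4.10.9-041009-generic' into (4, 10, 9)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2)), int(match.group(3) or 0))


def kernel_at_least(release: str, minimum=(4, 10, 9)) -> bool:
    """True if the given kernel release is at or above the minimum version."""
    return kernel_version_tuple(release) >= minimum


if __name__ == "__main__":
    running = platform.release()  # same string that `uname -r` prints on Linux
    print(running, "is at least 4.10.9:", kernel_at_least(running))
```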