Making your own server

kzuiderveld · April 2, 2017, 2:07pm

@RogerS49, my .theanorc specifies
root = /usr/local/cuda

/usr/local/cuda is a symlink to /usr/local/cuda-8.0

cudnn.h was copied into /usr/local/cuda-8.0; libcudnn.so is a symlink in /usr/local/cuda/lib64 to libcudnn.so.6.0.20 in the same directory.

I think you should try to use /usr/local/cuda in your .theanorc config, then add symlinks where needed (ln -s source target).
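One way to sanity-check the layout described above is to ask Python where the symlinks actually resolve. This is only a minimal sketch: the function name `check_cuda_root` is made up for illustration, and the `include/` and `lib64/` subdirectories are an assumed layout, so adjust the paths if your files live elsewhere.

```python
import os


def check_cuda_root(root="/usr/local/cuda"):
    """Report where `root` resolves and whether the cuDNN files are visible under it.

    Hypothetical helper; the include/ and lib64/ locations are assumptions.
    """
    candidates = {
        "cudnn.h": os.path.join(root, "include", "cudnn.h"),
        "libcudnn.so": os.path.join(root, "lib64", "libcudnn.so"),
    }
    report = {"root": root, "resolved_root": os.path.realpath(root)}
    for name, path in candidates.items():
        # os.path.exists follows symlinks, so a dangling link is reported as None.
        report[name] = os.path.realpath(path) if os.path.exists(path) else None
    return report


if __name__ == "__main__":
    for key, value in check_cuda_root().items():
        print(key, "->", value)
```

If `resolved_root` is not the cuda-8.0 directory, or either file comes back as `None`, the symlinks are the first thing to look at before blaming .theanorc.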

RogerS49 · April 2, 2017, 6:23pm

Thanks for the reply. It looks like the same setup. Sadly, something weird is going on with the .theanorc mechanism on my machine. It just won’t work with /usr/local/cuda; it’s as if the link is not working, although the owner is root and it has world permissions of ‘rwx’. Did you install using sudo or just as a user?

Even the anaconda install was difficult as I tried to keep to the requirements of the theano v0.9.0 install page
http://www.deeplearning.net/software/theano/install_ubuntu.html

The mad thing is that nvidia-smi and deviceQuery both work, which seems to suggest the GPU is set up right.
They work with both cudnn 5.1.3 and 6.0.20, but I am restricted to cudnn v5.1.3 with theano; it grinds to a halt if I replace it with cudnn 6.0.20.

My attempt at the first single dogscats epoch in lesson1 (about the 4th cell on the page) took 600 secs, which is not right.
I have persevered enough with the deb pkg, so I am going to start afresh with a new install of Ubuntu, or even CentOS.

kzuiderveld · April 2, 2017, 7:03pm

@RogerS49,

That suggests that something is still iffy with theano. I didn’t install Theano “by hand”, but instead upgraded the Theano package; the distribution picked up a 0.9.x version that supports cudnn 6. Alas, I can’t remember the chant that I used. Good luck!

RogerS49 · April 3, 2017, 8:06pm

@kzuiderveld

Tried to upgrade theano, still at 0.9.0. No luck with conda; I would have to install from git, methinks.
System is 90% stable, giving 2x … .1x (augmentation) better times per epoch in lesson2 compared to the original times in the downloaded course ‘ipynb’.
Ubuntu is a bit flaky; the results don’t display in full sometimes. That may be down to the 16GB of memory and swapping.
It’s not like that with my Mac Pro: same memory, no GPU, and not a single hiccup, even with memory swapping like crazy on the same disk. Who wants anything else, right? But it was taking 5…8 hrs per 273s epoch. Need a new Mac.

So GTX 1080 ti with package
wget "http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/" -O "cuda-repo-ubuntu1604_8.0.61-1_amd64.deb"

That’s cuda 8.0.61-1 with driver 375.39, which works with cudnn 5.1.3 and theano 0.9.0.

cudnn 6.0.20 works with cuda 8.0.61-1 and driver 375.39, but not with theano 0.9.0, which accepts it with warnings but won’t compile. I am not sure the git version would improve the situation. I will try to take a closer look at this when I have time.

Also note link in post 222 was followed when setting up python libraries, starting with the Anaconda2-4.3.1-Linux-x86_64.sh shell

Box is Z640 16GB ram 8/16 core T4K60ET#ABU

kzuiderveld · April 3, 2017, 8:32pm

@RogerS49,

I installed the latest Theano using pip, I believe - it’s at 0.9.0 so it should be the same version.

The 375.39 NVIDIA driver does not support the 1080Ti though - you should upgrade to a driver that supports the card (378.13 here).
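Comparing driver versions by eye is easy to get wrong, so here is a small sketch of a numeric comparison. The helper names are hypothetical, and 378.13 is just the version quoted in this post, not a verified minimum. The installed version string would normally come from something like `nvidia-smi --query-gpu=driver_version --format=csv,noheader`.

```python
def parse_version(text):
    """Turn a dotted version string such as '375.39' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_at_least(installed, required):
    """True if the installed driver version is the required one or newer."""
    return parse_version(installed) >= parse_version(required)


# Versions taken from the posts above; the required value is the poster's suggestion.
print(driver_at_least("375.39", "378.13"))  # False
print(driver_at_least("378.13", "378.13"))  # True
```

Comparing tuples of integers avoids the trap of comparing the strings directly, where "378.9" would sort after "378.13".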

davecg · April 3, 2017, 11:17pm

You can quickly check things with the built-in Keras docker container. You’ll need to customize it a bit, though, if you want to run the scripts from the course.

Install Docker
git clone https://github.com/fchollet/keras
cd keras/docker && make notebook

Any docker container you want to create can just start with FROM keras and use everything keras installs plus your custom code. (The default user is keras, so you’ll have to switch to root in your Dockerfile to apt install additional software.)

RogerS49 · April 4, 2017, 2:54pm

“I installed the latest Theano using pip, I believe - it’s at 0.9.0 so it should be the same version.”

Same as me.

“The 375.39 NVIDIA driver does not support the 1080Ti though - you should upgrade to a driver that supports the card (378.13 here).”

Had a go but failed. I switched to run level 3, hoping the run file would back up the existing stuff and install on top (the info option says this is what it does), but each time the pre-install failed, with no indication in the nvidia-installer.log file why. One or more of the items in
http://us.download.nvidia.com/XFree86/Linux-x86/378.13/README/installdriver.html
must be wrong in my environment.

So I’m going to stick with what I have for now.

As an aside, I used the check-for-latest option from the 378.13 run file and it came back with 375.39. As I don’t use the 1080 Ti for display, it’s not an issue at the moment. The other downside is that when installing this way, the package gets lost with a new kernel (I am no expert). So for now

chetan · April 4, 2017, 4:50pm

Finally bit the bullet and ordered the parts to build my own deep learning box. Thanks to those lovely blogs and discussions. Hopefully all your experience will help me set up my box faster.

layla.tadjpour · April 8, 2017, 6:32am

I created a parts list on pcpartpicker.com before purchasing my parts, but I am getting this compatibility note:

“The motherboard M.2 slot #0 shares bandwidth with a SATA 6.0 Gb/s port. When the M.2 slot is populated, one SATA 6Gb/s port is disabled.”

You can check it here
Can anyone explain what that means and whether it is OK if I ignore it? Do I need all the SATA ports?
It seems that my using Samsung 250GB 960 EVO NVMe is creating this issue. If I switch to Samsung 250GB 960 EVO, the incompatibility issue is resolved.

stephenl · April 8, 2017, 7:17am

First post. Thanks to everyone for posting on this forum; it’s been a great help. I will start off by telling you what I have done. Enthused by Jeremy’s and Rachel’s online course, I went out and had a purpose-built PC assembled to run the labs. The iMac didn’t have enough RAM to do the job. Here are the specs for this ‘gaming’ PC.

Case: NZXT Source 340 Black Full Tower
Motherboard: Intel X99 Chipset ATX Gamer Edition (MSI)
CPU: Intel Haswell i7-5930K Haswell-E 3.7GHz
CPU Cooler: Intel Standard Stock CPU Cooler
RAM: 32GB DDR4 Quad DIMM Black Series
Primary Storage Device: 250G High Performance M.2 SSD
Second Storage Device: 2TB Hard Drive
Graphics Card: Nvidia GTX 1080 TI 11GB
Power Supply: 850W 80 Plus Gold Modular

The best I can get is 258 seconds on Vgg, even with a GTX 1080Ti. I am running a Haswell CPU because on the X99a you must. It has 32GB of RAM and uses an M.2 SSD. The OS is Ubuntu 16.04 LTS, downloaded on the iMac and installed from a flash drive made with ‘rufus’. I managed to upgrade the Nvidia driver to 378.13, running CUDA 8.0.61-1 and Theano 0.9, after I got it working.

There are a few tricks to getting this in place, and I am happy to share what I did. For Ubuntu, just follow the install as per the course GPU instructions for AWS: https://github.com/fastai/courses/blob/master/setup/install-gpu.sh . I did this line by line, verbatim, and it works on this hardware with no issues or dramas. Trying to do it via the CUDA run file is a mission to hell. Get it working as per the AWS script, then you know at least it did work.

I will say that upgrading the driver and CUDA version on this hardware had ZERO impact on performance, at least on Vgg. There’s a ‘handbrake’ on someplace, but as yet I cannot find it. I tried all sorts of batch sizes; it made no difference on either driver (375 and 378) or CUDA version. I have run the bandwidth tests in the NVIDIA ‘samples’ directory: memory transfer ran at 2.9 terabit per second, and device to host ran at 103 Gbit/s.

The screen is still running in a low resolution mode, but I have it running semi-headless via Jupyter and ssh from the iMac. That’s as far as I got. The only weird thing I found is the lib64 files showing almost duplicate versions (e.g. libcublas.so.8.0 and libcublas.so.8.0.61), with all the other duplicates ending in either 8.0 or 8.0.61. It may be time to smoke the CUDA toolkit and cuDNN again, clear it all out, and see what it puts where and what cuDNN places in lib64. Maybe tomorrow.

dradientgescent · April 8, 2017, 8:34am

Your link is mostly empty, so I can’t see the hardware you selected.

But basically, you should be fine. Most motherboards come with 4-6 SATA ports, so disabling one isn’t a big deal. Also, not all motherboards will disable SATA ports if you run the NVMe drive in PCI-E mode (which is what you want, as it is much, much faster).

You will likely only need 1-3 SATA ports, so even with only 4 ports you will be fine. But most boards have at least six these days.

layla.tadjpour · April 8, 2017, 8:38am

I see, thanks. I don’t know why you cannot see the parts in the above link, but here are the most basic parts:

Intel Core i5-7500 3.4GHz Quad-Core Processor
MSI Z270-A PRO ATX LGA1151 Motherboard
Samsung 960 Evo 250GB M.2-2280 Solid State Drive
Western Digital Caviar Blue 1TB 3.5" 7200RPM Internal Hard Drive
EVGA GeForce GTX 1070 8GB SC GAMING ACX 3.0 Black Edition Video Card

dradientgescent · April 8, 2017, 9:26am

Keep in mind, NVMe drives will use 4 lanes of your PCI Express bus. Newer boards typically have 16 or 20 lanes and each GPU ideally wants 16 lanes, but will be fine with only 8. Once you get to 3 GPUs (or more than one NVMe card) you start getting down to 4 lanes per gpu.

8x is fine for all cards until you get to Titan X and 1080Ti, both of those will saturate an 8x slot, which means to get full performance you can only run one card in a non-Xeon setup. The performance loss isn’t huge, but it is there. I don’t have tests with the 1080Ti but the Titan X did see saturation in an 8x slot (even with PCI Express 3) and the 1080Ti is faster, so the saturation will be worse.
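The lane arithmetic above can be sketched in a few lines. This is a rough back-of-the-envelope model, not how any particular board behaves: it assumes NVMe drives and GPUs draw from the same lane pool, that the remaining lanes are split evenly, and that each GPU then runs at the largest standard width that fits. Real boards only offer fixed modes, so check the motherboard manual. The function name and its parameters are made up for illustration.

```python
def lanes_per_gpu(cpu_lanes, gpus, nvme_drives=0, lanes_per_nvme=4):
    """Estimate the PCIe link width per GPU under an even split (simplified model)."""
    remaining = cpu_lanes - nvme_drives * lanes_per_nvme
    if gpus <= 0 or remaining <= 0:
        return 0
    share = remaining // gpus
    # Round down to the nearest standard PCIe link width.
    for width in (16, 8, 4, 2, 1):
        if share >= width:
            return width
    return 0


print(lanes_per_gpu(16, gpus=1))                 # 16
print(lanes_per_gpu(16, gpus=2))                 # 8
print(lanes_per_gpu(16, gpus=1, nvme_drives=1))  # 8
print(lanes_per_gpu(16, gpus=3))                 # 4
```

Under this model a 16-lane CPU drops a single GPU to x8 as soon as one NVMe drive takes 4 lanes from the same pool, which is the trade-off being described.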

layla.tadjpour · April 8, 2017, 6:41pm

Well, I am planning on getting another GPU later, so this sounds a bit problematic. I am wondering if getting another kind of motherboard would solve this incompatibility issue with NVMe drives. Do you have any suggestions?

Rothrock42 · April 8, 2017, 8:53pm

@layla.tadjpour your link URL ends at /list…you need to copy the permalink, which will end in something like 7xwLpb.

Most of the Z270 boards allow for 28 PCIe lanes. So you should be fine with two gpus operating at x8 speed.

The x99 boards usually allow for 28 or 40 depending upon what the cpu supports.

Here is a link to the machine I put together. https://pcpartpicker.com/b/j6XPxr

I’m planning to get another gpu (or two) eventually myself. I didn’t understand about the M.2 when I built it, but would probably have gotten the 250 SSD for boot, 1 TB M.2 for working data, and the large spindle drive for long term storage. Live and learn! Good luck.

RogerS49 · April 8, 2017, 9:52pm

@stephenl I think the 8.0 and 8.0.61 files may be links to the same thing.
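Whether the near-duplicate names really point at the same file can be checked by resolving each one and grouping by the result. A minimal sketch follows; `group_by_target` is a hypothetical helper, and the lib64 path in the usage note is an assumption about where the toolkit is installed.

```python
import os


def group_by_target(lib_dir, prefix="libcublas.so"):
    """Group file names in lib_dir by the real file they resolve to."""
    groups = {}
    for name in sorted(os.listdir(lib_dir)):
        if name.startswith(prefix):
            # realpath follows every symlink in the chain to the final file.
            target = os.path.realpath(os.path.join(lib_dir, name))
            groups.setdefault(target, []).append(name)
    return groups
```

Calling `group_by_target("/usr/local/cuda/lib64")` and printing the result should show one entry per real library. If libcublas.so.8.0 and libcublas.so.8.0.61 land in the same group, one is just a symlink to the other rather than a second copy.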

layla.tadjpour · April 8, 2017, 10:18pm

@Rothrock42
Thanks. I fixed the URL link. How can I check that this board (MSI Z270-ATX LGA 1151) has 28 PCIe lanes? I could not find anything on the web.

Rothrock42 · April 8, 2017, 10:51pm

@layla.tadjpour I usually follow the link to the New Egg site and scroll down to the specifications section. Sometimes it doesn’t tell you the total number, but instead you’ll see something like this

3 x PCIe 3.0 x16 slots (support x16/x0/x4, x8/x8/x4 modes)

Which tells you what speeds you’ll get if/when you fill the various slots.

For the CPU I search on the model on the ark.intel.com site. The i5-7500 only supports up to 16 lanes according to that site, so that might not be the best choice for your plans.

BTW I’m still not seeing the updated link to your build.

stephenl · April 8, 2017, 11:27pm

Yes, I suspect so; those lib files seem to be the same.

I just blew out both CUDA 8.0 and cuDNN 6.0 and reinstalled. Theano seems to take it with the usual watch-out messages, but it blows out with compile errors when Vgg runs, whereas when I slide in cuDNN 5.1 all is good again. I even tried the dev version of theano (it’s running that now): no change. I tried all the tricks, but it keeps throwing compile errors.

The end of the C file it tried to compile has this to say…

Exception: ('The following error happened while compiling the node', GpuDnnConv{algo='small', inplace=True}(GpuContiguous.0, GpuContiguous.0, GpuAllocEmpty.0, GpuDnnConvDesc{border_mode='valid', subsample=(1, 1), conv_mode='conv', precision='float32'}.0, Constant{1.0}, Constant{0.0}), '\n', 'nvcc return status', 2, 'for cmd', 'nvcc -shared -O3 -Xlinker -rpath,/usr/local/cuda-8.0/lib64/ -use_fast_math -arch=sm_61 -m64 -Xcompiler -fno-math-errno,-Wno-unused-label,-Wno-unused-variable,-Wno-write-strings,-DCUDA_NDARRAY_CUH=c72d035fdf91890f3b36710688069b2e,-DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION,-fPIC,-fvisibility=hidden -Xlinker -rpath,/home/sl/.theano/compiledir_Linux-4.4--generic-x86_64-with-debian-stretch-sid-x86_64-2.7.12-64/cuda_ndarray -I/home/sl/.theano/compiledir_Linux-4.4--generic-x86_64-with-debian-stretch-sid-x86_64-2.7.12-64/cuda_ndarray -I/usr/local/cuda8.0/include -I/home/sl/Theano/theano/sandbox/cuda -I/usr/local/cuda-8.0/include/ -I/home/sl/anaconda2/lib/python2.7/site-packages/numpy/core/include -I/home/sl/anaconda2/include/python2.7 -I/home/sl/Theano/theano/gof -L/home/sl/.theano/compiledir_Linux-4.4--generic-x86_64-with-debian-stretch-sid-x86_64-2.7.12-64/cuda_ndarray -L/usr/local/cuda-8.0/lib64/ -L/home/sl/anaconda2/lib -o /home/sl/.theano/compiledir_Linux-4.4--generic-x86_64-with-debian-stretch-sid-x86_64-2.7.12-64/tmpWUD0e_/ea4e203b6529466794536f8a1bfa77ae.so mod.cu -lcudart -lcublas -lcuda_ndarray -lcudnn -lpython2.7', "[GpuDnnConv{algo='small', inplace=True}(<CudaNdarrayType(float32, 4D)>, <CudaNdarrayType(float32, 4D)>, <CudaNdarrayType(float32, 4D)>, <CDataType{cudnnConvolutionDescriptor_t}>, Constant{1.0}, Constant{0.0})]")
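The nvcc command in that exception lists both /usr/local/cuda8.0/include and /usr/local/cuda-8.0/include/ as include directories. Whether that has anything to do with the compile failure is not established here, but it is cheap to check which of the search paths actually exist. The sketch below pulls the -I and -L directories out of a command line; the helper names are hypothetical and the sample command is a shortened version of the one above.

```python
import os
import shlex


def search_paths(command):
    """Return (flag, directory) pairs for every -I and -L option, in order."""
    paths = []
    for token in shlex.split(command):
        # Lowercase -l (library names) is deliberately not matched.
        if token[:2] in ("-I", "-L") and len(token) > 2:
            paths.append((token[:2], token[2:]))
    return paths


def missing_paths(command):
    """Directories named by -I or -L that do not exist on this machine."""
    return [path for _, path in search_paths(command) if not os.path.isdir(path)]


# Shortened from the exception text; paste the full command to check everything.
cmd = ("nvcc -shared -I/usr/local/cuda8.0/include -I/usr/local/cuda-8.0/include/ "
       "-L/usr/local/cuda-8.0/lib64/ -o out.so mod.cu -lcudnn")
for flag, path in search_paths(cmd):
    print(flag, path, "ok" if os.path.isdir(path) else "MISSING")
```

A directory flagged as MISSING is not necessarily fatal, since the compiler skips include paths it cannot find, but it can hint at a stale setting worth cleaning up.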

davecg · April 8, 2017, 11:43pm

Does anyone have any recommendations on which version of the 1080 Ti to get and from where?