Making your own server

RogerS49 · April 14, 2017, 6:34am

Some payback
I am sure you know this already, but here’s a bunch of commands to access a usb drive from the terminal only

sudo fdisk -l #usb inserted

From this find the name of the usb, it will be the one that wasn’t there before the usb was inserted

sudo mkdir /media/usb #the ‘usb’ can be whatever you want.

Now mount the usb

sudo mount /dev/somename1 /media/usb # replace somename with the device name found in step 1; /media/usb is the directory from step 2

You then have access to the usb drive

ls /media/usb

Important next step when you are done

sudo umount /media/usb
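The "find the one that wasn't there before" step can be sketched in Python as a simple before/after comparison. This is only an illustration: the helper names are made up, `sdb` is a hypothetical device name, reading `/sys/block` assumes Linux, and the sketch adds `-p` to `mkdir` so rerunning it does not fail. The commands are built but not executed.

```python
import os


def list_block_devices(sys_block="/sys/block"):
    """Return the block device names the kernel currently exposes (Linux only)."""
    return sorted(os.listdir(sys_block))


def new_devices(before, after):
    """Names present in `after` but not in `before`, i.e. what was just plugged in."""
    return sorted(set(after) - set(before))


def mount_commands(device, mount_point="/media/usb", partition=1):
    """Build (but do not run) the mkdir/mount/ls/umount commands from the steps above."""
    partition_path = "/dev/{}{}".format(device, partition)
    return [
        ["sudo", "mkdir", "-p", mount_point],
        ["sudo", "mount", partition_path, mount_point],
        ["ls", mount_point],
        ["sudo", "umount", mount_point],
    ]


if __name__ == "__main__":
    # Hypothetical snapshots taken before and after inserting the drive.
    before = ["sda"]
    after = ["sda", "sdb"]
    for dev in new_devices(before, after):
        for cmd in mount_commands(dev):
            print(" ".join(cmd))
```

On a real machine you would call `list_block_devices()` once before and once after plugging the drive in, and pass the two results to `new_devices()`.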

stephenl · April 14, 2017, 10:17am

Thanks Roger, much appreciated. I am sure this will come in handy.

harveyslash · April 14, 2017, 1:18pm

Yes I have!
I bought a ubiquiti edge router X to set up WOL.
My college gives me high speed internet (1 Gb/s), so that router is suitable.

After setting the router up, you have to add an ARP entry, but after that my WOL works flawlessly.
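For anyone curious what a WOL sender does on the client side, here is a minimal Python sketch. It covers only the standard magic packet (6 bytes of 0xFF followed by the target MAC repeated 16 times), not the router or ARP setup. The function names are made up for illustration, and the broadcast address and UDP port 9 are common defaults rather than anything from this setup.

```python
import socket


def build_magic_packet(mac):
    """Return the 102-byte Wake-on-LAN payload for a MAC like 'aa:bb:cc:dd:ee:ff'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError("expected a 6-byte MAC address, got %r" % mac)
    mac_bytes = bytes.fromhex(hex_digits)
    # 6 x 0xFF, then the MAC address repeated 16 times.
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac, broadcast="255.255.255.255", port=9):
    """Broadcast the packet over UDP (address and port are assumed defaults)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))
```

Calling `send_magic_packet("00:11:22:33:44:55")` with the real MAC of the sleeping machine (the one here is a placeholder) would send the wake-up packet on the local network.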

Rothrock42 · April 14, 2017, 3:27pm

@harveyslash That looks just like what I need. Does it also include a firewall? This would be connected directly to my service provider, so I’m thinking that is a feature I need.

harveyslash · April 14, 2017, 3:55pm

It definitely has a firewall. It has a fully functional Debian OS and is really hacker friendly. Though note that it’s a wired router (no WiFi).

Rothrock42 · April 14, 2017, 4:53pm

Awesome. It is strange to me that they don’t call that out in any of the materials about it! I already have an airport basestation for the wifi on my laptops, phone, ipad, etc. A wired router was just exactly what I was looking for!

stephenl · April 14, 2017, 10:12pm

That’s what I get too. Strangely it all looks a bit different now on 6.0.20. No cnmem showing anymore even though it is set in .theanorc. I am just glad it’s working; it’s clocking good runs at 192 seconds. But yes, this is exactly what I get too from line 4. I have no .bashrc pointers at all and it’s all working without them in place. This can become an obsession in itself, tweaking and tweaking trying to find any improvement.

My plan is to step back and do some labs. Somebody will pop up and say ‘look at what I found’. When that happens we will all make a change. But I found myself endlessly tweaking and drifting off subject.

You may have to remind me of your setup to understand the 366 seconds; it may be all your hardware will do. My hardware was optimised for the GTX 1080ti from the ground up; I didn’t take an existing box and modify it. The CPU forms the core of what can be done (see the wiki on Kaby Lake and then Haswell). The whole motherboard is built around that CPU’s capability, which leads into what memory it can handle and its I/O capabilities, and that sets performance limits on how it interacts with the GPU, how many lanes it supports and so on. Much of what’s on the web is regurgitated hype, sending one into a head-spin of facts and figures, often out of context, so you don’t know what to believe, causing major confusion. I am out of touch with this stuff now and there is a lot of jargon. I didn’t know the jargon, so I did the most logical thing: I took a look at the CPU hardware specs, which is something I can mostly understand. It soon became obvious what will and will not happen upstream, because a mismatch means compromises have to be entertained in some way. Some functions on the motherboard use specialised chips, much of which would require additional reading, but I chose what seemed like a good match for the 1080Ti to work well, and it seems to. No doubt I could tweak the MSI board, but again I would lose focus on trying to learn the ML stuff. I will come back to it, because doing dozens of iterations takes ages, and soon enough it will be time to revisit the super-tuning of the hardware to drop times etc.

Anyway, here is my .theanorc file below. It may help. It may not. All of the Theano optimisations seem situation specific; some will make things worse or cause errors, and they work with different libraries. This below is what I could dig up. As I said, ‘cnmem’ is not reporting anymore. It may be an issue, it could help, but at this point it’s not throwing errors and it’s doing 192, so I am telling myself to just leave it for now. Someone will jump in and say you idiot, you forgot to do this, and suddenly maybe I can get 5 seconds off, who knows.

[global]
device = cuda0
floatX = float32

[lib]
cnmem =.95

[nvcc]
fastmath = True

[cuda]
root = /usr/local/cuda
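To double-check what a file like this actually contains, Python's standard `configparser` can read INI-style text of this shape. The sketch below parses the exact text posted above. The helper name is made up, and this is not a claim about how Theano itself loads the file.

```python
import configparser

# The .theanorc contents exactly as posted above.
THEANORC_TEXT = """\
[global]
device = cuda0
floatX = float32

[lib]
cnmem =.95

[nvcc]
fastmath = True

[cuda]
root = /usr/local/cuda
"""


def load_theanorc(text):
    """Parse INI-style text into a {section: {option: value}} dictionary."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names such as floatX case-sensitive
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


if __name__ == "__main__":
    for section, options in load_theanorc(THEANORC_TEXT).items():
        print(section, options)
```

To inspect a real file instead, the same parser can be pointed at it with `parser.read(os.path.expanduser("~/.theanorc"))`.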

yay · April 15, 2017, 12:17am

Hi, I have a Ryzen build and had to use Ubuntu 17.04 with 4.10 kernel to make sure the CPU is properly supported. Looks like the CUDA toolkit isn’t compatible with this kernel version and the 381.09 driver is still in beta: https://devtalk.nvidia.com/default/topic/1002788/unix-graphics-announcements-and-news/-linux-solaris-and-freebsd-driver-381-09-beta-/ Has anybody else tried making CUDA work with this kernel version?

EDIT: hmm, this is an interesting bit “The 4.10 kernel will also become the LTS Rolling Kernel in 16.04 LTS” https://insights.ubuntu.com/2017/04/13/ubuntu-17-04-supports-widest-range-of-container-capabilities/

yay · April 15, 2017, 12:56am

I have ordered an Intel based NIC as well on stability/driver support concerns, but the built-in Realtek one on my MSI B350 Tomahawk seems to work just fine, iperf3 shows a 940Mbit/s connection between my laptop and PC, both wired to a router.

stephenl · April 15, 2017, 3:26am

And here is my line 4 as well; it’s the same.

/home/sl/anaconda2/lib/python2.7/site-packages/theano/gpuarray/dnn.py:135: UserWarning: Your cuDNN version is more recent than Theano. If you encounter problems, try updating Theano or downgrading cuDNN to version 5.1.
warnings.warn("Your cuDNN version is more recent than "
Using cuDNN version 6020 on context None
Mapped name None to device cuda0: Graphics Device (0000:02:00.0)
Using Theano backend.

RogerS49 · April 15, 2017, 10:10pm

The variable cnmem is an attribute of the old interface, which is/will be deprecated in the next version of Theano. What we have been playing around with, libgpuarray and pygpu, is all part of the new interface.

In the original lesson1 on github it took 588 seconds to run 1 epoch in cell 7 with a Tesla K80.

I appreciate you have a better understanding of the hardware architecture.

I am short on memory, which is about to be fixed (only 16GB). But I don’t think that is the problem, as the GPU is not using all its power, as seen from the nvidia-smi script: mostly less than 50% memory.
The bandwidthTest result is a PASS. Somewhere there is a setting that replaces cnmem for the new interface. I think it is

config.gpuarray.preallocate
Float value

Default: 0 (Preallocation of size 0, only cache the allocation)

Controls the preallocation of memory with the gpuarray backend.

The value represents the start size (either in MB or the fraction of total GPU memory) of the memory pool. If more memory is needed, Theano will try to obtain more, but this can cause memory fragmentation.

A negative value will completely disable the allocation cache. This can have a severe impact on performance and so should not be done outside of debugging.

< 0: disabled
0 <= N <= 1: use this fraction of the total GPU memory (clipped to .95 for driver memory).
> 1: use this number in megabytes (MB) of memory.

So for one, I don’t have that set. First thing to do tomorrow, although it does not say how to set it in .theanorc. Running

python -c 'import theano; print(theano.config)' | less

may help. (cnmem is in the [lib] section.) Which section these new parameters belong in may be revealed by that config.

I have the 1080ti in the second x16 slot; the 610 I use for video was in the first x16 slot. As driver 378.13 can’t see that GPU, I may take it out and put the 1080ti in that slot when I add more memory.

My Processor spec is here

http://ark.intel.com/products/92986/Intel-Xeon-Processor-E5-2620-v4-20M-Cache-2_10-GHz

The memory in the motherboard is DDR4 2400MHz registered DIMM, which is different from the CPU spec.
Swap is on a separate disk to the Ubuntu OS.

I don’t see any problems with CPU performance compared to your CPU. It may just be that config parameter. Have you checked the memory the GPU reports when running nvidia-smi? When I run next I’ll post my findings for cell 7 of lesson1 from a fresh startup. Cheers
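For checking GPU memory without reading the full nvidia-smi table, a small Python wrapper around its CSV query mode is one option. This is a sketch, not a tested monitoring tool: the helper names are made up, and `gpu_memory_usage()` only works on a machine where `nvidia-smi` is on the PATH.

```python
import subprocess

# Ask nvidia-smi for used/total memory in MiB, one CSV line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_line(line):
    """Turn a line like '5586, 11172' into (used_mb, total_mb, percent_used)."""
    used, total = (int(field.strip()) for field in line.split(","))
    return used, total, 100.0 * used / total


def gpu_memory_usage():
    """Run nvidia-smi and return one (used, total, percent) tuple per GPU."""
    output = subprocess.check_output(QUERY, universal_newlines=True)
    return [parse_memory_line(line) for line in output.splitlines() if line.strip()]
```

Running `gpu_memory_usage()` while a notebook cell is training gives a quick read on whether the card is anywhere near full.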

stephenl · April 16, 2017, 12:01am

It seems config.gpuarray.preallocate = 0.95 does nothing new, or anything at all. Yet back under cuDNN 5.1, [lib] cnmem = 0.95 seemed to truly allocate that amount of memory; you could see it in nvidia-smi. Now it doesn’t use any more memory than it needs. The first run is always slower after a Jupyter restart, as it appears the memory has to be set up by the first run. Once the memory has been allocated by a previous run, the times are better and stabilise at a fixed value, in my case 192 seconds. So I don’t know what’s going on. Theano did not throw an error with the config.gpuarray.preallocate = 0.95 syntax, but a .theanorc that still uses [lib] cnmem = 0.95 runs the same, it seems. My conclusion is that I set config.gpuarray.preallocate = 0.95 the wrong way in the .theanorc file, as there is no change from having no argument at all for the cuDNN memory allocation. Net gain: no change. Besides, you will probably only see a boost on run 1; after that it settles in, at least for these VGG tests. I can see it getting worse when memory allocation needs to change, perhaps with other models etc.

RogerS49 · April 16, 2017, 8:13am

Got it. Define preallocate as

[gpuarray]
preallocate=0.95

see
http://www.deeplearning.net/software/theano/library/config.html?highlight=preallocate#config.config.gpuarray.preallocate

worked for me

/home/dl/anaconda2/lib/python2.7/site-packages/theano/gpuarray/dnn.py:135: UserWarning: Your cuDNN version is more recent than Theano. If you encounter problems, try updating Theano or downgrading cuDNN to version 5.1.
warnings.warn("Your cuDNN version is more recent than "
Using cuDNN version 6020 on context None
Preallocating 10613/11172 Mb (0.950000) on cuda0
Mapped name None to device cuda0: Graphics Device (0000:03:00.0)
Using Theano backend.
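The "Preallocating 10613/11172 Mb" line is consistent with the rule quoted earlier from the Theano docs. Here is a small Python sketch of that rule as I read it. The function name is made up and this is not Theano's actual code. Truncating to whole megabytes is an assumption, chosen because it reproduces the figure in the log.

```python
def preallocate_mb(value, total_mb):
    """Megabytes to preallocate under the documented rule, or None if disabled.

    value < 0       -> allocation cache disabled
    0 <= value <= 1 -> fraction of total GPU memory, clipped to 0.95
    value > 1       -> that many megabytes
    """
    if value < 0:
        return None
    if value <= 1:
        fraction = min(value, 0.95)
        # Assumption: round down to whole megabytes.
        return int(fraction * total_mb)
    return int(value)


if __name__ == "__main__":
    # 11172 MB is the card total reported in the log above.
    print(preallocate_mb(0.95, 11172))
```

With `preallocate=0.95` and a total of 11172 MB this gives 10613 MB, matching the log.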

My first run was 373 sec and the second 236 sec. The only changes were adding fast math and preallocate; not sure about fast math.

If you allocate 95% in one notebook, what happens if you start another while the first is running? I guess one notebook at a time.

Hopefully it’s back to the real task and Jeremy’s brilliant lessons

stephenl · April 16, 2017, 9:07am

Yep, that works, well done! I can confirm it, not only from the output shown below but also from nvidia-smi. Unlike you, though, I did not see any change in times at all. That was a very nice boost you got; you will be pleased. It may have been fastmath that helped you. Unless your syntax is different from mine, I did not gain a single second in speed, but at least I can cross that off the list.

Cheers

/home/sl/anaconda2/lib/python2.7/site-packages/theano/gpuarray/dnn.py:135: UserWarning: Your cuDNN version is more recent than Theano. If you encounter problems, try updating Theano or downgrading cuDNN to version 5.1.
warnings.warn("Your cuDNN version is more recent than "
Using cuDNN version 6020 on context None
Preallocating 10612/11171 Mb (0.950000) on cuda0
Mapped name None to device cuda0: Graphics Device (0000:02:00.0)
Using Theano backend.

RogerS49 · April 16, 2017, 9:52am

Great, I took the fast math out. I think a confirmation test might be lesson 5, which can be completed in total quite quickly. The LSTM at the end is 100s and 99s per epoch in the original

https://github.com/fastai/courses/blob/master/deeplearning1/nbs/lesson5.ipynb

On one pass through the whole notebook I got 76 sec per epoch for the LSTM at the end.

Hey Happy Easter

stephenl · April 16, 2017, 11:04pm

Roger, that’s a good time. I haven’t got there yet; still working on these labs.

One thing is for someone knowledgeable out there to revisit why these GPUs are not running at 100% under cuDNN 6.0.20. Under cuDNN 5.1 they were running almost continuously at 100%; you could smell the epoxy. Now they are cruising well below 100% most of the time. If they could be run productively at 100% for each iteration, the improvements, I suspect, would be dramatic!

MPJ · April 18, 2017, 8:24pm

Hiya,

wanted to get some feedback on my partlist.
Still in doubt if I shouldn’t tone it down a little bit.

https://pcpartpicker.com/user/mpjansen/saved/FYmpgs

Any ideas on which 1080TI to grab? The MSI one is around 50E cheaper in NL; how does it compare against the Asus one?
Any suggestions on the selected CPU? I am considering downgrading to an i5, but cannot really judge the impact / future-proofness.

Hope you awesome lot have some good suggestions for me!

grts

stephenl · April 18, 2017, 10:27pm

WARNING***
As per the install instructions above for CUDA 8.0 and cuDNN 6.0.20 and the ‘sudo apt autoremove’ instruction

During business as usual, one day at the command line, you may decide to do a ‘sudo apt-get upgrade’ and get a message recommending a ‘sudo apt autoremove’, asking to remove what appear to be legions of superfluous CUDA-8.0 and lib components that are apparently not needed. DO NOT DO IT! (unless you know precisely what is going on).

WHOOPS! If you accidentally removed packages like CUDA-8.0 or related packages, or really any packages you wish you had not, via ‘sudo apt autoremove’, please see below.

GET OUT OF JAIL FREE CARD (for ubuntu 16.04)

echo '#!/bin/bash' > restore
echo sudo apt-get install $(grep Remove /var/log/apt/history.log | tail -1 | sed -e 's|Remove: ||g' -e 's|([^)]*)||g' -e 's|:[^ ]* ||g' -e 's|,||g') >> restore
chmod +x restore
./restore
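To see what the grep/tail/sed pipeline in that script extracts before trusting it with an install, the same text transformations can be sketched in Python. The helper names are made up, and the log excerpt is a made-up example in the usual shape of /var/log/apt/history.log, not real output from anyone's machine.

```python
import re


def last_remove_line(log_text):
    """Mimic `grep Remove ... | tail -1`: the last line containing 'Remove'."""
    matches = [line for line in log_text.splitlines() if "Remove" in line]
    return matches[-1] if matches else None


def packages_from_remove_line(line):
    """Apply the same four substitutions as the sed command, then split into names."""
    line = line.replace("Remove: ", "")    # s|Remove: ||g
    line = re.sub(r"\([^)]*\)", "", line)  # s|([^)]*)||g  drops "(version)"
    line = re.sub(r":[^ ]* ", "", line)    # s|:[^ ]* ||g  drops ":arch "
    line = line.replace(",", "")           # s|,||g
    return line.split()


# Hypothetical history.log excerpt, for illustration only.
SAMPLE_LOG = """\
Start-Date: 2017-04-18  21:05:11
Commandline: apt autoremove
Remove: libexample1:amd64 (1.0-1), example-tools:amd64 (2.3-4ubuntu1)
End-Date: 2017-04-18  21:05:20
"""

if __name__ == "__main__":
    print(packages_from_remove_line(last_remove_line(SAMPLE_LOG)))
```

Feeding it the contents of the real history.log shows the package list that would be handed to `sudo apt-get install`, which is worth reading through before running the restore script.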

…Just in case.

RogerS49 · April 20, 2017, 10:04am

What you have to consider is whether you can update that config at a later date. To be of any real use you’ll need more memory. For example, when I tried the dogscats ensemble from start to finish with 16GB, I had to increase my swap several times and the indication was > 70GB; even then I could never complete that notebook, as I still got a failed to allocate memory error. When using free and looking at the swap and actual memory numbers, the actual memory only had 100K to work with. You can never have enough of anything. I built mine around an HP workstation with a single Xeon processor: 8 cores, 1 real and 1 virtual CPU per core, so 16 CPUs. This also allows me to add another processor, or a different pair of processors, up to a maximum of 44 cores, and up to a maximum of 256 GB of memory. It has some downsides around the PCI slots, especially the spacing of slots and placement of wires etc. The bottom x16 slot is too close to the bottom of the cabinet, and the wires for the motherboard connectors pass directly underneath and interfere with the rotation of the GPU fan. You can never have enough of everything. I use Ubuntu 16.04; the HP box supports several flavours of Linux. A Z640 is around $1700.

I have since rerun the dogscats-ensemble now that I have 80GB of memory; it used approximately 50GB.

Christina · April 22, 2017, 6:20pm

Roger, which motherboard did you use for your Xeon? There are a LOT of surplus Xeon 2670s for sale on eBay now, and I am thinking of picking one (or two) up for a new deep learning box with a GTX-1080ti. I am not sure whether to go with dual CPU or just single.

The old Win7/GTX960 machine with one of the very first i7’s is a bit overwhelmed on some of the Kaggle stuff that involves a lot of data… especially when it is my machine for doing almost everything else in the house as well…

Thanks Christina