Just a note about the 378.13 and 375 drivers. I finally installed 378.13, but then I had no video output. I have two graphics cards; the second, a GT 610, is used only for video. Now when I run deviceQuery it can't see this card. It can see the GTX 1080 Ti. Same with nvidia-smi.
Roger - you may have the Nouveau problem? I'm not sure; I heard you may have to deal with the Ubuntu driver. I have only one card, so mine didn't get snagged on a two-card setup. NVIDIA says below what to do, and there's more in the install guide. Let us know how it goes so I can add an addendum.
To install the Display Driver, the Nouveau drivers must first be disabled. Each distribution of Linux has a different method for disabling Nouveau.
The Nouveau drivers are loaded if the following command prints anything:
lsmod | grep nouveau
Create a file at /etc/modprobe.d/blacklist-nouveau.conf with the following contents:
blacklist nouveau
options nouveau modeset=0
Regenerate the kernel initramfs:
$ sudo update-initramfs -u
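The steps quoted above can be sketched as a small script. This is only a sketch: the function names are made up for illustration, and the demo writes to a scratch file rather than to /etc so it can be tried without root.

```shell
#!/usr/bin/env bash
# Sketch of the Nouveau blacklist steps quoted above.
# write_nouveau_blacklist and nouveau_loaded are hypothetical helper names.

# Write the two blacklist lines to the given path
# (defaults to the location named in the install guide).
write_nouveau_blacklist() {
    local conf_path="${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$conf_path"
}

# Succeeds (exit status 0) if the nouveau module is currently loaded.
nouveau_loaded() {
    lsmod 2>/dev/null | grep -q '^nouveau '
}

# Dry run against a temporary file, so nothing under /etc is touched.
scratch_conf="$(mktemp)"
write_nouveau_blacklist "$scratch_conf"
cat "$scratch_conf"
rm -f "$scratch_conf"
```

To apply it for real you would need to run the function as root with no argument, then run `sudo update-initramfs -u` and reboot, as in the quoted steps.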
Read more at: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#ixzz4e43h5uDd
Almost certainly it will, given the lack of detailed advice anywhere on tackling this issue. Got to pay it forward!
Thanks to all the detailed articles, step-by-step instructions, and active feedback from this community, I've managed to get my own deep learning server set up. Couldn't have been successful without all the research in this forum thread.
I'm still working on getting code ported from AWS working locally, so you'll be seeing continued posts from me on the forums here, but in the meantime I whipped up a blog post in the hopes that someone out there might find it useful.
http://www.topbots.com/deep-confusion-misadventures-in-building-a-machine-learning-server/
This setup may not appreciate "suspend" mode on Ubuntu. Watch for that one: if you start up the labs again after a suspend and run lesson 1 etc., it starts throwing Python GPU errors. It may be an aberration, but if it happens you probably need to reboot, or find the service and restart it. I'm not sure which service yet, but rebooting fixes it.
One to watch.
@topbots Congrats on getting your machine built and running. I read your blog comment about masculine-sounding names for gaming components. Search for ATI/AMD Ruby and Nvidia Nalu (the mermaid).
@stephenl Thanks for that.
Not sure that's my problem, but I'll look into it.
When I rebooted after installing 378.13 and my video was gone, I thought "whoops". After Ctrl-Alt-F1 I ran the samples, and they and nvidia-smi showed that the 1080 Ti was there but not the GT 610. Bear in mind I have two cards, and the 1 GB one is connected to the display. Perhaps all I need to do is connect the display to the 1080 Ti card.
Please note there is a new BETA driver, released 6th April. The 378.13 is short lived whereas the 375.51 is long lived. The new driver that supports the 1080 Ti is 381.09; maybe this will be the long-lived supported version. See
http://www.nvidia.com/object/linux-amd64-display-archive.html
for the full archive list or
http://www.nvidia.com/Download/driverResults.aspx/117002/en-us
for the 381.09 driver
Hey everyone,
This is an awesome thread! Thanks to all the contributors.
I'm currently on lesson 3 (Part 1) and I've decided to build my own machine as well. I definitely foresee getting a ton of answers from everyone's previous struggles, so thanks for that, haha. And hopefully, from my experience I can give back as well - answering questions on the thread and planning on writing a blog article (trying to build a "fast/good enough" machine at $1,000 Canadian).
@stephenl Caution:
echo 'alias ju="jupyter notebook --no-browser --port=8889"' > ~/.bashrc
Should that have ">>" instead of ">"? A single ">" overwrites the existing file with a new one instead of appending to it.
Similarly with:
echo 'alias remote="ssh -N -f -L localhost:8888:localhost:8889 sl@.localdomain"' > ~/.bash_profile
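To make the difference concrete, here is a small sketch that uses a scratch file in place of ~/.bashrc (the existing line and variable names are made up for the demo):

```shell
#!/usr/bin/env bash
# Demonstrates ">" (truncate, then write) versus ">>" (append)
# on a scratch file, so the real ~/.bashrc is left alone.

demo_file="$(mktemp)"
echo 'export EXISTING_SETTING=1' > "$demo_file"   # stand-in for existing content

# ">>" appends: the existing line survives and the alias becomes line 2.
echo 'alias ju="jupyter notebook --no-browser --port=8889"' >> "$demo_file"
lines_after_append=$(( $(wc -l < "$demo_file") ))

# ">" truncates first: the alias ends up as the only line in the file.
echo 'alias ju="jupyter notebook --no-browser --port=8889"' > "$demo_file"
lines_after_overwrite=$(( $(wc -l < "$demo_file") ))

echo "after >>: $lines_after_append line(s); after >: $lines_after_overwrite line(s)"
rm -f "$demo_file"
```

So for adding an alias to a shell startup file that already has content, ">>" is the safer choice.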
I just wanted to recommend that, at this time, I would stick with Intel Kaby Lake CPUs rather than AMD Ryzen. I am still getting weird network driver issues with my Ryzen system (regular intermittent network dropouts) after I install the CUDA drivers, and I cannot diagnose them. I suspect it's a BIOS issue on my ASUS 370 motherboard, but if you are building right now, YMMV, and it's frustrating. I even replaced the onboard NIC with an Intel server card and the same thing happens.
There are no performance or usability issues except for the timeouts. My performance is within 1% of the top speeds I have seen here.
I will post an update if I get this resolved. If someone here works for AMD or ASUS, feel free to reach out.
Leon
Thanks Roger - I will try to edit. The greater-than and less-than symbols also seem to be treated as markup instructions, or it's a bug; I had great trouble with these characters as they disappear on the forum.
Roger - driver 381.09 and I are already acquainted. I ran tests on it with cuDNN 6.0.20 and couldn't see a difference from 378.13; it seemed comparable. 381.09 got the shove when I hit a guest boot-cycle issue on a reboot. I had hit this issue once before the upgrade to cuDNN 6020, I believe while running driver 381.09 on cuDNN 5.1. It's a nasty issue where the machine reboots into a guest account (which is set to false, BTW; I never configured it in the first place!). You put your password in, and it loops back to the guest login wanting a password again, ad infinitum. Killing off the issue requires a major rip and replace of CUDA and everything from that point in the instructions, including the lightdm service. So 381.09 got the blame. The more likely cause is rebooting with the Jupyter server running, as that seems linked to the lightdm daemon or service, but it's not beyond reasonable doubt that driver 381.09 played a part. So, to get on with the course labs, and seeing no real benefit, I stuck with driver 378.13. That's my story on NVIDIA driver 381.09.
Leon - have you seen better Vgg test times with the server in "headless" (text-only) mode after your change?
I still get some value at this point out of the GUI, mostly for pulling files from USB and clicking and dragging stuff; it's easier to visualise what's where.
But if there's an actual performance boost in test runs I will put it into headless mode and use the command line.
@stephenl I don't think there is any extra performance. There is just a little more video RAM available, and having the GUI login makes it harder to troubleshoot things on servers.
OK, figured as much. I know there's more RAM on tap if I need it; so far I haven't used it all except with batch size=128, where I was only just short of it succeeding. I have played with the screen resolution to increase the available RAM. I was just checking with you in case the system is spending noticeable time refreshing the screen and chewing up CPU/GPU cycles. You would think it consumes CPU and/or GPU time or bus time just updating the screen, but if everything is stationary, apparently not.
What's interesting is that larger batch sizes did not increase performance. You'd think taking bigger batches of data would help if you have the RAM, but it appears not. I have the RAM and I can do it, but there's no benefit.
Has anybody set up their system to use the Wake-on-LAN magic packet? If so, can you share a bit about your home network and what router you're using?
Appreciate that input; not going there yet. I have 378.13 running with cuDNN 6020, but my time for a single epoch in lesson 1 (the 7th input cell, just after "The punchline: state of the art custom model in 7 lines of code") was 366 secs.
As with the 375.39 driver, the 378.13 is not recognising the name of the GPU. It was good with the 375.51. Not sure if that's an issue.
Here is the output of cell 4 in lesson 1
/home/dl/anaconda2/lib/python2.7/site-packages/theano/gpuarray/dnn.py:135: UserWarning: Your cuDNN version is more recent than Theano. If you encounter problems, try updating Theano or downgrading cuDNN to version 5.1.
warnings.warn("Your cuDNN version is more recent than "
Using cuDNN version 6020 on context None
Mapped name None to device cuda0: Graphics Device (0000:03:00.0)
Using Theano backend.
Is that as expected?
My issue may be with the slot position of the card. Many others have built systems based on the motherboard and which lanes are available for which slot. Unfortunately, this being HP hardware, I don't know what gives the best support for the 1080 Ti.
Or maybe some parameter or option needs to be set in its configuration.
Anyway, here is my take on the solution to getting cuDNN 6020 recognised.
In ~/.bashrc I had exported CPLUS_INCLUDE_PATH, LIBRARY_PATH and LD_LIBRARY_PATH with the cuda-8.0 paths, but had neglected to add /usr/local/lib and /usr/local/include/gpuarray, where gpuarray was installed. Once I added them, the result was as above.
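For anyone wanting to try the same thing, the ~/.bashrc lines might look roughly like the sketch below. Several things here are assumptions rather than the exact lines used: CUDA 8.0 living at /usr/local/cuda-8.0 with include and lib64 subdirectories, which path goes into which variable, and using /usr/local/include (the parent of the gpuarray directory mentioned above) so that headers resolve as gpuarray/<name>.h.

```shell
# Sketch of ~/.bashrc exports; paths are assumptions, adjust to your install.
CUDA_DIR_ASSUMED=/usr/local/cuda-8.0   # assumed default CUDA 8.0 location

# Header search path: CUDA headers plus the parent of /usr/local/include/gpuarray.
export CPLUS_INCLUDE_PATH="$CUDA_DIR_ASSUMED/include:/usr/local/include${CPLUS_INCLUDE_PATH:+:$CPLUS_INCLUDE_PATH}"

# Link-time library search path: CUDA libraries plus /usr/local/lib.
export LIBRARY_PATH="$CUDA_DIR_ASSUMED/lib64:/usr/local/lib${LIBRARY_PATH:+:$LIBRARY_PATH}"

# Run-time library search path: same two directories.
export LD_LIBRARY_PATH="$CUDA_DIR_ASSUMED/lib64:/usr/local/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

The `${VAR:+:$VAR}` form keeps whatever was already in each variable and avoids a stray trailing colon when the variable was empty.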
Any comments?
Some payback
I am sure you know, but here's a bunch of commands to access a USB drive from the terminal only:
sudo fdisk -l # with the USB drive inserted
From this, find the device name of the USB drive; it will be the one that wasn't there before the drive was inserted.
sudo mkdir /media/usb # the "usb" can be whatever you want
Now mount the USB drive:
sudo mount /dev/<somename>1 /media/usb # <somename> is the device found in step 1, /media/usb the directory from step 2
You then have access to the USB drive:
ls /media/usb
Important final step when you are done:
sudo umount /media/usb
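Spotting "the one that wasn't there before" can be done mechanically by comparing two device listings. A minimal sketch, where the helper name and the sample device names (sda, sdb) are made up for illustration:

```shell
#!/usr/bin/env bash
# new_block_devices BEFORE_FILE AFTER_FILE
# Prints every line of AFTER_FILE that does not appear in BEFORE_FILE,
# i.e. the device names that showed up after the drive was inserted.
# (Hypothetical helper, not a standard tool.)
new_block_devices() {
    grep -vxFf "$1" "$2"
}

# On a real machine the two listings could be captured like this:
#   lsblk -dn -o NAME > before.txt    # before inserting the USB drive
#   lsblk -dn -o NAME > after.txt     # after inserting it
# Here sample listings are used so the sketch runs anywhere.
before_list="$(mktemp)"
after_list="$(mktemp)"
printf '%s\n' sda > "$before_list"
printf '%s\n' sda sdb > "$after_list"

new_block_devices "$before_list" "$after_list"   # prints: sdb
rm -f "$before_list" "$after_list"
```

The name it prints is what goes in place of <somename> in the mount command, with the partition number appended.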
Thanks Roger, much appreciated. I am sure this will come in handy.
Yes I have!
I bought a Ubiquiti EdgeRouter X to set up WOL.
My college gives me high-speed internet (1 Gb/s), so that router is suitable.
After setting the router up, you have to add an ARP entry, but after that my WOL works flawlessly.
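For anyone curious what the magic packet actually contains: it is six 0xFF bytes followed by the target machine's MAC address repeated 16 times, 102 bytes in all. Below is a sketch that only builds the packet as a hex string. The function name and the example MAC are made up, and it does not send anything.

```shell
#!/usr/bin/env bash
# magic_packet_hex MAC
# Prints the Wake-on-LAN magic packet for MAC as lowercase hex:
# 6 bytes of ff, then the 6-byte MAC repeated 16 times.
# (Hypothetical helper for illustration.)
magic_packet_hex() {
    local mac_hex packet i
    # Strip ':' or '-' separators and lowercase the hex digits.
    mac_hex="$(printf '%s' "$1" | tr -d ':-' | tr 'A-F' 'a-f')"
    packet='ffffffffffff'
    for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16; do
        packet="${packet}${mac_hex}"
    done
    printf '%s\n' "$packet"
}

# Example with a made-up MAC address; show only the first 24 hex characters.
magic_packet_hex '00:11:22:33:44:55' | cut -c1-24
```

Actually sending it means putting those bytes in a broadcast frame, commonly a UDP broadcast; tools such as wakeonlan or etherwake handle that part.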