So far I've watched about 80% of Part 1 and I love the videos, but there is one thing I'd like to address: the overall setup of everything, and especially AWS.
Personally, I don't really like doing things in the cloud unless I have to, especially since it gets really expensive. For example, the first video mentions how great it is that you can rent a 2x GTX 970 box for $200 a month, which to me feels horribly overpriced. If you're planning on learning deep learning, you'll probably need that GPU for more than a few months, in which case you could just buy the GPUs outright for the price of two months of the subscription.
I'm not trying to criticize the course, since a lot of people use laptops that can't be upgraded (especially Macs with Radeon cards), but a lot of people (at least among the ones I know) own a desktop computer with a decent graphics card. For this reason I wanted to find a simple way to set everything up to run locally, with as little hassle as possible.
Now for the specifics. Looking over the TensorFlow, Keras, Theano, libgpuarray and other sites, it surprises me that they all have instructions for Ubuntu, macOS and Windows, and almost all of them are very specific (install pip from a specific URL, plus tons of system libraries). This led to incredible frustration trying to set everything up on Arch, Windows 10, Ubuntu 16.04 and openSUSE Tumbleweed at the same time (yeah, I know my setup is complicated, but surely not everyone doing data science runs Ubuntu?). But to my surprise, Miniconda/Anaconda seems to solve basically all of these issues, yet almost nobody mentions it.
Which leads me to my problem. I've tried installing TensorFlow with GPU support on Python 2.7:
conda create -n foo python=2.7 tensorflow-gpu keras
This sets up the proper Python version, MKL, the CUDA toolkit and cuDNN in a single line, without having to download/install the CUDA libraries by hand, set up paths, or install any extra system libraries.
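To see exactly what got pulled in, something like this should show the CUDA toolkit and cuDNN packages coming from conda (exact package names and versions may differ on your setup):

$ conda list -n foo | grep -iE "cudatoolkit|cudnn|mkl|tensorflow|theano|pygpu"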
Testing that it works with TensorFlow:
$ KERAS_BACKEND=tensorflow python -c "from keras import backend"
Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.7.5 locally
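To be extra sure TensorFlow actually sees the GPU (and doesn't just load the libraries), something along these lines should list a /gpu:0 device (TF 1.x API):

$ python -c "from tensorflow.python.client import device_lib; print([d.name for d in device_lib.list_local_devices()])"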
But if I try the same thing with the Theano backend:
$ KERAS_BACKEND=theano python -c "from keras import backend"
Using Theano backend.
ERROR (theano.gpuarray): Could not initialize pygpu, support disabled
Traceback (most recent call last):
File "/home/darth/.miniconda/envs/foo/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 164, in <module>
use(config.device)
File "/home/darth/.miniconda/envs/foo/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 151, in use
init_dev(device)
File "/home/darth/.miniconda/envs/foo/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 60, in init_dev
sched=config.gpuarray.sched)
File "pygpu/gpuarray.pyx", line 614, in pygpu.gpuarray.init (pygpu/gpuarray.c:9415)
File "pygpu/gpuarray.pyx", line 566, in pygpu.gpuarray.pygpu_init (pygpu/gpuarray.c:9106)
File "pygpu/gpuarray.pyx", line 1021, in pygpu.gpuarray.GpuContext.__cinit__ (pygpu/gpuarray.c:13468)
GpuArrayException: Error loading library: -1
I also tried with explicit flags but I get the same output (note that I use device=cuda while the course suggests device=gpu; if I use device=gpu I get a deprecation warning, so I'd expect that the version of Theano I have installed is newer?):
THEANO_FLAGS="cuda.root=$HOME/.miniconda/envs/foo,device=cuda,floatX=float32" KERAS_BACKEND=theano python -c "from keras import backend"
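For reference, the same flags can also go into ~/.theanorc instead of being passed on every invocation, which should be equivalent (section/option names as per the Theano config docs):

[global]
device = cuda
floatX = float32

[cuda]
root = /home/darth/.miniconda/envs/foo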
Given that TensorFlow works, I'd assume the problem isn't with the CUDA driver setup. I can also see the CUDA libraries under ~/.miniconda/envs/foo/lib.
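Roughly, via something like this (exact library names and versions may vary):

$ ls ~/.miniconda/envs/foo/lib | grep -iE "cublas|cudnn|cudart|gpuarray"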
Here's also the output of nvidia-smi:
$ nvidia-smi
Sat Apr 29 20:45:04 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 378.13 Driver Version: 378.13 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 960 Off | 0000:03:00.0 On | N/A |
| 0% 36C P8 12W / 130W | 172MiB / 1993MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 20870 G /usr/lib/xorg-server/Xorg 170MiB |
+-----------------------------------------------------------------------------+
Digging a little bit deeper, it seems the problem is with pygpu itself.
$ DEVICE="cuda0" python -c "import pygpu;pygpu.test()"
pygpu is installed in /home/darth/.miniconda/envs/foo/lib/python2.7/site-packages/pygpu
NumPy version 1.12.1
NumPy relaxed strides checking option: True
NumPy is installed in /home/darth/.miniconda/envs/foo/lib/python2.7/site-packages/numpy
Python version 2.7.13 |Continuum Analytics, Inc.| (default, Dec 20 2016, 23:09:15) [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
nose version 1.3.7
EEEEEEE
======================================================================
ERROR: Failure: GpuArrayException (Error loading library: 0)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/darth/.miniconda/envs/foo/lib/python2.7/site-packages/nose/loader.py", line 418, in loadTestsFromName
addr.filename, addr.module)
File "/home/darth/.miniconda/envs/foo/lib/python2.7/site-packages/nose/importer.py", line 47, in importFromPath
return self.importFromDir(dir_path, fqname)
File "/home/darth/.miniconda/envs/foo/lib/python2.7/site-packages/nose/importer.py", line 94, in importFromDir
mod = load_module(part_fqname, fh, filename, desc)
File "/home/darth/.miniconda/envs/foo/lib/python2.7/site-packages/pygpu/tests/test_basic.py", line 5, in <module>
from .support import (gen_gpuarray, context)
File "/home/darth/.miniconda/envs/foo/lib/python2.7/site-packages/pygpu/tests/support.py", line 32, in <module>
context = gpuarray.init(get_env_dev())
File "pygpu/gpuarray.pyx", line 614, in pygpu.gpuarray.init (pygpu/gpuarray.c:9415)
File "pygpu/gpuarray.pyx", line 566, in pygpu.gpuarray.pygpu_init (pygpu/gpuarray.c:9106)
File "pygpu/gpuarray.pyx", line 1021, in pygpu.gpuarray.GpuContext.__cinit__ (pygpu/gpuarray.c:13468)
GpuArrayException: Error loading library: 0
...
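To take nose out of the picture, the minimal reproduction should be just the init call the traceback ends in (device name cuda0 assumed here):

$ python -c "from pygpu import gpuarray; gpuarray.init('cuda0')"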
But at this point I really don't know what else to try. Since Miniconda can already get TensorFlow working with CUDA/cuDNN, I'd ideally like to avoid installing system packages and instead get everything working through conda.
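One thing I'm considering trying next (but haven't verified) is pointing the dynamic loader at the env's lib directory explicitly, in case pygpu simply can't find libgpuarray/CUDA at runtime:

$ LD_LIBRARY_PATH=$HOME/.miniconda/envs/foo/lib \
  THEANO_FLAGS="device=cuda,floatX=float32" \
  KERAS_BACKEND=theano python -c "from keras import backend"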
Any tips are appreciated.