So far I've watched about 80% of Part 1 and I love the videos, but there is one thing I'd like to address: the overall setup of everything, and especially AWS.
Personally, I don't really like doing things in the cloud unless I have to, especially since it gets really expensive. For example, the first video mentions how great it is that you can rent a 2x GTX 970 box for $200 a month, which to me feels horribly overpriced. If you're planning on learning deep learning, you'll probably need that GPU for more than a few months, in which case you could just buy the GPUs outright for the price of two months of the subscription.
I'm not trying to criticize the course, since a lot of people use laptops that can't be upgraded (especially Macs with Radeon cards), but a lot of people (at least among the ones I know) own a desktop computer with a decent graphics card. For this reason I wanted to find a simple way to set everything up to run locally, with as little hassle as possible.
Now for the specifics. Looking over the TensorFlow, Keras, Theano, libgpuarray and other sites, it surprises me that they all have instructions for Ubuntu, macOS and Windows, and almost all of them are very specific (install pip from a specific URL, plus tons of system libraries). This led to incredible frustration trying to set everything up on Arch, Windows 10, Ubuntu 16.04 and openSUSE Tumbleweed at the same time (yeah, I know my setup is complicated, but surely not everyone doing data science runs Ubuntu?). But to my surprise, Miniconda/Anaconda seems to solve basically all of these issues, yet almost nobody mentions it.
Which leads me to my problem. I've tried installing TensorFlow with GPU support on Python 2.7:
conda create -n foo python=2.7 tensorflow-gpu keras
This sets up the proper Python version, MKL, the CUDA toolkit and cuDNN in a single line, without having to download/install the CUDA libraries by hand, set up paths, or install any extra system libraries.
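To see exactly what got pulled in, something like this should show the CUDA toolkit and cuDNN packages coming from conda (exact package names and versions may differ on your setup):

$ conda list -n foo | grep -iE "cudatoolkit|cudnn|mkl|tensorflow|theano|pygpu"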
Testing that it works with TensorFlow:
$ KERAS_BACKEND=tensorflow python -c "from keras import backend"
Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.7.5 locally
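To be extra sure TensorFlow actually sees the GPU (and doesn't just load the libraries), something along these lines should list a /gpu:0 device (TF 1.x API):

$ python -c "from tensorflow.python.client import device_lib; print([d.name for d in device_lib.list_local_devices()])"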
But if I try the same thing with the Theano backend:
$ KERAS_BACKEND=theano python -c "from keras import backend"
Using Theano backend.
ERROR (theano.gpuarray): Could not initialize pygpu, support disabled
Traceback (most recent call last):
File "/home/darth/.miniconda/envs/foo/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 164, in <module>
use(config.device)
File "/home/darth/.miniconda/envs/foo/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 151, in use
init_dev(device)
File "/home/darth/.miniconda/envs/foo/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 60, in init_dev
sched=config.gpuarray.sched)
File "pygpu/gpuarray.pyx", line 614, in pygpu.gpuarray.init (pygpu/gpuarray.c:9415)
File "pygpu/gpuarray.pyx", line 566, in pygpu.gpuarray.pygpu_init (pygpu/gpuarray.c:9106)
File "pygpu/gpuarray.pyx", line 1021, in pygpu.gpuarray.GpuContext.__cinit__ (pygpu/gpuarray.c:13468)
GpuArrayException: Error loading library: -1
I also tried with explicit flags but I get the same output (note that I use device=cuda while the course suggests device=gpu; if I use device=gpu I get a deprecation warning, so I'd expect that the version of Theano I have installed is newer?):
THEANO_FLAGS="cuda.root=$HOME/.miniconda/envs/foo,device=cuda,floatX=float32" KERAS_BACKEND=theano python -c "from keras import backend"
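For reference, the same flags can also go into ~/.theanorc instead of being passed on every invocation, which should be equivalent (section/option names as per the Theano config docs):

[global]
device = cuda
floatX = float32

[cuda]
root = /home/darth/.miniconda/envs/foo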
Given that TensorFlow works, I'd assume the problem isn't with the CUDA driver setup. I can also see the CUDA libraries under ~/.miniconda/envs/foo/lib.
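Roughly, via something like this (exact library names and versions may vary):

$ ls ~/.miniconda/envs/foo/lib | grep -iE "cublas|cudnn|cudart|gpuarray"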
Here's also the output of nvidia-smi:
$ nvidia-smi
Sat Apr 29 20:45:04 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 378.13 Driver Version: 378.13 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 960 Off | 0000:03:00.0 On | N/A |
| 0% 36C P8 12W / 130W | 172MiB / 1993MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 20870 G /usr/lib/xorg-server/Xorg 170MiB |
+-----------------------------------------------------------------------------+
Digging a little bit deeper, it seems the problem is with pygpu itself.
$ DEVICE="cuda0" python -c "import pygpu;pygpu.test()"
pygpu is installed in /home/darth/.miniconda/envs/foo/lib/python2.7/site-packages/pygpu
NumPy version 1.12.1
NumPy relaxed strides checking option: True
NumPy is installed in /home/darth/.miniconda/envs/foo/lib/python2.7/site-packages/numpy
Python version 2.7.13 |Continuum Analytics, Inc.| (default, Dec 20 2016, 23:09:15) [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
nose version 1.3.7
EEEEEEE
======================================================================
ERROR: Failure: GpuArrayException (Error loading library: 0)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/darth/.miniconda/envs/foo/lib/python2.7/site-packages/nose/loader.py", line 418, in loadTestsFromName
addr.filename, addr.module)
File "/home/darth/.miniconda/envs/foo/lib/python2.7/site-packages/nose/importer.py", line 47, in importFromPath
return self.importFromDir(dir_path, fqname)
File "/home/darth/.miniconda/envs/foo/lib/python2.7/site-packages/nose/importer.py", line 94, in importFromDir
mod = load_module(part_fqname, fh, filename, desc)
File "/home/darth/.miniconda/envs/foo/lib/python2.7/site-packages/pygpu/tests/test_basic.py", line 5, in <module>
from .support import (gen_gpuarray, context)
File "/home/darth/.miniconda/envs/foo/lib/python2.7/site-packages/pygpu/tests/support.py", line 32, in <module>
context = gpuarray.init(get_env_dev())
File "pygpu/gpuarray.pyx", line 614, in pygpu.gpuarray.init (pygpu/gpuarray.c:9415)
File "pygpu/gpuarray.pyx", line 566, in pygpu.gpuarray.pygpu_init (pygpu/gpuarray.c:9106)
File "pygpu/gpuarray.pyx", line 1021, in pygpu.gpuarray.GpuContext.__cinit__ (pygpu/gpuarray.c:13468)
GpuArrayException: Error loading library: 0
...
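To take nose out of the picture, the minimal reproduction should be just the init call the traceback ends in (device name cuda0 assumed here):

$ python -c "from pygpu import gpuarray; gpuarray.init('cuda0')"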
But at this point I really don't know what else to try. Since Miniconda can already get TensorFlow working with CUDA/cuDNN, I'd ideally like to avoid installing system packages and instead get everything working through conda.
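One thing I'm considering trying next (but haven't verified) is pointing the dynamic loader at the env's lib directory explicitly, in case pygpu simply can't find libgpuarray/CUDA at runtime:

$ LD_LIBRARY_PATH=$HOME/.miniconda/envs/foo/lib \
  THEANO_FLAGS="device=cuda,floatX=float32" \
  KERAS_BACKEND=theano python -c "from keras import backend"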
Any tips are appreciated.