Hello everyone,
I had been enjoying my little dual-boot Win10+Ubuntu server with a GTX 1080 Ti for the last two weeks, until it became unstable this morning, so I ran a bunch of "sudo apt-get install/update/upgrade" commands.
I can't recall at what stage it went really wrong, but suddenly I got flooded with pink-box messages when starting notebooks, such as:
INFO (theano.gof.compilelock): Waiting for existing lock by process '2478' (I am process '2680')
INFO (theano.gof.compilelock): To manually release the lock, delete /home/eric/.theano/compiledir_Linux-4.8--generic-x86_64-with-debian-stretch-sid-x86_64-2.7.13-64/lock_dir
or
#define _CUDA_NDARRAY_C

#include <Python.h>
#include <structmember.h>
#include "theano_mod_helper.h"

#include <numpy/arrayobject.h>
#include

#include "cuda_ndarray.cuh"

#ifndef CNMEM_DLLEXPORT
#define CNMEM_DLLEXPORT
#endif

#include "cnmem.h"
#include "cnmem.cpp"

//If true, when there is a gpu malloc or free error, we print the size of allocated memory on the device.
#define COMPUTE_GPU_MEM_USED 0

//If true, we fill with NAN allocated device memory.
#define ALLOC_MEMSET 0

//If true, we print out when we free a device pointer, uninitialize a
//CudaNdarray, or allocate a device pointer
#define PRINT_FREE_MALLOC 0

//If true, we do error checking at the start of functions, to make sure there
//is not a pre-existing error when the function is called.
//You probably need to set the environment variable
//CUDA_LAUNCH_BLOCKING=1, and/or modify the CNDA_THREAD_SYNC
//preprocessor macro in cuda_ndarray.cuh
//if you want this to work.
#define PRECHECK_ERROR 0

cublasHandle_t handle = NULL;
int* err_var = NULL;
(…)
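Regarding the first message: the log itself says the lock can be released by deleting the lock_dir folder. A minimal sketch of that step in Python follows, assuming the compile directories live under ~/.theano as in the path above. The function names are made up for illustration, and it defaults to a dry run because deleting a lock that a live process still holds (process 2478 in the log) is probably unsafe.

```python
import glob
import os
import shutil


def find_theano_lock_dirs(theano_home):
    """Return every lock_dir found under theano_home/compiledir_*/ (sorted)."""
    pattern = os.path.join(theano_home, "compiledir_*", "lock_dir")
    return sorted(p for p in glob.glob(pattern) if os.path.isdir(p))


def release_theano_locks(theano_home, dry_run=True):
    """Delete lock_dir folders; with dry_run=True only report them.

    Hypothetical helper: it does not check whether the process that
    created the lock is still running, so verify that by hand first.
    """
    locks = find_theano_lock_dirs(theano_home)
    for lock in locks:
        if not dry_run:
            shutil.rmtree(lock)
    return locks


if __name__ == "__main__":
    # Assumed default location, taken from the path in the log message.
    home = os.path.expanduser("~/.theano")
    for path in release_theano_locks(home, dry_run=True):
        print("lock candidate:", path)
```

Calling release_theano_locks(home, dry_run=False) actually removes the folders; it seems wise to shut down all notebooks and kernels first.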
I reinstalled Theano + Keras + CUDA multiple times: no success.
Then I wiped out Anaconda2 entirely, using the “Anaconda-clean” package from
https://docs.continuum.io/anaconda/install
followed by a brutal “rm -rf ~/anaconda2”.
Did TWO complete reinstalls using the super-practical "bash install-gpu.sh" from wiki.fast.ai
http://wiki.fast.ai/index.php/Ubuntu_installation
And more tweaking here and there.
Now I can run Lesson1 cell #7 again, the “state of the art custom model in 7 lines of code with one epoch of Vgg16”.
It is slower than before (307 sec vs. 205 sec), but at least it runs.
But I keep getting a nasty cuDNN message at launch:
Can not use cuDNN on context None: cannot compile with cuDNN. We got this error:
/tmp/try_flags_JuwE3B.c:4:19: fatal error: cudnn.h: No such file or directory
compilation terminated.
Mapped name None to device cuda: GeForce GTX 1080 Ti (0000:01:00.0)
Has anyone encountered that?
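The "cudnn.h: No such file or directory" line means the compiler could not find the cuDNN header on its include path. In case it helps narrow things down, here is a rough sketch that looks for cudnn.h in a few places. The directory list is an assumption about typical CUDA/cuDNN locations, not something taken from this machine, and the helper names are made up; CPATH is the standard gcc include-path environment variable.

```python
import os

# Assumed typical locations; adjust to wherever CUDA/cuDNN was unpacked.
CANDIDATE_INCLUDE_DIRS = [
    "/usr/local/cuda/include",
    "/usr/include",
    "/usr/include/x86_64-linux-gnu",
]


def find_header(name, include_dirs):
    """Return the full paths where `name` exists among include_dirs."""
    hits = []
    for directory in include_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            hits.append(candidate)
    return hits


def dirs_from_env(var="CPATH"):
    """Split an os.pathsep-separated environment variable into directories."""
    value = os.environ.get(var, "")
    return [d for d in value.split(os.pathsep) if d]


if __name__ == "__main__":
    search = dirs_from_env("CPATH") + CANDIDATE_INCLUDE_DIRS
    found = find_header("cudnn.h", search)
    print(found if found else "cudnn.h not found in: %s" % search)
```

If nothing turns up, the cuDNN headers were probably lost during the reinstalls. If the file exists somewhere else, adding that directory to CPATH (or, I believe, to Theano's dnn.include_path setting) may be enough for the compile check to pass.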
Eric