VGG16 Training Very Slow on GTX 1080

I’m trying to save cash by using my own system to work through the course. It has a GTX 1080, 16GB DDR4, and an i7 5830K @ 4.5 GHz.

The following code block took my system 850 seconds to run using "data/dogscats/sample/". Is this normal?

from vgg16 import Vgg16  # the course-provided wrapper around Keras' VGG16

path = "data/dogscats/sample/"
batch_size = 64  # the lesson default; defined earlier in the notebook

vgg = Vgg16()
# Grab a few images at a time for training and validation.
# NB: They must be in subdirectories named based on their category
batches = vgg.get_batches(path+'train', batch_size=batch_size)
val_batches = vgg.get_batches(path+'valid', batch_size=batch_size*2)
vgg.finetune(batches)
vgg.fit(batches, val_batches, nb_epoch=1)

I don’t think this is normal. I use a GTX 980 and get much faster times. You should check that Keras is actually using Theano.

Look in your home directory for a “.keras” folder (note that your file manager might not show folders starting with a dot by default). In it you will find the Keras configuration JSON file, keras.json. Make sure you set Theano for both relevant entries ("backend" and "image_dim_ordering").
Read about the Keras configuration here:
https://keras.io/backend/#kerasjson-details

If everything is set correctly and you still get poor performance, check whether Theano is actually using the GPU for its computations.
This page shows you how to check:
http://deeplearning.net/software/theano/tutorial/using_gpu.html
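
The test script on that page looks roughly like this (quoting from the Theano tutorial, lightly abridged, so treat it as a sketch): it times 1000 runs of a large elementwise exp and inspects the compiled graph to report which device did the work.

from theano import function, config, shared
import theano.tensor as T
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x #threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
# on the GPU, exp runs as GpuElemwise, so plain Elemwise nodes mean CPU
if numpy.any([isinstance(node.op, T.Elemwise) for node in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')

If it prints “Used the gpu”, the loop should also run much faster than on the CPU.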

Hope this helps!

Thanks for pointing those out.

This is my keras.json, which seems properly configured for Theano:

{
"image_dim_ordering": "th",
"epsilon": 1e-07,
"floatx": "float32",
"backend": "theano"
}
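
(For anyone double-checking this themselves: Keras can also report its active backend directly, which is a quick sanity check that keras.json was actually picked up.)

# should print "theano" if keras.json was read correctly
# (importing keras also prints "Using Theano backend." when Theano is active)
from keras import backend as K
print(K.backend())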

Running the test code shows that it’s using the CPU!

[Elemwise{exp,no_inplace}(<TensorType(float32, vector)>)]
Looping 1000 times took 9.771000 seconds
Result is [ 1.2318  1.6188  1.5228 ...,  2.2077  2.2997  1.6232]
Used the cpu

After creating a .theanorc on my Windows 10 system, I’m getting these errors when importing theano:

WARNING (theano.configdefaults): g++ not detected ! Theano will be unable to execute optimized C-implementations (for both CPU and GPU) and will default to Python implementations. Performance will be severely degraded. To remove this warning, set Theano flags cxx to an empty string.
WARNING (theano.sandbox.cuda): CUDA is installed, but device gpu1 is not available  (error: cuda unavailable)
Using Theano backend.

This is my .theanorc

[global]
device = gpu
floatX = float32

[nvcc]
flags = --use-local-env  --cl-version=2008

[cuda]
root = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0

Still figuring this out.

OK,

just took a quick look; I’ll dig into the error in about half an hour.

Just as a side note, I really recommend using Linux for machine learning, as many things work better there. Ubuntu should be fine (I’m using CentOS and am super happy with it).

As for your error message, try using gpu0 as the device, not gpu1. I’m not an expert in these matters, but afaik GPU device names always start at 0; gpu1 refers to your second GPU (and as far as I understand, you only have one).

Cheers


Just took a look,

the theano.sandbox error should be fixed if you change gpu1 to gpu0 in the notebook cell.

The first error is a much bigger problem. It seems like the C++ compiler is not installed, so Theano cannot compile its computation graphs down to optimized C code. This severely impacts performance, as the warning message says.

You should try to follow these instructions and make sure you install the Microsoft Visual C++ Compiler for Python 2.7. It seems that if it is correctly installed, the error should be solved.

http://deeplearning.net/software/theano/install_windows.html
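
(Before editing .theanorc, a quick way to check whether a compiler is even visible to Python — a sketch using only the standard library:)

# prints the full path of each compiler if it is on the PATH, else None
import distutils.spawn
print(distutils.spawn.find_executable("cl"))   # the MSVC compiler driver
print(distutils.spawn.find_executable("g++"))  # what Theano's first warning looked for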

Cheers

Trying to solve the second error first…

import theano.sandbox.cuda
theano.sandbox.cuda.use("gpu0")

produced

WARNING (theano.sandbox.cuda): CUDA is installed, but device gpu0 is not available  (error: cuda unavailable)

and

import theano.sandbox.cuda
theano.sandbox.cuda.use("gpu1")

produced

WARNING (theano.sandbox.cuda): CUDA is installed, but device gpu1 is not available  (error: cuda unavailable)

I also changed .theanorc to

[global]
device = gpu0
floatX = float32

[nvcc]
flags = --use-local-env  --cl-version=2008

but I still get “Used the cpu” when running the test code from http://deeplearning.net/software/theano/tutorial/using_gpu.html
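
For reference, Theano also reads the THEANO_FLAGS environment variable at import time, so the flags can be forced from inside the notebook, as long as it runs before the first import of theano (a sketch):

# must run before "import theano" is executed anywhere in the process
import os
os.environ["THEANO_FLAGS"] = "device=gpu0,floatX=float32"
import theano  # will still warn here if the GPU cannot be initialised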

Attempting to solve the first problem… I had previously installed Microsoft Visual C++ Compiler for Python 2.7, which is located at C:\Users\me\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\VC\bin

So I tried changing .theanorc to

[global]
device = gpu0
floatX = float32

[nvcc]
compiler_bindir=C:\Users\me\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\VC\bin

But this still gives the same error:

WARNING (theano.configdefaults): g++ not detected ! Theano will be unable to execute optimized C-implementations (for both CPU and GPU) and will default to Python implementations. Performance will be severely degraded. To remove this warning, set Theano flags cxx to an empty string.
WARNING (theano.sandbox.cuda): CUDA is installed, but device gpu0 is not available  (error: cuda unavailable)

Next I installed Microsoft Visual Studio Express 14.0 Community, and updated .theanorc to

[global]
device = gpu0
floatX = float32

[nvcc]
compiler_bindir=C:\Users\me\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\VC\bin

But the same error persists…

If I run Python from the command line and import theano, I get a long error message, including this line:

nvcc fatal   : Host compiler targets unsupported OS.

Any ideas please?

This is my nvidia-smi output, in case it’s useful:

Sat Jan 07 14:54:00 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 369.30                 Driver Version: 369.30                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080   WDDM  | 0000:02:00.0      On |                  N/A |
| 33%   33C    P8     9W / 200W |    559MiB /  8192MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Hi,
there seems to be an incompatibility between CUDA 8.0 and VS 2015 that causes problems getting the GPU to work. I also had issues, but they were fixed after checking this thread. If the exact suggestions don’t work, try running the Visual Studio environment batch file by hand; if that works, compare your regular environment with the one you get after running the batch file, and set the differences permanently via the Control Panel.
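
(For concreteness, the batch file in question is vcvarsall.bat; the path below assumes a default VS 2015 install and an x64 target. A sketch that runs it in a subshell and dumps the environment it sets, so you can diff against a plain `set` in a fresh cmd window:)

# run vcvarsall.bat and print the environment it produces
import subprocess

VCVARSALL = r"C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\vcvarsall.bat"
out = subprocess.check_output('call "%s" x64 && set' % VCVARSALL, shell=True)
print(out.decode("mbcs", "replace"))  # mbcs = the Windows ANSI code page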
By the way: my GPU is now working fine, but because it is old (a GT 630M) I will need to use AWS, at least until I upgrade my computer. 🙂
Hope this helps.