What server has been used to produce the default nb outputs?

I’m a bit surprised to see that the AWS p2.xlarge is quite slow considering its high cost.
I’m currently working on the statefarm sample, and each epoch takes 26 to 33s, although the original notebook output shows each epoch running in 11s.

I’ve checked with Theano’s test script that I’m really using the GPU, after changing cuda.use('gpu1') to cuda.use('gpu0') (as shown by nvidia-smi, there’s only one GPU available on p2.xlarge).

Therefore, I’m curious as to what hardware was used when building the course. Can anyone shed some light on it?


Actually, when importing cuda, I can see that the nb was produced using a GeForce GTX TITAN X, while aws p2 uses a Tesla K80.

This makes even less sense after reading a comparison of the two, which puts the Tesla a level above the GeForce, with twice as much memory.

Now, I really think that my config is wrong… for information, here is my current .theanorc:

device = gpu
floatX = float32

root = /usr/local/cuda-8.0


What performance do you get on the p2.xlarge? Any idea how to improve the computing time?

Do you mean p2? t2 is only CPU and is considerably slower than GPU.

Typo fixed. I’m talking about p2, sorry.

You might not have it set up properly then.
What is the line you get when you import Theano?

Using GPU device 0: GeForce GTX 1080 (CNMeM is enabled with initial size: 90.0% of memory, cuDNN 5110)

This will tell you if the GPU is enabled and if it picks up cuDNN. cuDNN will improve performance significantly.
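If you want to check that line from a script rather than by eye, here is a minimal sketch that pulls the device name, CNMeM percentage and cuDNN version out of it. The helper `parse_theano_banner` is hypothetical (not part of Theano), and the pattern is only an assumption based on the banner format shown in this thread.

```python
import re

# Pattern for the banner line as it appears in this thread (an assumption,
# other Theano versions may word it differently).
BANNER_RE = re.compile(
    r"Using gpu device (?P<index>\d+): (?P<name>.+?) \((?P<details>.*)\)",
    re.IGNORECASE,
)


def parse_theano_banner(line):
    """Return a dict describing the GPU banner, or None if it is not found."""
    match = BANNER_RE.search(line)
    if match is None:
        return None  # no GPU banner in this line
    details = match.group("details")
    cnmem = re.search(r"CNMeM is enabled with initial size: ([\d.]+)%", details)
    cudnn = re.search(r"cuDNN (\d+)", details)
    return {
        "index": int(match.group("index")),
        "name": match.group("name"),
        # None when the banner carries no percentage / version number
        "cnmem_percent": float(cnmem.group(1)) if cnmem else None,
        "cudnn_version": int(cudnn.group(1)) if cudnn else None,
    }


banner = ("Using GPU device 0: GeForce GTX 1080 (CNMeM is enabled with "
          "initial size: 90.0% of memory, cuDNN 5110)")
info = parse_theano_banner(banner)
print(info)
```

A `None` result, or a `cudnn_version` of `None`, would be the cue to look at the setup again.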

I get this:

Using gpu device 0: Tesla K80 (CNMeM is enabled with initial size: 75.0% of memory, cuDNN 5103)
/home/ubuntu/anaconda2/lib/python2.7/site-packages/theano/sandbox/cuda/__init__.py:600: UserWarning: Your cuDNN version is more recent than the one Theano officially supports. If you see any problems, try updating Theano or downgrading cuDNN to version 5.

That looks good. You can raise the CNMeM memory share to 80-90% in the Theano config, but that won’t make a huge difference.
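For anyone who wants to try the memory change, here is a minimal sketch that generates the config text. It assumes the old CUDA backend, where the CNMeM share is the `cnmem` option in the `[lib]` section, and the usual `[global]` / `[cuda]` layout; the other values are copied from the .theanorc posted above. `build_theanorc` is a hypothetical helper, and the snippet is written for Python 3 (`configparser`).

```python
import configparser
import io


def build_theanorc(cnmem_fraction=0.9):
    """Return .theanorc text with the given CNMeM memory fraction."""
    if not 0 < cnmem_fraction <= 1:
        raise ValueError("this sketch expects a fraction in (0, 1]")
    config = configparser.ConfigParser()
    config.optionxform = str  # keep key case, e.g. floatX
    config["global"] = {"device": "gpu", "floatX": "float32"}
    config["lib"] = {"cnmem": str(cnmem_fraction)}
    config["cuda"] = {"root": "/usr/local/cuda-8.0"}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Print the text instead of writing ~/.theanorc, so nothing is overwritten.
print(build_theanorc(0.9))
```

The printed text can be compared with the existing ~/.theanorc before changing anything by hand.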

I would recommend running GPUTest and seeing how you score. I don’t know how a K80 would score, but if I were to guess, around 3000-4000. If you get significantly lower, it is likely something isn’t right.

Can you run the first fit in lesson 1 and see what time you get? A p2 instance should give you around 650s per epoch.

@dradientgescent can you explain or provide a link on how to run GPUTest?


FYI Titan X is about twice as fast as the P2, on single precision (which is what we use).
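As a rough sanity check, the statefarm timings quoted at the top of the thread can be set against that factor of two. This is only the arithmetic on the numbers reported here; it assumes the 11s figure in the notebook output came from the Titan X and ignores every other difference between the two setups.

```python
reference_epoch_s = 11.0         # epoch time in the original notebook output
observed_epoch_s = (26.0, 33.0)  # range reported on the p2.xlarge

# How many times slower each observed epoch is than the reference run.
slowdowns = [t / reference_epoch_s for t in observed_epoch_s]
print("slowdown: {:.1f}x to {:.1f}x".format(*slowdowns))
# prints "slowdown: 2.4x to 3.0x"
```

So the reported epochs are somewhat slower than a plain 2x would predict, but in the same ballpark.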

650s per epoch means that 5 epochs take close to an hour… for a single fit after unfreezing?
How long should the dog breeds experiment take, given that there are several fits there?
I’m running on p2, and nvidia-smi shows that the GPU is busy performing the task. I haven’t measured timings yet, because I was sure it was supposed to take several minutes, not hours…
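Spelling out the estimate, as a small sketch using only the 650s per epoch and 5 epochs mentioned in this thread (`total_fit_time` is a hypothetical helper):

```python
def total_fit_time(seconds_per_epoch, epochs):
    """Return (total seconds, whole minutes, leftover seconds) for one fit."""
    total = seconds_per_epoch * epochs
    minutes, seconds = divmod(total, 60)
    return total, minutes, seconds


total, minutes, seconds = total_fit_time(650, 5)
print("{} s = {} min {} s".format(total, minutes, seconds))
# prints "3250 s = 54 min 10 s"
```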