Something is wrong. Usually it takes around 500 seconds to fit an epoch.
It’s weird that you have 100% GPU utilization but none of your GPU RAM is being used.
Did you see any error or warning message while importing keras or theano?
You should see in the last line of the nvidia-smi output that a python process is running and using the GPU.
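As a sketch of what that check looks for: the hypothetical snippet below parses the process table that `nvidia-smi` prints and returns the PIDs of any python processes on the GPU. The sample text is illustrative, not captured from a real run.

```python
# Not nvidia-smi itself -- just a sketch of reading its "Processes" table
# to confirm a python process is actually running on the GPU.
sample = """\
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|    0     12345     C  python                                     10845MiB   |
"""

def gpu_python_pids(table):
    """Return PIDs of python processes listed in an nvidia-smi process table."""
    pids = []
    for line in table.splitlines():
        fields = line.strip("| ").split()
        if "python" in fields:
            pids.append(int(fields[1]))  # PID is the second column
    return pids

print(gpu_python_pids(sample))  # a non-empty list means training really is on the GPU
```

If this comes back empty (or nvidia-smi says “No running processes found”) while you are training, the model is almost certainly running on the CPU.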
When you first import theano/keras, are you receiving a message about the use of the GPU? Something like:
```
Using gpu device 0: Tesla K80 (CNMeM is disabled, cuDNN 5103)
/home/ubuntu/anaconda2/lib/python2.7/site-packages/theano/sandbox/cuda/__init__.py:600: UserWarning: Your cuDNN version is more recent than the one Theano officially supports. If you see any problems, try updating Theano or downgrading cuDNN to version 5.
  warnings.warn(warn)
Using Theano backend.
```
Otherwise you are running on the CPU, hence the long training time.
I got some errors and the fitting was very slow like yours initially. My error was related to the NVCC compiler, and I fixed it by following a Stack Overflow answer:
In the current image I cloned from this course, the CUDA version is cuda-8.0, so I added the following to ~/.theanorc:
```
[cuda]
root = /usr/local/cuda-8.0
```
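For completeness, a fuller `~/.theanorc` along the same lines might look like this. The `[global]` section is my assumption of a typical GPU setup for this course (forcing the GPU device and float32 math), not something taken from the image:

```
[global]
device = gpu
floatX = float32

[cuda]
root = /usr/local/cuda-8.0
```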
The error seems to be fixed after this. When training epoch 1 (xxx/23000), the ETA shown is about 700–800s for the full 23000 samples.
Yeah, the “No running processes found” is strange if Prasad is currently training. I think the CPU might be training the model. For certain training tasks I’ve seen a CPU take 20 times longer than a GPU, and 500s x 20 == 10000s, which is close to 12000s.
Although the batch_size equaling 32 might mess with my numerology.
OK, I killed that instance and started a fresh instance (via DataBricks), but found the nvidia-smi output to be at 98% GPU utilization right away!
So I’m not sure what’s going on.
And of course I repeated the training (with batch_size = 64) and it’s still super slow, unsurprising given the 98% GPU utilization while doing “nothing”. I wonder if some Spark process is soaking up the CPU.
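One quick way to test the “something is soaking up the CPU” hypothesis is to compare the load average against the core count. A minimal sketch using only the standard library (Unix-only, since it relies on `os.getloadavg()`):

```python
import os

# 1-, 5-, and 15-minute load averages vs. available cores.
load1, load5, load15 = os.getloadavg()
n_cpus = os.cpu_count()
print(f"load average {load1:.2f} over {n_cpus} CPUs")

if load1 > n_cpus:
    print("CPU is oversubscribed -- something else may be soaking up the cores")
else:
    print("CPU looks fine; the slowness is probably elsewhere")
```

`top` or `htop` on the instance would show the same thing interactively, including which process is responsible.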
I don’t think @Nathaniel’s fix would help; it doesn’t seem related. The root problem seems to be that the GPU is occupied to begin with.
A great lesson in just how much the GPU helps speed things up.
Well, I finally decided to let go of my attachment to a certain way of doing things and do it exactly like @howard’s setup video says: basically, run setup_p2.sh from my Mac to spin up the pre-configured AMI. It works like a charm (and Vgg16 trains in ~8 mins, not 4 hours), except that I had to do a couple of things:
- My AWS account already has a certain number of VPCs, and I had to request a limit increase via the AWS console page
- Once I SSH’d into the instance, it didn’t seem to have the `unzip` command, so I tried `sudo apt install unzip` and got a strange error:
```
E: dpkg was interrupted, you must manually run 'sudo dpkg --configure -a' to correct the problem.
```
and, well, I ran the suggested command, which ran for about 5 minutes.
- Then I did `sudo apt update` and `sudo apt install zip unzip` to get those to work
- Then I did `git clone ...` to pull in the notebooks from the github repo
- I downloaded the data using `wget http://www.platform.ai/files/dogscats.zip` and ran `unzip` to extract it
- Ran `jupyter notebook`, went to the usual `...:8888` address, and it all works fine
I mentioned in my comment that I used this dataset:
http://www.platform.ai/files/dogscats.zip
And incidentally, the 4-hour training time was on a cluster spun up via DataBricks, with a notebook in that environment, so my issue may have been specific to that environment and may be hard for others to replicate.
The code is identical to the one in lesson1.ipynb, I am not doing anything different.
In any case as I said in my latest comment, when I instead follow your setup and use Jupyter notebooks, everything works fine.
I get the same ~4 hours ETA for training time. I am using Paperspace; the price is $0.65/hour and the GPU is a P5000, which I think is powerful enough. What caused the slowness? I am using the lesson 1 notebook for training dogs vs cats.