I’m getting tired of waiting 5+ minutes to train each epoch on an AWS p2.xlarge instance. Has anyone tried using the p2.8xlarge instances? These have 8 GPUs and 8 times the memory. Should I expect to see an 8x speedup, if I can multiply my batch_size by 8 when running fit()?
Any help or advice on speeding up the training process would be greatly appreciated! (I’m training on the State Farm distracted driver competition right now.)
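For a rough expectation before spending the money: if some fraction of each training step stays serial (data loading, weight synchronization across GPUs), scaling is sub-linear. A quick Amdahl's-law estimate, with a purely hypothetical 5% serial fraction:

```python
def expected_speedup(n_gpus, serial_fraction):
    """Amdahl's law: the part of each step that can't be
    parallelized caps the achievable speedup."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_gpus)

# With an assumed 5% serial fraction, 8 GPUs give well under 8x:
print(round(expected_speedup(8, 0.05), 2))  # 5.93
```

So even under optimistic assumptions, 8 GPUs are unlikely to give a clean 8x, though a large speedup is still plausible.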
The discussion in the Keras GitHub issues says that training the same model across several GPUs is an open research problem:
"It is not a limitation of Keras. This is how deep learning works. Unless you want to get VERY researchy, you have to choose data parallelism or model parallelism (or a combination). No backend change could fix that."
If one chooses to train multiple models and then average their predictions, as mentioned in lesson 3, wouldn’t it be at least theoretically possible to assign each model to a different GPU in order to speed up the overall process?
About training the same model across multiple GPUs: would it be possible to use the dropout layers to help?
What I mean is: if we consider a layer i followed by a dropout layer i + 1 that drops more than 50% of the activations, could we launch two trainings of the following layers in parallel, since SGD would update different sets of weights in the i-th layer?
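On the first idea (one ensemble member per GPU): yes, this is doable today without any backend support. Theano and TensorFlow read `CUDA_VISIBLE_DEVICES` when the backend is imported, so a common trick is to launch one process per ensemble member and pin each to its own GPU before importing Keras. A minimal sketch, with a placeholder where the real training would go:

```python
import os
from multiprocessing import Process

def train_member(gpu_id):
    # Pin this process to one GPU *before* importing the backend;
    # Theano/TensorFlow will then see only that device.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    # ... import keras here, build the model, call fit(), save weights ...
    # This placeholder just reports the pinning.
    return "member %d pinned to GPU %s" % (gpu_id, os.environ["CUDA_VISIBLE_DEVICES"])

if __name__ == "__main__":
    # One process per ensemble member, e.g. 8 members on a p2.8xlarge.
    procs = [Process(target=train_member, args=(i,)) for i in range(8)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

Afterwards you would load each member’s saved predictions and average them, exactly as in the lesson 3 ensembling approach.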
from keras.applications.vgg16 import VGG16
from multi_gpu import make_parallel

model = VGG16(include_top=False, weights='imagenet', input_tensor=None, input_shape=(224, 224, 3))
model = make_parallel(model, 8)  # replicate the model across 8 GPUs
I didn’t get an 8x improvement with this approach though, only 5x-6x.
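For anyone wondering what `make_parallel` is doing under the hood: it implements data parallelism. Each batch is sliced into one shard per GPU, every replica computes on its own shard, and the results are merged. A backend-free toy sketch of that step, where a simple `grad_fn` stands in for the real backward pass:

```python
def data_parallel_step(batch, grad_fn, n_gpus):
    """Split a batch into one shard per GPU, run grad_fn on each
    shard (on real hardware each call runs on its own device),
    and average the per-shard results."""
    shard = len(batch) // n_gpus
    shards = [batch[i * shard:(i + 1) * shard] for i in range(n_gpus)]
    grads = [grad_fn(s) for s in shards]
    return sum(grads) / n_gpus

# Toy "gradient": the mean of a shard. Averaging the shard means
# reproduces the full-batch mean, which is why the scheme is valid.
mean = lambda s: sum(s) / len(s)
print(data_parallel_step(list(range(8)), mean, n_gpus=4))  # 3.5
```

The 5x-6x rather than 8x is expected with this scheme: the slicing and re-gathering happen on the CPU and the weight updates are synchronized, so part of every step stays serial.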
Has anyone tried training different models across the different GPUs of a p2.8xlarge using Theano’s backend yet? Would you be willing to share the recipe?
Alternatively, I could switch to the TensorFlow backend in anticipation of part 2 of the deep learning course. However, when I tried TF as an experiment (installing TF and configuring Keras went fine), the same model ran slower, because TensorFlow wasn’t using the GPU on the instance. I’m wondering what I was missing about running TensorFlow on AWS.
Hi, I’m curious which versions of Keras and TensorFlow you’re using. I tried this script, but I can’t make it work with more than 2 GPUs (the terminal becomes unresponsive and the process then dies), and I can’t find any clue on the internet. Thanks.
I’ve experienced memory allocation errors multiple times with TensorFlow on a simple P2 instance.
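One thing worth trying for those allocation errors: by default the TensorFlow backend reserves nearly all GPU memory up front, which can collide with anything else on the device. Asking it to allocate on demand often helps. A configuration fragment, assuming the TF 1.x `ConfigProto` API of that era:

```python
# Assumes Keras on the TensorFlow 1.x backend: let TF grow its GPU
# memory allocation on demand instead of reserving it all at startup.
import tensorflow as tf
from keras import backend as K

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
K.set_session(tf.Session(config=config))
```

Run this before building the model so the session is created with the relaxed allocation policy.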
Currently I am training on ~7000 images, which takes 30 min per epoch.
I was wondering: how hard is it to use multiple GPUs (8x or 16x)?
Does anyone have experience with this? @chianti Do you have more experience since January?
Do I really just add one line?
Does this one line solve my memory allocation errors?