Set device context for entire session? (Training on multiple GPUs in parallel)

ostegm · April 17, 2017, 3:27am

I recently picked up a second 1080 used off Craigslist and want to be able to run multiple experiments in parallel. To be specific, I’m not trying to split one training run across multiple GPUs; I want to train a different model on each GPU. I believe I have enough RAM (32 GB), but I’m still running into resource exhausted errors with the TensorFlow backend, and I believe it’s because Keras is trying to assign all the work to the default device (/gpu:0). I’ve done some digging but can’t seem to find a simple way to set the GPU to use for the whole session.

I’ve seen the code from tensorflow
And the description from Fchollet on how to set the device context for a layer
And I even found some suggestions on using environment variables.

But all of these seem to suggest I need to write my entire python script within a device context…

with tf.device('/gpu:1'):
    all the rest of my code...

Is there a better way to set the device for the session?
If not, how do I do this without importing tf and running into resource exhausted errors when TensorFlow tries to use the GPU that is already maxed out training something else?

davecg · April 17, 2017, 5:31am

CUDA_VISIBLE_DEVICES=0 python myscript.py

(or =1, =0,1, etc.)

You can also set it within a notebook using %env, or in a script using os.environ.
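For the os.environ route, here is a minimal sketch. The helper name select_gpus is made up for illustration. The assumption is that the variable has to be set before TensorFlow (or anything else that initializes CUDA) is imported in the process, so it belongs in the first cell of a notebook or at the top of a script.

```python
import os


def select_gpus(device_ids):
    """Restrict this process to the given GPU indices.

    Hypothetical helper: it only sets CUDA_VISIBLE_DEVICES, which CUDA
    reads when it initializes, so call it before importing TensorFlow.
    """
    value = ",".join(str(i) for i in device_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Expose only the second card to this process.
select_gpus([1])

# Import the framework only after the variable is set, e.g.:
# import tensorflow as tf
```

In a notebook, %env CUDA_VISIBLE_DEVICES=1 in the first cell does the same job.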

ostegm · April 18, 2017, 12:07am

Perfect, thanks!

davecg · April 18, 2017, 12:31am

No worries.

One caveat to let you know about: if you only set one device as visible, TensorFlow will identify it as gpu:0 no matter what number it actually is.

I’m not sure what would happen if you flip the numbers (e.g. =1,0 instead of =0,1) but would not be shocked if that also flipped how TensorFlow numbered them.
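To make the renumbering concrete, here is a small pure-Python sketch of the mapping. It assumes CUDA's documented behavior that visible devices are enumerated in the order they are listed, and that TensorFlow's gpu:N names follow that process-local order. The function name is made up for illustration, and "original index" means whatever index the card would have with the variable unset.

```python
def visible_device_mapping(cuda_visible_devices):
    """Map process-local GPU index -> original GPU index.

    Assumes devices are enumerated in the order listed in
    CUDA_VISIBLE_DEVICES (hypothetical helper, not a TensorFlow API).
    """
    ids = [int(part) for part in cuda_visible_devices.split(",") if part.strip()]
    return {local: original for local, original in enumerate(ids)}


# Only the second card visible: it shows up as index 0 in the process.
single = visible_device_mapping("1")

# Flipped order: under the assumption above, the numbering flips too.
flipped = visible_device_mapping("1,0")
```

If that assumption holds, =1,0 would make the second card gpu:0 and the first card gpu:1 inside the process.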

lateralplacket · June 21, 2017, 6:58pm

I’m late to this thread but thought I’d add anyway: you say “my entire Python script” - but why is that a problem? In case it’s not obvious, you can just put “all the rest of my code…” in a function and call the function in the ‘with’ context.
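A rough sketch of that pattern follows. The function name run_experiment is a made-up placeholder for the rest of the script, and contextlib.nullcontext() stands in for the device context so the snippet runs without TensorFlow installed; with TensorFlow you would use tf.device('/gpu:1') in its place.

```python
import contextlib


def run_experiment():
    # Placeholder for "all the rest of my code...": build and train
    # the model here. Returns a marker so the call is easy to check.
    return "done"


# Assumption: with TensorFlow this would be tf.device('/gpu:1').
device_scope = contextlib.nullcontext()

# Only the call site is nested under the context, not the whole script.
with device_scope:
    result = run_experiment()
```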

ostegm · June 21, 2017, 7:34pm

Your answer is correct too, but I was working with the existing Jupyter notebooks from class and wanted to avoid refactoring Jeremy’s notebooks. Also, it’s nice to avoid putting all of the code in the notebook in a single cell nested under the with context. For interactivity and exploration, it’s nice to be able to set the device context and forget about it.