Set device context for entire session? (Training on multiple GPUs in parallel)

I recently picked up a second 1080, used, off Craigslist and want to be able to run multiple experiments in parallel. To be specific, I’m not trying to train one model across multiple GPUs; I want to train a different model on each GPU. I believe I have enough RAM (32 GB), but I’m still running into resource-exhausted errors with the TensorFlow backend, and I believe it’s because Keras is trying to assign all the work to the default device (/gpu:0). I’ve done some digging but can’t seem to find a simple way to set which GPU to use for the whole session.

But everything I found seems to suggest I need to write my entire Python script within a device context:

```python
import tensorflow as tf

with tf.device('/gpu:1'):
    ...  # all the rest of my code
```

  • Is there a better way to set the device for the session?
  • If not, how do I import tf without running into resource-exhausted errors when it tries to use the GPU that is already maxed out training something else?

```shell
CUDA_VISIBLE_DEVICES=0 python myscript.py
```

(or `=1`, `=0,1`, etc.)
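Since the goal is one experiment per GPU at the same time, the same environment variable can also be set per process from a small Python launcher. A minimal sketch, assuming each experiment runs as its own script; `launch_on_gpu` is a hypothetical helper, and the demo uses a tiny probe command in place of real training scripts:

```python
import os
import subprocess
import sys


def launch_on_gpu(cmd, gpu_ids, **popen_kwargs):
    """Start `cmd` as a child process that can only see the listed GPUs."""
    env = os.environ.copy()  # inherit PATH, virtualenv, etc.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return subprocess.Popen(cmd, env=env, **popen_kwargs)


# Stand-in for real jobs: each child just prints which GPUs it was allowed to see.
probe = [sys.executable, "-c", "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
jobs = [
    launch_on_gpu(probe, [0], stdout=subprocess.PIPE, text=True),
    launch_on_gpu(probe, [1], stdout=subprocess.PIPE, text=True),
]  # both children run concurrently
outputs = [job.communicate()[0].strip() for job in jobs]
print(outputs)  # ['0', '1']
```

For real runs you would pass something like `[sys.executable, "train_model_a.py"]` (hypothetical script name) and leave stdout alone, so a long training job can't stall on a full pipe.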

You can also set it within a notebook using `%env` or within a script using `os.environ`, as long as it’s set before TensorFlow is imported.
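A minimal in-process sketch of the `os.environ` route. `select_gpus` is a hypothetical helper; the guard against an already-imported TensorFlow is a conservative assumption, since CUDA only reads the variable when it initializes and importing first is the easiest way to get that wrong:

```python
import os
import sys


def select_gpus(*gpu_ids):
    """Make only the given physical GPUs visible to CUDA in this process.

    Must run before TensorFlow/Keras is imported, otherwise the setting
    may be ignored.
    """
    if "tensorflow" in sys.modules:
        raise RuntimeError("set CUDA_VISIBLE_DEVICES before importing tensorflow/keras")
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return os.environ["CUDA_VISIBLE_DEVICES"]


select_gpus(1)
# Only now import the framework, e.g.:
# import tensorflow as tf
# from keras.models import Sequential
```

In a Jupyter notebook the equivalent first cell is `%env CUDA_VISIBLE_DEVICES=1`, run before any cell that imports Keras or TensorFlow.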


Perfect, thanks!

No worries.

One caveat to let you know about: if you only set one device as visible, TensorFlow will identify it as gpu:0 no matter what its physical number actually is.

I’m not sure what would happen if you flipped the numbers (e.g. `=1,0` instead of `=0,1`), but I wouldn’t be shocked if that also flipped how TensorFlow numbers them.
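For what it’s worth, NVIDIA’s CUDA documentation describes `CUDA_VISIBLE_DEVICES` as an ordered list whose entries are renumbered from 0 in the order given, so `=1,0` should swap them as CUDA sees them, and presumably TensorFlow’s `/gpu:N` names follow. A small sketch of that mapping, assuming a plain comma-separated list of integer indices (no GPU UUIDs, no invalid entries); `visible_gpu_mapping` is a hypothetical helper. Note also that CUDA’s default physical numbering is fastest-first, which may differ from `nvidia-smi`; setting `CUDA_DEVICE_ORDER=PCI_BUS_ID` makes them agree.

```python
def visible_gpu_mapping(cuda_visible_devices):
    """Map logical GPU index (what gpu:N refers to) -> physical index.

    Assumes CUDA_VISIBLE_DEVICES holds integers only; an empty string
    means no GPUs are visible at all.
    """
    physical = [int(tok) for tok in cuda_visible_devices.split(",") if tok.strip()]
    return {logical: phys for logical, phys in enumerate(physical)}


print(visible_gpu_mapping("1"))    # {0: 1}
print(visible_gpu_mapping("1,0"))  # {0: 1, 1: 0}
```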

I’m late to this thread but thought I’d add this anyway: you say “my entire Python script”, but why is that a problem? In case it’s not obvious, you can just put “all the rest of my code…” in a function and call that function inside the `with` context.
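The function-in-a-`with` idea can be packaged as a decorator so every call runs under the chosen device. A sketch with hypothetical names (`run_inside`, `recording_device`, `train`); the stand-in context manager just records enter/exit so the pattern can be checked without a GPU. One hedged caution: as far as I know, a TensorFlow 1.x session by default reserves memory on every GPU it can see regardless of `tf.device`, so this alone may not avoid the resource-exhausted errors from the question; restricting visibility with `CUDA_VISIBLE_DEVICES` is the safer route.

```python
import contextlib
import functools
from typing import Any, Callable


def run_inside(make_context: Callable[[], Any]):
    """Decorator: run each call of the wrapped function inside make_context()."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            with make_context():  # e.g. tf.device('/gpu:1')
                return fn(*args, **kwargs)
        return wrapper
    return decorator


# With TensorFlow you would write something like:
#   import tensorflow as tf
#   @run_inside(lambda: tf.device('/gpu:1'))
#   def run_experiment(): ...  # "all the rest of my code"
events = []


@contextlib.contextmanager
def recording_device(name):
    events.append(f"enter {name}")
    try:
        yield
    finally:
        events.append(f"exit {name}")


@run_inside(lambda: recording_device("/gpu:1"))
def train(epochs):
    events.append(f"train {epochs}")
    return epochs * 2


print(train(3))  # 6
print(events)    # ['enter /gpu:1', 'train 3', 'exit /gpu:1']
```

The decorator takes a factory (the `lambda`) rather than a single context-manager object because context managers built with `contextlib.contextmanager` can only be entered once, and the function may be called many times.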

Your answer is correct too, but I was working with the existing Jupyter notebooks from class and wanted to avoid refactoring Jeremy’s notebooks. Also, it’s nice to avoid putting all of the code in the notebook in a single cell nested under the `with` context. For interactivity and exploration, it’s nice to be able to set the device context once and forget about it.
