multi-GPU error: all tensors must be on devices[0]

Serhii · March 19, 2019, 1:07pm

Hello,
I am trying to use all GPUs with an RNN text classification model using
learn.model = torch.nn.DataParallel(learn.model, device_ids=[0,1,2,3,4,5,6]), and I am making sure the PyTorch and CUDA device numbering stay the same with export CUDA_DEVICE_ORDER=PCI_BUS_ID.
Even after explicitly calling set_device(0), I still get the error: all tensors must be on devices[0]
I can see all devices from both PyTorch and nvidia-smi.
Any suggestions would be most appreciated, thanks!

sgugger · March 19, 2019, 1:14pm

You probably need to unwrap your model from DataParallel by calling learn.model = learn.model.module.
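
For reference, here is a minimal sketch of wrapping and unwrapping in plain PyTorch. It assumes a small stand-in nn.Sequential in place of learn.model (the names model, wrapped and unwrapped are made up for illustration), and it falls back to CPU when no GPU is visible. DataParallel expects the module's parameters and buffers to already sit on device_ids[0], which is the condition the error message refers to.

```python
import torch
from torch import nn

# Stand-in for learn.model (hypothetical; the real model is an RNN classifier).
model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2))

# DataParallel expects parameters and buffers on device_ids[0] before forward.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Wrap: with no device_ids argument, all visible GPUs are used.
wrapped = nn.DataParallel(model)
out = wrapped(torch.randn(3, 8, device=device))

# Unwrap: the original module is stored on the .module attribute.
unwrapped = wrapped.module if isinstance(wrapped, nn.DataParallel) else wrapped
print(type(unwrapped).__name__, tuple(out.shape))
```

The isinstance check makes the unwrap safe to run whether or not the model is currently wrapped, so it can be called before saving, loading, or re-wrapping.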