Exporting learner with nn.DataParallel

I’ve created a learner and, to attempt multi-GPU U-Net training, I wrapped the model with:

learn.model = torch.nn.DataParallel(learn.model)

Then I exported the learner with learn.export() and loaded it on a different machine with load_learner. The learner loads successfully, but when I call learn.predict I get the error:

File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/_functions.py", line 115, in _get_stream
    if _streams[device] is None:
IndexError: list index out of range

Does anyone know what I need to modify in order to use nn.DataParallel with learn.export and load_learner?

1 Like

We haven’t experimented with DataParallel and exporting yet, since most users train on one GPU. Normally only the underlying model is saved by fastai, but you would probably be safer undoing the DataParallel wrapping before exporting.
I think it’s done with learn.model = learn.model.module.
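For reference, here is a minimal sketch of that workaround, assuming a fastai Learner named `learn`; the `isinstance` check and the commented inference lines are illustrative additions, not from the original post:

```python
import torch.nn as nn

# Undo the DataParallel wrapping so export() saves the plain underlying model
if isinstance(learn.model, nn.DataParallel):
    learn.model = learn.model.module

learn.export()  # writes export.pkl under learn.path by default

# On the single-GPU inference machine (illustrative):
# learn_inf = load_learner(path_to_export_dir)
# pred = learn_inf.predict(item)
```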

4 Likes

Thanks, that worked! Didn’t realize it would be that simple :grin:

1 Like

I trained a model on multiple GPUs using DataParallel, but when I try to use the weights to make predictions and generate a heatmap with Grad-CAM, I get the error “TypeError: ‘DataParallel’ object is not subscriptable”. I can confirm that learn.predict works fine.

You should use the fastai method .to_parallel (or something like that). It will synchronize your model back onto one GPU at the end of training and allow you to export.
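For the “not subscriptable” error specifically, the same unwrapping trick mentioned above should help, since Grad-CAM code usually indexes into the model to register hooks. A minimal sketch, assuming a fastai Learner named `learn`; the layer index is a placeholder you would adjust to whatever layer you hook:

```python
import torch.nn as nn

# DataParallel does not support indexing (model[0]), so unwrap it first
model = learn.model.module if isinstance(learn.model, nn.DataParallel) else learn.model

# Now the usual indexing / hook registration for Grad-CAM works, e.g.:
target_layer = model[0]  # placeholder: pick the layer you want to hook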

1 Like