I’ve created a learner and, to attempt multi-GPU U-Net training, I wrapped the model like this:
learn.model = torch.nn.DataParallel(learn.model)
Then I exported the model with learn.export() and imported it on a different machine with load_learner. The learner loads successfully, but when I call learn.predict I get this error:
File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/_functions.py", line 115, in _get_stream
if _streams[device] is None:
IndexError: list index out of range
Does anyone know what I need to modify in order to use
We haven’t experimented with DataParallel and exporting yet since most users use one GPU. Normally only the underlying model is saved with fastai, but you would probably be safer by undoing the DataParallel thing before exporting.
I think it’s done with:
learn.model = learn.model.module
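For anyone who wants to guard against unwrapping twice, the step above can be sketched as a small helper. This is only a sketch: the name unwrap_parallel is made up for illustration and is not part of fastai or PyTorch. It relies on the fact that torch.nn.DataParallel (and DistributedDataParallel) keep the original network in their .module attribute. The check is done by class name so the helper does not need to import torch; a stricter variant would use isinstance against torch.nn.DataParallel.

```python
def unwrap_parallel(model):
    """Return the network inside a DataParallel-style wrapper.

    Hypothetical helper, not a fastai or PyTorch function. If `model` is
    not wrapped, it is returned unchanged, so calling this twice is safe.
    """
    # Both PyTorch wrappers store the original network in `.module`.
    wrapper_names = {"DataParallel", "DistributedDataParallel"}
    if type(model).__name__ in wrapper_names:
        return model.module
    return model


# Intended usage, assuming a fastai Learner named `learn` already exists:
#   learn.model = unwrap_parallel(learn.model)
#   learn.export()
```

Unwrapping before learn.export() means the exported file holds the plain network, so the machine that loads it should not need the same GPU layout as the training machine.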
Thanks, that worked! Didn’t realize it would be that simple
I trained a model on multiple GPUs using DataParallel, but when I try to use the weights to make predictions and generate a heatmap with Grad-CAM, I get this error: "TypeError: 'DataParallel' object is not subscriptable". I also want to confirm that learn.predict is working correctly.
You should use the fastai method .to_parallel (or something like that). It will put your model back on one GPU at the end of training, which then lets you export.