How to use multiple GPUs

I am unable to figure out how to use multiple GPUs. Simply wrapping the model with model = torch.nn.DataParallel(model) and then using it in the Learner gives this error:
..., line 22, in loss_batch
 out = model(*xb)
return type(out)(map(gather_map, zip(*outputs)))
TypeError: zip argument #1 must support iteration

How is your code structured? Does it work without the DataParallel wrapper?

I don’t think any of us have tried multi-GPU training yet with fastai v1, at least not with nn.DataParallel. So some code changes will probably be required. We’re unlikely to make them until after the course, so feel free to have a go yourself if you’re interested, and let us know if you try and get stuck at any point. Using pdb will probably help you figure out where the issues are.


Sure, will keep you updated. I am guessing there should be some neat trick using callbacks though. Just need to figure that out.

@jeremy Turns out multi-gpu works perfectly with fastai. All one has to do is model=torch.nn.DataParallel(model) and then pass this model to the Learner object.

The error occurred because I was passing a list instead of tensors.
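The failing code isn't shown, so here is only a minimal sketch of the general pattern, assuming a hypothetical toy model (build_model) and a hypothetical helper (maybe_parallel) in place of the real fastai model: the wrapped model receives one stacked tensor, which DataParallel can split along the batch dimension, instead of a Python list of samples.

```python
import torch
import torch.nn as nn


def build_model() -> nn.Module:
    # Hypothetical toy model standing in for whatever you pass to the Learner.
    return nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))


def maybe_parallel(model: nn.Module) -> nn.Module:
    # Hypothetical helper: wrap only when more than one GPU is visible.
    if torch.cuda.device_count() > 1:
        return nn.DataParallel(model.cuda())
    return model


model = maybe_parallel(build_model())

# A batch that arrives as a Python list of per-sample tensors...
samples = [torch.randn(10) for _ in range(8)]
# ...is stacked into one (batch, features) tensor that DataParallel can split along dim 0.
xb = torch.stack(samples).to(next(model.parameters()).device)

out = model(xb)
print(out.shape)  # torch.Size([8, 2])
```

On a machine with zero or one GPU the helper simply returns the unwrapped model, so the same script runs everywhere.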


I tried to do this on lesson 1 of course-v3 but it didn’t work. Here’s what I did:

models_resnet34 = torch.nn.DataParallel(models.resnet34)
learn = ConvLearner(data, models_resnet34, metrics=error_rate)


AttributeError                            Traceback (most recent call last)
<ipython-input-17-00b7619ab140> in <module>
      1 models_resnet34 = torch.nn.DataParallel(models.resnet34)
----> 2 learn = ConvLearner(data, models_resnet34, metrics=error_rate)

~/venv-py36/lib64/python3.6/site-packages/fastai/vision/ in __init__(self, data, arch, cut, pretrained, lin_ftrs, ps, custom_head, split_on, **kwargs)
     52         meta = model_meta.get(arch, _default_meta)
     53         torch.backends.cudnn.benchmark = True
---> 54         body = create_body(arch(pretrained), ifnone(cut,meta['cut']))
     55         nf = num_features(body) * 2
     56         head = custom_head or create_head(nf, data.c, lin_ftrs, ps)

~/venv-py36/lib64/python3.6/site-packages/torch/nn/modules/ in __call__(self, *input, **kwargs)
    475             result = self._slow_forward(*input, **kwargs)
    476         else:
--> 477             result = self.forward(*input, **kwargs)
    478         for hook in self._forward_hooks.values():
    479             hook_result = hook(self, input, result)

~/venv-py36/lib64/python3.6/site-packages/torch/nn/parallel/ in forward(self, *inputs, **kwargs)
    140         if len(self.device_ids) == 1:
    141             return self.module(*inputs[0], **kwargs[0])
--> 142         replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
    143         outputs = self.parallel_apply(replicas, inputs, kwargs)
    144         return self.gather(outputs, self.output_device)

~/venv-py36/lib64/python3.6/site-packages/torch/nn/parallel/ in replicate(self, module, device_ids)
    146     def replicate(self, module, device_ids):
--> 147         return replicate(module, device_ids)
    149     def scatter(self, inputs, kwargs, device_ids):

~/venv-py36/lib64/python3.6/site-packages/torch/nn/parallel/ in replicate(network, devices, detach)
      9     num_replicas = len(devices)
---> 11     params = list(network.parameters())
     12     param_indices = {param: idx for idx, param in enumerate(params)}
     13     param_copies = Broadcast.apply(devices, *params)

AttributeError: 'function' object has no attribute 'parameters'

I managed to work around this by doing the following:

learn = ConvLearner(data, models.resnet34, metrics=error_rate)
learn.model = torch.nn.DataParallel(learn.model)

That’s the proper way to do it, since it’s the learner that puts the model on the GPU(s).
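The traceback above arises because models.resnet34 is the architecture function, not a model, so DataParallel ends up calling .parameters() on a function. A minimal sketch of the build-then-wrap order, with a hypothetical toy_arch function standing in for models.resnet34 and a hypothetical helper wrap_for_multi_gpu (neither is fastai API):

```python
import torch
import torch.nn as nn


def toy_arch(pretrained: bool = False) -> nn.Module:
    # Hypothetical stand-in for models.resnet34: a *function* that builds a model.
    return nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=3),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(8, 2),
    )


def wrap_for_multi_gpu(model) -> nn.Module:
    # DataParallel needs a real nn.Module (it calls .parameters() when it
    # replicates the model), so reject the architecture function itself.
    if not isinstance(model, nn.Module):
        raise TypeError(f"expected an nn.Module, got {type(model).__name__}")
    if torch.cuda.device_count() > 1:
        # DataParallel expects the parameters to live on the first GPU.
        return nn.DataParallel(model.cuda())
    return model


# Build the model first (with fastai, the Learner does this step), then wrap it.
model = wrap_for_multi_gpu(toy_arch(pretrained=False))
```

With fastai this corresponds to the learn.model = torch.nn.DataParallel(learn.model) line above, applied after ConvLearner has already built the model.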


I used @TheShadow29’s instructions and set learn.model = torch.nn.DataParallel(learn.model). The learner I’m using is RNNLearner.language_model. I’m building a language model on a custom dataset, and the training time for the parallel version dropped to 6 hours and 43 minutes, compared to over 10 hours for the non-parallel version. Everything went smoothly except when I tried to save the encoder:


TypeError                                 Traceback (most recent call last)
      1 learn.fit_one_cycle(5, 1e-3)
----> 3 learn.save_encoder('ft_enc')

~/fastai/fastai/text/ in save_encoder(self, name)
     53     def save_encoder(self, name:str):
     54         "Save the encoder to name inside the model directory."
---> 55         [0].state_dict(), self.path/self.model_dir/f'{name}.pth')
     57     def load_encoder(self, name:str):

TypeError: 'DataParallel' object does not support indexing

Please note that the full model got saved without any problems. The problem is the [0] indexing in this line: [0].state_dict(), self.path/self.model_dir/f'{name}.pth')

I vaguely understand the problem, but I’m wondering whether there is some other way to extract the encoder from the model. I’m going to keep that notebook open until I know whether it can be done :slight_smile:, to avoid repeating 7 hours of training.
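The indexing fails because DataParallel keeps the original model in its .module attribute and does not itself support [0]. A minimal sketch of reaching the encoder through .module, assuming (as the failing [0] in save_encoder suggests) that the language model is a sequential container whose first child is the encoder; the toy nn.Sequential and the in-memory buffer are illustrative stand-ins, not fastai code:

```python
import io

import torch
import torch.nn as nn

# Hypothetical stand-in for a language model built as Sequential(encoder, decoder).
model = nn.Sequential(nn.Embedding(100, 16), nn.Linear(16, 100))
wrapped = nn.DataParallel(model)

# wrapped[0] raises TypeError: DataParallel defines no __getitem__.
# The original model is still reachable as wrapped.module, so index that instead.
encoder = wrapped.module[0]

buffer = io.BytesIO()  # stands in for self.path/self.model_dir/f'{name}.pth'
torch.save(encoder.state_dict(), buffer)
print(list(encoder.state_dict()))  # ['weight']
```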


DataParallel wraps your model in a new module, so every key in the saved state_dict gets a new 'module.' prefix. Some posts advise saving your model first, then doing:

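import torch
from collections import OrderedDict

# original file saved from the DataParallel-wrapped model
state_dict = torch.load('myfile.pth', map_location='cpu')
# create a new OrderedDict whose keys do not contain the `module.` prefix
new_state_dict = OrderedDict()
for k, v in state_dict.items():
    name = k[len('module.'):] if k.startswith('module.') else k  # remove `module.`
    new_state_dict[name] = v
# load params into the unwrapped (non-DataParallel) model
model.load_state_dict(new_state_dict)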

Thank you for your answer.

How does this save the encoder? As I mentioned earlier, after training, saving the full model as 'latest-run' executed without any problem and latest-run.pth got saved in the path. The error TypeError: 'DataParallel' object does not support indexing occurred only when I tried to save the encoder with learn.save_encoder('ft_enc').

Isn’t the encoder a subset of the entire model? If so, the “length” of the OrderedDict for the encoder should be less than that of the full model, since the encoder only contains some of the layers and not the entire network. I could be totally wrong about this, and if I am, I will learn something today :slight_smile:.

When I ran your code (and as your code indicates), it only removes the subword “module” from the keys and everything else remains the same. Is that the encoder? Or do I have to do this for the full model as well despite it being already saved?

Thank you.

This gives you the model’s weights without the 'module.' prefixes; you can load them into a new learner without DataParallel and then save your encoder.
Or you can adapt the script to only keep the encoder part.
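A sketch of the "only keep the encoder part" variant, using a hypothetical helper sub_state_dict. It assumes the encoder is the first child of the wrapped model, so its keys in the DataParallel state_dict start with module.0.; here the dict comes from a toy model's state_dict(), whereas in practice it would be the one returned by torch.load on the saved file.

```python
from collections import OrderedDict

import torch.nn as nn


def sub_state_dict(state_dict, prefix: str) -> OrderedDict:
    # Hypothetical helper: keep only the keys that start with `prefix`, with the prefix removed.
    #   prefix="module."   -> the whole model, unwrapped
    #   prefix="module.0." -> only the first child (the encoder in Sequential(encoder, decoder))
    return OrderedDict(
        (k[len(prefix):], v) for k, v in state_dict.items() if k.startswith(prefix)
    )


def make_lm() -> nn.Module:
    # Toy stand-in for Sequential(encoder, decoder).
    return nn.Sequential(nn.Embedding(100, 16), nn.Linear(16, 100))


parallel = nn.DataParallel(make_lm())
sd = parallel.state_dict()  # keys: module.0.weight, module.1.weight, module.1.bias

full = sub_state_dict(sd, "module.")
encoder_sd = sub_state_dict(sd, "module.0.")

# Both load cleanly into fresh, unwrapped modules.
make_lm().load_state_dict(full)
nn.Embedding(100, 16).load_state_dict(encoder_sd)
print(list(encoder_sd))  # ['weight']
```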


Thank you @sgugger. I did as you instructed and it worked. So, to recap (in case other people find it helpful), to train RNNLearner.language_model in fastai with multiple GPUs we do the following:

  1. Once we have our learn object, parallelize the model by executing learn.model = torch.nn.DataParallel(learn.model)
  2. Train as instructed in the docs
  3. Save the parallelized model as 'name'
  4. As instructed by @sgugger here, load the saved model, strip the module. prefix from the keys of the dictionary, and create a new state_dict
  5. Load the newly created state_dict into the model, which will now be unparallelized
  6. Save both the model (as 'name-unpr') and the encoder, by executing learn.save_encoder('name')
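The six steps above can be sketched in plain PyTorch, assuming a toy Sequential(encoder, decoder) model and a temporary directory. Here torch.save/torch.load stand in for the fastai save and save_encoder calls, and the file names name.pth, name-unpr.pth and name-enc.pth are only illustrative:

```python
import os
import tempfile
from collections import OrderedDict

import torch
import torch.nn as nn


def make_lm() -> nn.Module:
    # Toy stand-in for the language model: Sequential(encoder, decoder).
    return nn.Sequential(nn.Embedding(100, 16), nn.Linear(16, 100))


tmp = tempfile.mkdtemp()

# 1. Parallelize the model.
parallel = nn.DataParallel(make_lm())
# 2. Training would happen here.
# 3. Save the parallelized model; every key carries the "module." prefix.
torch.save(parallel.state_dict(), os.path.join(tmp, "name.pth"))
# 4. Load it back and strip "module." from the keys.
state_dict = torch.load(os.path.join(tmp, "name.pth"), map_location="cpu")
new_state_dict = OrderedDict(
    (k[len("module."):] if k.startswith("module.") else k, v) for k, v in state_dict.items()
)
# 5. Load the new state_dict into an unwrapped model.
plain = make_lm()
plain.load_state_dict(new_state_dict)
# 6. Save the unwrapped model and, separately, its encoder (first child).
torch.save(plain.state_dict(), os.path.join(tmp, "name-unpr.pth"))
torch.save(plain[0].state_dict(), os.path.join(tmp, "name-enc.pth"))
```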

I parallelized my model using

learn.model = nn.DataParallel(learn.model, device_ids=[0, 1])

However, during training with learn.fit_one_cycle(1, 1e-2) I get RuntimeError: CUDA error: out of memory at the same batch size (2048) that I successfully use for single-GPU computations. When I halve the batch size (down to 1024), fitting works, but it is nearly twice as slow as a single GPU at the same batch size of 1024:

  • At bs = 1024
    • two GPUs: 2.98 s ± 193 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    • single GPU: 1.68 s ± 70.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
  • At bs = 2048
    • two GPUs: RuntimeError: CUDA error: out of memory
    • single GPU: 1.66 s ± 71.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

So, at least in my case, using two GPUs does not seem to be beneficial.
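A little arithmetic on the timings above, assuming each timed "loop" covers the same amount of work in both setups. One possible contributor, not verified for this case: nn.DataParallel replicates the model onto every GPU on each forward pass, splits the batch across the replicas, and gathers the outputs back on the first device, so each replica only sees part of the batch and the first GPU carries extra work. per_device_batches is a hypothetical helper that mimics the even split:

```python
import torch


def per_device_batches(bs: int, n_gpus: int) -> list:
    # Hypothetical helper: how a batch is split along dim 0 into one chunk per GPU,
    # mirroring the torch.chunk-style split DataParallel uses by default.
    return [chunk.numel() for chunk in torch.arange(bs).chunk(n_gpus)]


# Mean time per loop reported above at bs = 1024 (seconds).
single_gpu, two_gpus = 1.68, 2.98
speedup = single_gpu / two_gpus

print(per_device_batches(1024, 2))  # [512, 512]
print(f"two-GPU speedup: {speedup:.2f}x")  # two-GPU speedup: 0.56x
```

A speedup below 1x means the two-GPU run is slower than the single-GPU run at the same batch size.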

This is not working for me. I did the same with the ConvLearner.pretrained API and got an error on the second line saying it cannot set the parameter. Can you please look into it? (fastai version 0.7)

It’s not working for me.

learn.model = nn.DataParallel(learn.model) or
learn.model = nn.DataParallel(learn.model, device_ids=[0, 1])

That gives me this error:

'DataParallel' object has no attribute 'reset'

fastai v1.0.33

Yes, multi-GPU doesn’t work (yet) with RNNs. This is a standing issue we will address after the course is finished.

Thanks for your answer and your time, sgugger.

I found an alternative way to remove DataParallel from a saved learner. I have no idea whether this is a legitimate way to go about it, but none of the methods in this thread, e.g., modifying the state_dict, were working for me.

Also, somehow I had managed to save the learner.pth file with DataParallel wrapped around it twice, so to load it I needed to call

learn.model = torch.nn.DataParallel(learn.model)

twice (this also works if the model was only wrapped once).



Inspecting learn.model showed that DataParallel was indeed wrapping the model, but I could access the non-parallel part with

learn.model.module.module (you may omit the second .module if it’s not there twice), so I tried:

learn.model = learn.model.module.module

and got back the unwrapped model which then functioned as normal.

learn.get_preds() worked like it should on one gpu.

I do not know whether this has any side effects; hopefully someone more knowledgeable can comment.
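The learn.model.module.module trick generalizes to a small loop that peels off however many DataParallel layers there are. A sketch with a hypothetical helper unwrap_data_parallel, where a toy nn.Linear stands in for the real model. Because DataParallel's .module attribute is the original model object, unwrapping hands back the same parameters rather than a copy:

```python
import torch.nn as nn


def unwrap_data_parallel(model: nn.Module) -> nn.Module:
    # Hypothetical helper: strip every DataParallel layer, however deeply nested.
    while isinstance(model, nn.DataParallel):
        model = model.module
    return model


base = nn.Linear(4, 2)
twice = nn.DataParallel(nn.DataParallel(base))  # the accidental double wrap
assert unwrap_data_parallel(twice) is base
assert unwrap_data_parallel(base) is base  # unwrapped models pass through unchanged
```

With fastai this would presumably be used as learn.model = unwrap_data_parallel(learn.model).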

Thanks for the amazing resources!!


When training a CNN on multiple GPUs via
learn = torch.nn.DataParallel(learn, device_ids=[0, 1])

After saving the model and then loading it for prediction, I am getting this error:

AttributeError: 'DataParallel' object has no attribute 'load'
How should I solve this issue?