How to use Multiple GPUs?

(Haider Alwasiti) #82

Yes. After creating the learn object, add this line:
learn.model = torch.nn.DataParallel(learn.model, device_ids=[0, 1])
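For readers without fastai at hand, here is a minimal plain-PyTorch sketch of the same wrapping step. It is an illustration under assumptions, not the fastai code itself: the helper name wrap_for_available_gpus is hypothetical, and a small nn.Linear stands in for learn.model. It only wraps the model when at least two GPUs are visible and falls back to the CPU otherwise.

```python
import torch
import torch.nn as nn


def wrap_for_available_gpus(model: nn.Module) -> nn.Module:
    """Hypothetical helper: use DataParallel only when 2+ GPUs are visible."""
    n_gpus = torch.cuda.device_count()
    if n_gpus == 0:
        return model  # CPU only: nothing to parallelize
    # DataParallel expects the parameters to live on device_ids[0].
    model = model.to("cuda:0")
    if n_gpus > 1:
        model = nn.DataParallel(model, device_ids=list(range(n_gpus)))
    return model


# nn.Linear is a stand-in for learn.model (assumption for illustration).
model = wrap_for_available_gpus(nn.Linear(8, 2))
device = "cuda:0" if torch.cuda.is_available() else "cpu"
batch = torch.randn(16, 8, device=device)

# DataParallel splits the batch along dim 0 across the GPUs and
# gathers the outputs back on the first device.
out = model(batch)
print(out.shape)
```

With two GPUs, each replica would receive 8 of the 16 rows, which is why this approach raises the usable batch size.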


(Haider Alwasiti) #83

I have edited my answer above. I forgot the learn.model = part.


(Shiv Gowda) #84

The above worked for me to use 2 GPUs on my desktop for 2 separate processes (devices 0 and 1). Thanks.


(Hossain) #85

@hwasiti, when I am running
learn.model = torch.nn.DataParallel(learn.model, device_ids=[0, 1])
I get the following error

AttributeError: can’t set attribute

Can you please help?

Thanks in advance.


(Haider Alwasiti) #86

Can you share a snapshot of the code with the errors and few lines before this error? Where did you put this line exactly?

I have shared in my previous posts my github pet’s notebook. Can you try one of them and see whether it works for you?


(Hossain) #87

Thanks again. Can you please send me the link to your notebook again?

A screenshot of my code is attached.


(Haider Alwasiti) #88

Look in cell number 32:

I have noticed that you did this import:
import torch.nn as nn

Can you try this instead:
learn.model = nn.DataParallel(learn.model, device_ids=[0, 1])


(Hossain) #89

Thanks, Haider. I am afraid I am getting the same error!


(Haider Alwasiti) #90

I am using fastai v1 and yours seems to be the older v0.7. I suspect that is the reason. I checked your code, but sorry to say I could not find anything wrong with it. Maybe others with more knowledge can chime in with a solution.
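One plausible explanation for the error, which this thread does not confirm: if the older Learner exposes model as a read-only property, any assignment to learn.model fails with an AttributeError, no matter which DataParallel spelling is used. The class below is a hypothetical stand-in written for illustration, not fastai code.

```python
class OldStyleLearner:
    """Hypothetical stand-in (not fastai code): `model` has no setter."""

    def __init__(self, model):
        self._model = model

    @property
    def model(self):
        # Read-only: there is no matching @model.setter.
        return self._model


learn = OldStyleLearner(model="resnet")

try:
    learn.model = "wrapped resnet"  # same shape as learn.model = DataParallel(...)
    assignment_failed = False
except AttributeError as err:
    assignment_failed = True
    # The exact message wording depends on the Python version.
    print(f"AttributeError: {err}")
```

If that is the cause, changing the import style cannot help; only a Learner whose model attribute is assignable will accept the wrapped model.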


(Hossain) #91

Thanks, Haider. I updated fastai to v1.0.0 but still get the same error. Yes, I will wait for others’ replies.

When I run either of these lines:
learn.model = torch.nn.DataParallel(learn.model, device_ids=[0, 1])
learn.model = nn.DataParallel(learn.model, device_ids=[0, 1])
I get the following error

AttributeError: can’t set attribute

Imported packages:

from fastai.imports import *
from fastai.torch_imports import *
from fastai.transforms import *
from fastai.conv_learner import *
from fastai.model import *
from fastai.dataset import *
from fastai.sgdr import *
from fastai.plots import *

Can you please help?

Thanks in advance.


(Haider Alwasiti) #92

Can you share your code? I will try to run it on my machine and see.

‘ConvLearner’ is no longer used in fastai v1. What is your exact version of fastai?

Can you please also try my notebook, which I can confirm works fine?
Run Cell 1 to 11.
Then jump to cell 30 and run from cell 30 to 34.


(Hossain) #93

Please find my code from here

I am going to test yours.


(Haider Alwasiti) #94

I tried to run your code and it does not work on fastai v1.39.

Please update your fastai to the latest version and try my code. Most likely it will work then.

Your notebook seems to be from the older fastai, and it is not compatible with fastai v1.39.

conda install fastai -c fastai


(Hossain) #95

Hi Haider,
Thanks for your advice. Happy to let you know that after upgrading fastai to v1, the code is working fine.


(Armand Gerstenberger) #96

Running cifar example I seem to be stable at 68C (gpu: 1080) with driver 418.39. Note: computer near window in winter


(Dana Ludwig) #97

Haider, I read the Horovod link you gave, and it does look very promising! The enhancements you would need to add to a framework like fastai are minimal and completely generic, that is, not specific to one model architecture. They have a basic implementation for PyTorch on that web page, and it is only about 40 lines of code. The changes are at the init(), the model distribution, the optimizer (you wrap your optimizer in theirs; that’s where the gradients are shared, I guess), and the one-time initial distribution of the parameters. Your specific hardware configuration (multi-node, or one-node, multi-GPU) is passed in the training command line as just one line of parameters. Seems too good to be true :grinning:. All of that gives you 90% linear scaling even with multi-node (I assume InfiniBand interconnects). In the best case scenario, Horovod could be built into fastaiV2 and nobody would notice (I hope)! We could have state-of-the-art performance with commodity gamer hardware.
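To make those steps concrete, here is a rough sketch that swaps in PyTorch's own DistributedDataParallel rather than Horovod, so it is an analogy and not the Horovod API. The function name train_one_step, the toy nn.Linear model, and the single-process "gloo" setup on CPU are all assumptions chosen so the sketch can run on one machine; a real job would launch one process per GPU.

```python
import tempfile

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def train_one_step(rank: int, world_size: int, init_file: str) -> float:
    """Hypothetical helper: one optimizer step under DistributedDataParallel."""
    # Step 1: initialize the process group (the init() step).
    dist.init_process_group(
        "gloo", init_method=f"file://{init_file}", rank=rank, world_size=world_size
    )
    try:
        torch.manual_seed(0)
        # Steps 2 and 4: wrapping the model distributes it; DDP broadcasts
        # the initial parameters from rank 0 when it is constructed.
        model = DDP(nn.Linear(8, 2))
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

        x, y = torch.randn(16, 8), torch.randn(16, 2)
        loss = nn.functional.mse_loss(model(x), y)
        optimizer.zero_grad()
        # Step 3: gradients are averaged across processes during backward().
        loss.backward()
        optimizer.step()
        return loss.item()
    finally:
        dist.destroy_process_group()


# Single process (world_size=1) on CPU, purely so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    loss = train_one_step(rank=0, world_size=1, init_file=f"{tmp}/ddp_init")
print(f"loss after one step: {loss:.4f}")
```

The main difference from the Horovod description is where the gradient sharing lives: here it is inside the model wrapper, whereas Horovod puts it in the wrapped optimizer.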


(Haider Alwasiti) #98

Hi @danaludwig
Recently, there has been exciting progress in multi-node distributed data parallel training in fastai…

If you search the forum you will find some examples…

In particular, see this CIFAR training example, with which Jeremy, Sylvain, Andrew and Brett managed to beat everybody in April 2018 in the Stanford DAWNBench competition

More about it here

This forum thread related…

I am very excited that this is already integrated into fastai :slight_smile:
Fastai is peerless!


(Dana Ludwig) #99

Hi @hwasiti,

If, after looking at Horovod, you still think that our group can get scalable multiple GPU training faster, then I will take your word for it! I have read Jeremy’s DAWNBench blog and I just wasn’t sure whether it was a one-time model-specific AWS solution or something we can all use. I hope he will get into it more in the current class!

My problem is NLP and building a big fat language model faster, and I didn’t know how well the DAWNBench experience would generalize to other models. I think your last URL ends on a very positive note, where @kcturgutlu got his code to work and got linear scaling with the number of GPUs! My goal would be a nice multiple GPU approach that handles 2-4 GPUs on one node and doesn’t require a lot of detailed tweaking for each model.


(Haider Alwasiti) #100

I have read the blog post too… I think to win an international competition, one should use everything… Efficient distributed learning + the competition-specific tweaks here and there… But I think that the distributed part of their solution is also so good that it contributed to their win just as much as the other parts, or maybe more…

I haven’t tried distributed fastai yet, since my 2 PCs have 3+2 GPUs, which does not justify the effort… If I had an opportunity to work on servers with 8 GPUs or more, I would think seriously about how to implement it… I think the 1st step for anybody interested is to replicate the fastai example provided… Then, if it works, adapt it to their own model pipeline… If that works, try to implement the Horovod method and compare… I would start my research with the easiest path and go up…

Currently, I am using my multiple GPUs with PyTorch DataParallel, only to increase the possible model size and batch size. This is a little less efficient in training time, but it gives a similar outcome… Or most of the time I just train multiple models in parallel, one notebook on each GPU… Most of the time I need to tweak a lot of parameters and see which one is better, and this speeds up my iterations… But I am still eager to try this distributed method on GCP nodes someday…
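One common way to get the "one notebook on each GPU" setup is the CUDA_VISIBLE_DEVICES environment variable. The sketch below assumes it runs in the first cell of a notebook, before anything has initialized CUDA; the choice of GPU index 1 is just an example.

```python
import os

# Must be set before CUDA is initialized, so do it at the very top of the
# notebook. This process will then see only physical GPU 1.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch  # imported after the variable is set, on purpose

# Inside this process the single visible GPU is addressed as cuda:0.
visible = torch.cuda.device_count()
print(f"GPUs visible to this process: {visible}")
```

A second notebook would set the variable to "0", so the two training runs never compete for the same card.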


(Bryan Daniels) #101

Has anyone tried using NVLink with 2 TITAN RTX cards? From the NVIDIA Technical Specs, the NVLink appears to allow the use of two cards as if they were one, suggesting no change in code, nor need for explicit parallelism.
1 Connecting two TITAN RTX cards with NVLink to scale performance and memory capacity to 48 GB is only possible if your application supports NVLink technology. Please contact your application provider to confirm their support for NVLink. | 2 NVIDIA NVLink sold separately.

Not sure if NVLink is implemented in fastai and usable from a Jupyter Notebook, i.e., not as a script. I found this blog post (NVLink Performance) suggesting that the 2nd card scales linearly, but using Caffe.

If this has been answered elsewhere in the forums, I apologize in advance.