Is there any simple flag in fasti ai / Torch library to use multiple GPUs?
Not in fastai. There’s DataParallel in pytorch, but we don’t currently support it.
Darn, I was wondering this as well.
I saw DataParallel. If we went through and added it to the library, would it offer a significant boost for those of us with our own DL rigs? I assume it isn't a priority for those using AWS, Crestle, and Paperspace.
Personally I don’t think it’s that useful - I’ve never found it’s helped me. Reason is: I always have multiple experiments I want to run, and I also want a spare GPU for doing interactive stuff. My main rig has 4 GPUs. So I can run 3 experiments plus an interactive notebook.
For others, here is the code to check which GPU you are using and set it to a different one in PyTorch
# See how many devices are around
torch.cuda.device_count()

# Set it to a particular device
torch.cuda.set_device(1)

# Check which device you are on
torch.cuda.current_device()
Have you tried using DataParallel in one of the fastai notebooks?
I just tried wrapping various parts of the fastai library code in nn.DataParallel and didn’t have any luck.
The last thing I tried was to modify an attribute of the learner from the LSTM notebook:

learner = md.get_model(...)
learner.models.model = nn.DataParallel(learner.models.model)
learner.fit(...)
That results in the error:
AttributeError: 'RNN_Encoder' object has no attribute 'hidden'
due to this line in RNN_Encoder:

raw_output, new_h = rnn(raw_output, self.hidden[l])
It seems trivial according to the PyTorch tutorial, but I couldn't figure out how to add it. Maybe someone smarter than me can!
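For reference, here is a minimal sketch of the wrapping pattern from the PyTorch tutorial, using a hypothetical toy model rather than fastai's RNN_Encoder. One gotcha it illustrates: after wrapping, the original module's attributes live under .module, which is one reason code that reaches into model internals (like self.hidden above) can break.

```python
import torch
import torch.nn as nn

# Toy model standing in for the real network (hypothetical, for illustration only)
model = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 2))

# nn.DataParallel splits each input batch across available GPUs and gathers the
# outputs; with zero or one GPU it simply runs the underlying module as-is.
parallel_model = nn.DataParallel(model)

# The wrapped module is reachable via .module; attribute lookups on the wrapper
# itself will NOT find attributes defined on the original model.
x = torch.randn(4, 10)
out = parallel_model(x)
print(out.shape)                        # torch.Size([4, 2])
print(parallel_model.module is model)   # True
```

This runs on CPU as well, since DataParallel falls back to calling the wrapped module directly when no GPUs are visible.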
After today’s class hopefully you’ll have enough information to do it
Out of curiosity, what temperatures do you see on your cards while running a "fit epoch", according to nvidia-smi? In my rig, the card in slot 1, which drives the monitor, sits at ~40C, while the card in slot 5 that is running the code reaches just over 80C; it quickly drops below 60C when finished, and both cards equalize after about 2 minutes.
My system (1080 Ti) shows the single GPU reaching a max of 82C, probably near 100% utilization.
Problems with Multiple GPUs
arch = resnet34
data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz))
learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.fit(0.01, 3)
If anyone could suggest how to work within the current code base and limit torch (or python or the env) to using one card, I would appreciate the help.
Also, other ideas, would be appreciated.
Finally, for others trying to debug errors, I found it useful to remove not only the 'data/dogscats/tmp/' directory but also the '~/.torch/model' directory. Apparently the resnet34 model, i.e. 'resnet34-333f7ec4.pth', had become corrupted. I couldn't replicate the bug until I had removed this and forced a fresh download by PyTorch.
UPDATE: I was able to modify the environment variable CUDA_VISIBLE_DEVICES to selectively use only one GPU card or the other. In spite of limiting myself to one GPU, the computer still crashes when executing the code above. If it were a hardware problem with a card, I would have expected it to be one card or the other.
RESOLUTION: After considerable time debugging this problem, I am embarrassed to say that it was indeed a hardware problem. Even though I had used the rig with both GPUs for considerable periods of time (crypto mining) with no problem, apparently kicking off the deep learning job created a power surge that forced a restart. Plugging the computer into another circuit solved the problem. Since this has nothing to do with deep learning, I was going to delete this post, but after all the time spent on it, perhaps it will help someone else who is stuck.
Wondering how to set fastai to use a different GPU (if you have more than one)?
Set

os.environ['CUDA_VISIBLE_DEVICES'] = 'n'

in your notebook, where n is the device number (starting with zero for the first one). This must be done BEFORE you import the fastai library:

os.environ['CUDA_VISIBLE_DEVICES'] = '1'
from fastai.conv_learner import *
Just pass an integer when we do
So I know that in the past I actually had just the ConvLearner running on my dual-GPU system, utilizing both cards. As a technical exercise, and simply for the heck of it, I'll give it another go. I also know that I used PyTorch's DataParallel module to do so.
I will try to ensure compatibility for all executable actions in the current library. @jeremy Would you be amenable to merging such a change?
I gave this a shot: I wrapped self.model with nn.DataParallel(...) in ConvnetBuilder and began running lesson-1.ipynb. Kind of surprising that things just worked out of the box. I had to make significant changes when I first attempted this way back.
class ConvnetBuilder():
    """Class representing a convolutional network... """
    def __init__(self, f, c, is_multi, is_reg, ps=None, xtra_fc=None, xtra_cut=0):
        ...
        if f in model_meta:
            cut,self.lr_cut = model_meta[f]
        ...
        self.top_model = nn.DataParallel(nn.Sequential(*layers))  # <---- (First attempt)
However, the runtimes are terrible. My runtimes actually went up by around a factor of 2 per epoch! I could also see that my CPUs were slightly less utilized when using parallelism. Could it be some kind of starvation caused by how the CPUs are feeding images to the GPUs?
Not that it's important! If any pointers pop immediately to mind, I can look into it... if not, I'll do some other explorations.
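One way to test the starvation hypothesis (a sketch, not a diagnosis of fastai internals) is to time the input pipeline in isolation: iterate the DataLoader without doing any model work, at different num_workers settings. The dataset below is synthetic and stands in for the image pipeline.

```python
import time
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for an image dataset: 2048 "images" of shape 3x32x32.
ds = TensorDataset(torch.randn(2048, 3, 32, 32), torch.randint(0, 2, (2048,)))

def time_loader(num_workers):
    """Time one full pass over the data with no model attached."""
    loader = DataLoader(ds, batch_size=64, num_workers=num_workers)
    start = time.perf_counter()
    for xb, yb in loader:  # pure input-pipeline cost, no forward pass
        pass
    return time.perf_counter() - start

if __name__ == "__main__":
    # If adding workers barely helps, the CPU side is probably not the
    # bottleneck, and the DataParallel slowdown lies elsewhere
    # (e.g. the scatter/gather overhead between GPUs).
    for w in (0, 2):
        print(f"num_workers={w}: {time_loader(w):.3f}s")
```

If the epoch time with DataParallel is dominated by this input-only time, feeding the GPUs is the problem; if not, the overhead is on the GPU side.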
I've not had luck improving performance with multiple GPUs either (not as bad a result as you saw, but no faster than a single GPU). I haven't looked closely into it. I'd be interested to hear if you find out anything. Perhaps ask on the PyTorch forums?
When I run the following code without any other jobs running, it is significantly slower than when the GPU is running another process. (Specifically, it is under heavy load running crypto-mining software.) I have repeated the trials numerous times to make sure that there were no differences in pre-computing or caching taking place. Moreover, I have tested this on and off over several weeks with the same result. I have used nvidia-smi to verify which jobs are running on the GPU. Here are the times:
Time with no load: 45 seconds
Time with load: 20 seconds
arch = resnet34
data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz))
learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.fit(0.01, 5)
This really doesn’t make sense to me.
EDIT: I was wondering if someone could let me know how long the above code runs for them. (I am running this on a Titan V, though I've tested on a Titan X and it's about the same.) This is right out of Lesson 1. Note that in learn.fit(0.01, 5), I am running 5 epochs.
I've also experienced a similar issue: when I used nn.DataParallel to run on 4 GPUs, it didn't seem to help much time-wise. So I increased batch sizes to the extent where all GPU memories were almost full, to take full advantage of the hardware, since there may be bottlenecks when splitting data and copying modules to all GPUs. I'm still not sure whether running nn.DataParallel is worth it. It's probably better to stick with what Jeremy suggests and utilize the GPUs for running different experiments.
Maybe plotting performance against different batch sizes would give some clue about the bottlenecks.
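A quick sketch of that idea: time a forward/backward step at several batch sizes and look at the cost per sample. The model and sizes here are illustrative (CPU-only so it runs anywhere); on a real rig you would move the model and tensors to the GPU and call torch.cuda.synchronize() before reading the clock.

```python
import time
import torch
import torch.nn as nn

# Toy network standing in for the real model (illustrative only).
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
loss_fn = nn.CrossEntropyLoss()

def time_step(bs, iters=5):
    """Average seconds for one forward+backward pass at batch size bs."""
    x = torch.randn(bs, 512)
    y = torch.randint(0, 10, (bs,))
    start = time.perf_counter()
    for _ in range(iters):
        loss = loss_fn(model(x), y)
        loss.backward()
        model.zero_grad()
    return (time.perf_counter() - start) / iters

# If the per-sample time keeps dropping as batch size grows, the device was
# under-fed at the smaller sizes; once it flattens, compute is the limit.
for bs in (16, 64, 256):
    t = time_step(bs)
    print(f"bs={bs}: {t / bs * 1e6:.1f} us/sample")
```

Plotting those per-sample times against batch size (per GPU count, if you also vary that) should show where the DataParallel overhead stops being amortized.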
It's definitely possible to get nearly linear scaling with more GPUs; I just haven't looked into how to make that work. But plenty of folks have published results showing it.