Hi,
Is there any simple flag in the fastai / PyTorch libraries to use multiple GPUs?
Thanks,
Not in fastai. There's DataParallel in PyTorch, but we don't currently support it.
Darn, I was wondering this as well.
I saw DataParallel. If we went through and added it to the library, would it offer a significant boost for those of us with our own DL rigs? I assume it isn't a priority for those using AWS, Crestle, and Paperspace.
Personally I don't think it's that useful - I've never found it's helped me. Reason is: I always have multiple experiments I want to run, and I also want a spare GPU for doing interactive stuff. My main rig has 4 GPUs, so I can run 3 experiments plus an interactive notebook.
Makes sense.
For others, here is the code to check which GPU you are using and set it to a different one in PyTorch:
import torch
# See how many devices are around
torch.cuda.device_count()
# Set it to a particular device
torch.cuda.set_device(1)
# Check which device you are on
torch.cuda.current_device()
Have you tried using DataParallel in one of the fastai notebooks?
I just tried wrapping various parts of the fastai library code in nn.DataParallel and didn't have any luck.
The last thing I tried was to modify an attribute of the learner from the LSTM notebook:
learner = md.get_model(...)
learner.models.model = nn.DataParallel(learner.models.model)
learner.fit(...)
That results in the error:
AttributeError: 'RNN_Encoder' object has no attribute 'hidden'
due to this line in RNN_Encoder.forward():
raw_output, new_h = rnn(raw_output, self.hidden[l])
It seems trivial according to the PyTorch tutorial, but I couldn't figure out how to add it. Maybe someone smarter than me can!
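For anyone poking at this later, here is a minimal sketch (a toy module, not fastai's RNN_Encoder) of how nn.DataParallel is normally used. Two documented caveats that may be relevant to the error above: the wrapper hides the inner module's custom methods and attributes behind .module, and the module is re-replicated on every forward pass, so attribute updates made on the replicas do not persist.
import torch
import torch.nn as nn

class ToyNet(nn.Module):
    """Toy stand-in for a real model; nothing fastai-specific here."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(128, 10)
    def forward(self, x):
        return self.fc(x)

if torch.cuda.device_count() > 1:
    model = nn.DataParallel(ToyNet()).cuda()
    out = model(torch.randn(64, 128).cuda())  # batch dimension is split across GPUs
    print(out.shape)                          # torch.Size([64, 10])
    print(type(model.module))                 # the wrapped ToyNet lives at .module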
After today's class hopefully you'll have enough information to do it.
Out of curiosity, what kind of temperatures do you see on your cards while running a fit epoch, according to nvidia-smi? In my rig, the card in slot 1, which drives the monitor, sits at ~40C, while the card in slot 5 that is running the code reaches just over 80C while running and quickly drops below 60C when finished; both cards equalize after about 2 minutes.
My system, with a 1080 Ti, shows the single GPU reaching a max of 82C, probably near 100% utilization.
Problems with multiple GPUs: when I run the following code, my computer crashes and restarts.
arch=resnet34
data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz))
learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.fit(0.01, 3)
If anyone could suggest how to work within the current code base and limit torch (or Python, or the environment) to using one card, I would appreciate the help.
Also, other ideas would be appreciated.
Finally, for others trying to debug errors, I found it useful to remove not only the data/dogscats/tmp/ directory but also the ~/.torch/model directory. Apparently the resnet34 model, i.e. resnet34-333f7ec4.pth, had become corrupted. I couldn't replicate the bug until I had removed it and forced a fresh download by PyTorch.
UPDATE: I was able to modify the environment variable CUDA_VISIBLE_DEVICES to selectively use only one GPU card or the other. In spite of limiting myself to one GPU, the computer still crashes when executing the code above. If it were a hardware problem with one of the cards, I would have expected the crash to happen on one card but not the other.
RESOLUTION: After considerable time debugging this problem, I am embarrassed to say that it was indeed a hardware problem. Even though I had used the rig with both GPUs for considerable periods of time (crypto mining) with no problem, apparently "kicking in" the deep learning algorithm created a power surge that forced a restart. Plugging the computer into another circuit solved the problem. Since this problem has nothing to do with deep learning, I was going to delete this post, but after all the time spent on it, perhaps it will help someone else who is stuck.
Wondering how to set fastai to use a different GPU (if you have more than one)?
use:
os.environ['CUDA_VISIBLE_DEVICES'] = 'n' in your notebook
where n is the number (starting with zero for the first one)
this must be done BEFORE you import the fastai library.
e.g.
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'
from fastai.conv_learner import *
Just pass an integer when calling model.cuda(), e.g. model.cuda(0).
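For example, a quick sketch in plain PyTorch (not fastai-specific) of placing a model and its inputs on GPU 1 by index:
import torch
import torch.nn as nn

model = nn.Linear(10, 2).cuda(1)  # parameters now live on cuda:1
x = torch.randn(4, 10).cuda(1)    # inputs must be on the same device
y = model(x)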
All,
So I know that in the past I actually had just the ConvLearner running on my dual-GPU system, utilizing both cards. As a technical exercise, and simply for the heck of it, I'll give it another go. I also know that I used PyTorch's DataParallel module to do so.
I will try to ensure compatibility for all executable actions in the current library. @jeremy Would you be amenable to merging such a change?
Thnx.
Absolutely!
I gave this a shot... I wrapped self.model with nn.DataParallel(...) in ConvnetBuilder and began running lesson-1.ipynb. Kind of surprising that things just worked out of the box. I had to make significant changes when I first attempted this way back.
class ConvnetBuilder():
    """Class representing a convolutional network...
    """
    def __init__(self, f, c, is_multi, is_reg, ps=None, xtra_fc=None, xtra_cut=0):
        ...
        ...
        if f in model_meta: cut,self.lr_cut = model_meta[f]
        self.top_model = nn.DataParallel(nn.Sequential(*layers))  # <---- (First attempt)
However, the runtimes are terrible. My runtimes actually went up by around a factor of 2 per epoch! I could also see that my CPUs were slightly less utilized when using parallelism. Could it be some kind of starvation caused by how the CPUs feed images to the GPUs?
Not that it's important! If any pointers pop to mind immediately, I can look into them... If not, I'll do some other exploring.
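One rough way to test the starvation theory (a sketch using a synthetic dataset and a plain PyTorch DataLoader, not the fastai data pipeline) is to time an epoch that only pulls batches against one that also runs forward/backward; if the two numbers are close, the GPUs are mostly waiting on the CPU side and DataParallel cannot help:
import time
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def time_loader_only(loader):
    # Time just the data pipeline: pull every batch, do nothing with it.
    start = time.time()
    for xb, yb in loader:
        pass
    return time.time() - start

def time_full_epoch(loader, model, loss_fn, opt, device='cuda'):
    # Time data loading plus forward/backward on the GPU(s).
    start = time.time()
    for xb, yb in loader:
        xb, yb = xb.to(device), yb.to(device)
        loss = loss_fn(model(xb), yb)
        opt.zero_grad()
        loss.backward()
        opt.step()
    torch.cuda.synchronize()
    return time.time() - start

# Synthetic stand-in data -- swap in the real dataset/loader for a real test.
ds = TensorDataset(torch.randn(1024, 3, 64, 64), torch.randint(0, 10, (1024,)))
loader = DataLoader(ds, batch_size=64, num_workers=4)
model = nn.DataParallel(nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 10))).cuda()
opt = torch.optim.SGD(model.parameters(), lr=0.01)

print('loader only:', time_loader_only(loader))
print('full epoch :', time_full_epoch(loader, model, nn.CrossEntropyLoss(), opt))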
I've not had luck improving performance with multiple GPUs either (not results as bad as yours, but no faster than a single GPU). I haven't looked closely into it. I'd be interested to hear if you find out anything - perhaps on the PyTorch forums?
When I run the following code without any other jobs running, it is significantly slower than when the GPU is running another process. (Specifically, it is under heavy load running crypto-mining software.) I have repeated the trials numerous times to make sure that there were no differences in pre-computing or caching taking place. Moreover, I have tested this off and on over several weeks with the same result. I have used nvidia-smi to verify which jobs are running on the GPU. Here are the times:
Time with no load: 45 seconds
Time with load: 20 seconds
arch=resnet34
data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz))
learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.fit(0.01, 5)
This really doesn't make sense to me.
EDIT: I was wondering if someone could let me know how long the above code runs for them. (I am running this on a Titan V, though I've tested on a Titan X and it's about the same.) This is right out of Lesson 1. Note that in learn.fit(0.01, 5) I am running 5 epochs.
I've also experienced a similar issue: when I used nn.DataParallel to run on 4 GPUs, it didn't seem to help much time-wise. So I increased batch sizes to the extent that all GPU memories were almost full, to take full advantage of it, since there may be some bottleneck when we split the data and copy the module to all GPUs. I'm still not sure whether it's worth running nn.DataParallel. It's probably better to stick with what Jeremy suggests and utilize the GPUs for running different experiments.
Maybe plotting performance against different batch sizes would give some clue about the bottleneck.
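As a starting point, a sketch along these lines (synthetic data kept on the GPU and a tiny made-up convnet, so it isolates the DataParallel scatter/gather overhead from the data pipeline) could measure images per second at a few batch sizes:
import time
import torch
import torch.nn as nn

def throughput(model, batch_size, n_iters=20, img_size=64, device='cuda'):
    # Images/second for forward+backward at a given batch size,
    # using synthetic data already on the GPU (no data-loading cost).
    x = torch.randn(batch_size, 3, img_size, img_size, device=device)
    y = torch.randint(0, 10, (batch_size,), device=device)
    loss_fn = nn.CrossEntropyLoss()
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(n_iters):
        loss = loss_fn(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    torch.cuda.synchronize()
    return n_iters * batch_size / (time.time() - start)

# A tiny made-up convnet, just so there is real GPU work to parallelise.
model = nn.DataParallel(nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 10))).cuda()

for bs in (32, 64, 128, 256, 512):
    print(bs, f'{throughput(model, bs):.0f} img/s')
If images per second keep climbing with batch size but barely change with the number of GPUs, the per-batch scatter/gather overhead of DataParallel is what dominates.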
It's definitely possible to get nearly linear scaling with more GPUs - I just haven't looked into how to make that work. But plenty of folks have published results showing that.