How to use multiple GPUs

#21

Your first line is incorrect, it should be:

learn.model = torch.nn.DataParallel(learn.model, device_ids=[0, 1])

#22

Thanks for your reply. This works now, but I cannot save the model that is being trained via DataParallel. What am I doing wrong here?

mName_S = 'bestModel_' + aName
learnS = create_cnn(data, arch,
                    metrics=[accuracy, error_rate],
                    callback_fns=[partial(CSVLogger, filename=str('stat_' + str(tr) + 'S' + aName)),
                                  ShowGraph,
                                  partial(SaveModelCallback, monitor='val_loss', mode='auto', name=mName_S)])

learnS.model = torch.nn.DataParallel(learnS.model, device_ids=[0, 1])

learnS.load(mName_S)
log_preds, y_true = learnS.TTA()
y_true = y_true.numpy()
y_preds = np.argmax(np.exp(log_preds), axis=1)

What I ideally want to do is to train a model on multiple GPUs, then save it, and later be able to load it and predict on some data via TTA.
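In case it helps to see it spelled out, this is roughly the flow I have in mind (just a sketch of my intent; the manual torch.save and .module handling below is my guess at keeping DataParallel and learn.load compatible, not something taken from the fastai docs):

import torch

# Train on multiple GPUs (DataParallel prefixes state-dict keys with "module.").
learnS.model = torch.nn.DataParallel(learnS.model, device_ids=[0, 1])
learnS.fit_one_cycle(1)  # number of epochs is a placeholder

# Save the underlying model so the checkpoint keeps plain, unprefixed keys.
torch.save(learnS.model.module.state_dict(),
           learnS.path/learnS.model_dir/f'{mName_S}.pth')

# Later: load into the unwrapped model, then run TTA.
learnS.model = learnS.model.module
learnS.model.load_state_dict(
    torch.load(learnS.path/learnS.model_dir/f'{mName_S}.pth'))
log_preds, y_true = learnS.TTA()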


#23

We can’t help without seeing the full error message and your version of fastai.


#24

Hi @sgugger,

This is the error it generates:

Traceback (most recent call last):
  File "GpyCode.py", line 85, in <module>
    learnS.load(m)
  File "/homes/…/python3.6/site-packages/fastai/basic_train.py", line 217, in load
    state = torch.load(self.path/self.model_dir/f'{name}.pth', map_location=device)
  File "/homes/…/python3.6/site-packages/torch/serialization.py", line 365, in load
    f = open(f, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'data/LC_B_5/models/bestModel_5_S__resnet101.pth'

It says it cannot load the trained model. However, this line of code:

partial(SaveModelCallback, monitor='val_loss', mode='auto', name=mName_S)])

which is responsible for saving the trained model, neither saves the model nor generates any error message, whereas in single-GPU and multi-CPU mode it saves the trained model perfectly and I can therefore load it without problems.
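As a stopgap I am thinking of checking for the checkpoint myself after training and saving manually if the callback did not write it (just a rough sketch reusing the names from the snippet above; I am not sure this is the right fix):

import torch

ckpt = learnS.path/learnS.model_dir/f'{mName_S}.pth'
if not ckpt.exists():
    # The callback never wrote the file, so save the underlying model by hand.
    # Using .module keeps the keys unprefixed so a plain load works later.
    torch.save(learnS.model.module.state_dict(), ckpt)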

Thanks


(Karl) #25

I’m looking into training a language model on a fairly large dataset and I’m hoping to use multiple GPUs. Is it still the case that RNNs don’t work with multi-GPU?


#26

No, it works properly. It's the SaveModelCallback that might have some problem.


(Sooraj Mangalath Subrahmannian) #27

Hi,
I am training ULMFiT on multiple GPUs. I am facing an issue where my GPU usage is heavily imbalanced.


System configuration:
fastai 1.0.51.dev0, torch 1.0.1.post2

Here is the code:
save_model = partial(SaveModelCallback,
                     monitor='accuracy',
                     every='improvement',
                     name='best_lm')

early_stop = partial(EarlyStoppingCallback,
                     monitor='val_loss',
                     min_delta=0.01,
                     patience=2)

lm_learner = language_model_learner(data_lm, ARCHITECTURE,
                                    drop_mult=DROP_OUT_MULTIPLIER,
                                    callback_fns=[early_stop, save_model])
lm_learner = lm_learner.to_parallel()
lm_learner.freeze_to(-1)
lm_learner.fit_one_cycle(cyc_len=1,
                         max_lr=0.04,
                         moms=(0.8, 0.7))
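To make the imbalance concrete I have been printing per-GPU memory after a fit (a small sketch; my understanding, which may be wrong, is that nn.DataParallel gathers outputs and computes the loss on the first device, so some imbalance is expected):

import torch

# Print how much memory each GPU is holding; with nn.DataParallel the first
# device usually sits higher because outputs/loss are gathered there.
for i in range(torch.cuda.device_count()):
    alloc = torch.cuda.memory_allocated(i) / 1024**2
    peak = torch.cuda.max_memory_allocated(i) / 1024**2
    print(f'cuda:{i}: allocated {alloc:.0f} MiB, peak {peak:.0f} MiB')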


#28

I have the same problem as @soorajviraat, i.e. when training language models only one GPU's memory is fully utilized (in the case of CNNs everything works correctly). I've run some benchmarks. Here are the results:

CNN

ARCH, GPU type, Dataset

Note: Jeremy suggested that the limiting factor might be the data loaders and to use torchvision to fix it. It's an old post; I don't know if the problem has already been addressed, so I'm planning to check it and add the results to this post.

Resnet34, 4xK80 GPU, FastAI PETS

  • bs 16: single_GPU/parallel, 0:58/1:20 min each, 4:00 min total, valid loss single/par 0.21/0.27
  • bs 64: single_GPU/parallel, 0:46/0:32 min each, 2:30 min total, valid loss single/par 0.21/0.21
  • bs 256: single_GPU/parallel/p+fp, 0:50/0:27/0:26 min each, 2:00 min total, valid loss single/par/p+fp 0.23/0.24/0.25
  • bs 1024: does not fit on a single GPU; par/par+fp 0:40/0:39, valid loss 0.359/0.36

Language Model (RNN)

GPU type, Dataset

4xK80 GPU, FastAI IMDb

AWD_LSTM

  • bs 48 singleGPU/parallel/fp/p+fp 1:24:00/1:23:00/1:54:20/1:00:00
  • bs 96 singleGPU/fp/p+fp n.a./1:03:00/1:03:00
  • bs 136 par 0:58:00

Transformer

  • bs 36 singleGPU/par 3:12:00/3:15:00
  • bs 48 singleGPU/fp/par 3:15:00/6:40:00/2:40:00
  • bs 96 singleGPU/parallel/p+fp n.a./1:40:00/2:20:00

TransformerXL

  • bs 96 singleGPU/parallel/p+fp n.a./1:13:00/

Classifier

AWD_LSTM

  • bs 100 singleGPU/parallel 09:15/5:52
  • bs 136 singleGPU/parallel/fp/p+fp total n.a./5:17/12:39/5:17

The code I used for parallelization is the following:

learn = cnn_learner(data, models.resnet34, metrics=error_rate)
learn.model = torch.nn.DataParallel(learn.model)

I'm not sure what the cause is. Maybe it is working correctly (we can see a speed-up in language models even though memory limits the batch size and thus the parallelization speed-up). It would be great if someone experienced could interpret the results we have here.


(Keshav Unni) #29

I tried this, but in the following line:

learn.lr_find()

I got the following error:

Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 2 does not equal 0 (while checking arguments for cudnn_batch_norm)
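My guess (not confirmed) is that with nn.DataParallel the model has to live on device_ids[0] before the first forward pass, so something like the following is what I am planning to try:

import torch

# Make sure the model's parameters sit on the first device in device_ids
# before wrapping, otherwise cuDNN ops can see mismatched devices.
torch.cuda.set_device(0)
learn.model = learn.model.to('cuda:0')
learn.model = torch.nn.DataParallel(learn.model, device_ids=[0, 1, 2, 3])  # device_ids are placeholders
learn.lr_find()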
