How to use multiple GPUs

#21

Your first line is incorrect; it should be:

learn.model = torch.nn.DataParallel(learn.model, device_ids=[0, 1])
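
For context, the full pattern looks something like this (a minimal sketch, assuming fastai v1 and that `data` is an `ImageDataBunch` you already built):

import torch
from fastai.vision import *

# build the learner as usual, then swap in the DataParallel wrapper;
# each batch is split across the listed devices during training
learn = create_cnn(data, models.resnet34, metrics=[accuracy])
learn.model = torch.nn.DataParallel(learn.model, device_ids=[0, 1])
learn.fit_one_cycle(1)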

#22

Thanks for your reply. This works now, but I cannot save the model that is being trained via DataParallel. What am I doing wrong here?

mName_S = 'bestModel_' + aName
learnS = create_cnn(data, arch,
                    metrics=[accuracy, error_rate],
                    callback_fns=[partial(CSVLogger, filename='stat_' + str(tr) + 'S' + aName),
                                  ShowGraph,
                                  partial(SaveModelCallback, monitor='val_loss', mode='auto', name=mName_S)])

learnS.model = torch.nn.DataParallel(learnS.model, device_ids=[0, 1])

learnS.load(mName_S)
log_preds, y_true = learnS.TTA()
y_true = y_true.numpy()
y_preds = np.argmax(np.exp(log_preds), axis=1)

What I ideally want to do is to train a model on multiple GPUs, then save it, and later be able to load it and predict on some data via TTA.
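
Roughly, this is the workflow I'm after (a sketch using the same placeholder names as above):

# train on both GPUs, then checkpoint
learnS.model = torch.nn.DataParallel(learnS.model, device_ids=[0, 1])
learnS.fit_one_cycle(5)
learnS.save(mName_S)   # saved under learnS.path/'models'

# later, in a fresh session: rebuild the learner, load, predict via TTA
learnS = create_cnn(data, arch, metrics=[accuracy, error_rate])
learnS.load(mName_S)
log_preds, y_true = learnS.TTA()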


#23

We can’t help without seeing the full error message and your version of fastai.


#24

Hi @sgugger,

This is the error it generates:

Traceback (most recent call last):
  File "GpyCode.py", line 85, in <module>
    learnS.load(m)
  File "/homes/…/python3.6/site-packages/fastai/basic_train.py", line 217, in load
    state = torch.load(self.path/self.model_dir/f'{name}.pth', map_location=device)
  File "/homes/…/python3.6/site-packages/torch/serialization.py", line 365, in load
    f = open(f, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'data/LC_B_5/models/bestModel_5_S__resnet101.pth'

It says it cannot load the trained model. However, this line of code:

partial(SaveModelCallback, monitor='val_loss', mode='auto', name=mName_S)])

which is responsible for saving the trained model, neither saves the model nor generates any error message, while in single-GPU and multi-CPU mode it saves the trained model perfectly and therefore loads it without problems.
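
As a sanity check, I can save manually right after training and test whether the file lands where load expects it (the path expression below just mirrors the one in the traceback):

learnS.fit_one_cycle(1)
learnS.save(mName_S)   # explicit save, bypassing SaveModelCallback
# Learner.load reads self.path/self.model_dir/f'{name}.pth'
print((learnS.path/learnS.model_dir/f'{mName_S}.pth').exists())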

Thanks


(Karl) #25

I’m looking into training a language model on a fairly large dataset and I’m hoping to use multiple GPUs. Is it still the case that RNNs don’t work with multi-GPU?


#26

No, RNNs work properly with multi-GPU. It's the SaveModelCallback that might have a problem.
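
In the meantime, one possible workaround (an untested sketch, not an official fix) is to keep the checkpoint free of the wrapper by unwrapping before any save/load:

# drop back to the plain model before saving, so the checkpoint never
# depends on the DataParallel wrapper; 'best_manual' is a made-up name
learnS.model = learnS.model.module
learnS.save('best_manual')
learnS.load('best_manual')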


(Sooraj Mangalath Subrahmannian) #27

Hi,
I am training ULMFiT on multiple GPUs and am running into an issue where my GPU usage is heavily imbalanced.

System configuration: fastai 1.0.51.dev0, torch 1.0.1.post2

Here is the code:

save_model = partial(SaveModelCallback,
                     monitor='accuracy',
                     every='improvement',
                     name='best_lm')

early_stop = partial(EarlyStoppingCallback,
                     monitor='val_loss',
                     min_delta=0.01,
                     patience=2)

lm_learner = language_model_learner(data_lm, ARCHITECTURE,
                                    drop_mult=DROP_OUT_MULTIPLIER,
                                    callback_fns=[early_stop, save_model])
lm_learner = lm_learner.to_parallel()
lm_learner.freeze_to(-1)
lm_learner.fit_one_cycle(cyc_len=1,
                         max_lr=0.04,
                         moms=(0.8, 0.7))
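
For reference, this is how I'm measuring the imbalance (a small snippet, nothing fastai-specific; the comment reflects my understanding of where the extra load comes from):

import torch

# print per-device allocated memory; in my runs cuda:0 carries far more,
# presumably because nn.DataParallel gathers the model outputs there
for i in range(torch.cuda.device_count()):
    print(f'cuda:{i}: {torch.cuda.memory_allocated(i) / 1024**2:.0f} MiB')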
