How to use multiple GPUs


Your first line is incorrect. It should be:

learn.model = torch.nn.DataParallel(learn.model, device_ids=[0, 1])


Thanks for your reply. This works now, but I cannot save the model that is being trained via DataParallel. What am I doing wrong here?

mName_S = str('bestModel_' + aName)
learnS = create_cnn(data, arch,
    metrics=[accuracy, error_rate],
    callback_fns=[partial(CSVLogger, filename=str('stat_' + str(tr) + 'S' + aName)), ShowGraph,
                  partial(SaveModelCallback, monitor='val_loss', mode='auto', name=mName_S)])

learnS.model=torch.nn.DataParallel(learnS.model, device_ids=[0,1])

log_preds, y_true = learnS.TTA()
y_true = y_true.numpy()
y_preds = np.argmax(np.exp(log_preds), axis=1)

What I ideally want to do is train a model on multiple GPUs, save it, and later load it and predict on some data via TTA.
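
For the save-then-load part of that workflow, one detail is worth knowing: torch.nn.DataParallel keeps the original network as its `module` attribute, so the parameter names in the wrapped model's state dict carry a `module.` prefix. A checkpoint saved from the wrapped model will therefore not match an unwrapped model unless the prefix is removed. Below is a minimal sketch of that renaming step. `strip_module_prefix` is a hypothetical helper, not part of fastai or PyTorch, and the example uses a plain dict with placeholder strings instead of real tensors. It does not by itself explain a checkpoint file that was never written.

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading `prefix` removed from each key.

    Hypothetical helper: keys that do not start with the prefix are kept as-is.
    """
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


# Stand-in for a checkpoint saved from a DataParallel-wrapped model;
# the string values are placeholders for real tensors.
wrapped = OrderedDict([("module.0.weight", "W"), ("module.0.bias", "b")])
print(list(strip_module_prefix(wrapped)))  # ['0.weight', '0.bias']
```

With a real checkpoint, the cleaned dict would be what gets passed to the unwrapped model's `load_state_dict`.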


We can’t help without seeing the full error message and your version of fastai.


Hi @sgugger,

This is the error it generates:

Traceback (most recent call last):
  File "", line 85, in
  File "/homes/…/python3.6/site-packages/fastai/", line 217, in load
    state = torch.load(self.path/self.model_dir/f'{name}.pth', map_location=device)
  File "/homes/…/python3.6/site-packages/torch/", line 365, in load
    f = open(f, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'data/LC_B_5/models/bestModel_5_S__resnet101.pth'

It says it cannot load the trained model. However, this line of code:

partial(SaveModelCallback, monitor='val_loss', mode='auto', name=mName_S)])

which is responsible for saving the trained model, neither saves the model nor generates any error message. In single-GPU and multi-CPU mode, it saves the trained model perfectly, and the model then loads without a problem.
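
A quick way to narrow this down is to check what actually ends up in the models folder after training, since the traceback shows the load looking for `path/model_dir/{name}.pth`. The sketch below only lists the `.pth` files it finds. `find_checkpoints` is a hypothetical helper, not a fastai function, and the `data/LC_B_5` path and `models` folder name are taken from the error message above.

```python
from pathlib import Path


def find_checkpoints(path, model_dir="models"):
    """List the .pth file names under path/model_dir, or [] if the folder is missing."""
    folder = Path(path) / model_dir
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.glob("*.pth"))


# Path and folder name assumed from the FileNotFoundError message.
print(find_checkpoints("data/LC_B_5"))
```

If the list is empty after a multi-GPU run but not after a single-GPU run, the callback never wrote a file. If a file is there under a different name, the problem is in how the name passed to load is built.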