Hello, I'm having a big struggle getting training running on 2 GPUs when running code from Jupyter notebooks.
I tried two approaches:
learn = language_model_learner(
    dls, AWD_LSTM, drop_mult=0.3,
    metrics=[accuracy]
).to_fp16()
learn.model = nn.DataParallel(learn.model)
learn.fine_tune(1, 3e-2)
and
# Get number of GPUs
gpu = None
if torch.cuda.is_available():
    if gpu is not None: torch.cuda.set_device(gpu)
    n_gpu = torch.cuda.device_count()
else:
    n_gpu = None

# Get learner
learn = language_model_learner(
    dls, AWD_LSTM, drop_mult=0.3,
    metrics=[accuracy]
).to_fp16()

# The context manager way of dp/ddp; both can handle the single-GPU base case.
if gpu is None and n_gpu is not None:
    ctx = learn.parallel_ctx
    with partial(ctx, gpu)():
        print(f"Training in {ctx.__name__} context on GPU {list(range(n_gpu))}")
        learn.fine_tune(2)
else:
    learn.fine_tune(2)
Both approaches raise a RuntimeError:
RuntimeError: Input and hidden tensors are not at the same device, found input tensor at cuda:1 and hidden tensor at cuda:0
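I dug into this a bit, and as far as I can tell the cause is that AWD_LSTM stores its hidden state from the previous batch as plain tensor attributes on the module. nn.DataParallel scatters each input batch across the GPUs and replicates the module per device, but those plain attributes are shared as-is, so the replica running on cuda:1 receives an input on cuda:1 while its saved hidden state still sits on cuda:0. Here is a pure-Python toy that mimics that mismatch (FakeTensor, ToyRNN, and data_parallel_forward are made-up stand-ins, not PyTorch APIs):

```python
from dataclasses import dataclass
from copy import copy

@dataclass
class FakeTensor:
    device: str  # stand-in for torch.Tensor.device

class ToyRNN:
    """Stand-in for AWD_LSTM: keeps hidden state as a plain attribute."""
    def __init__(self):
        self.hidden = FakeTensor("cuda:0")  # state left over from the previous batch

    def forward(self, x):
        if x.device != self.hidden.device:
            raise RuntimeError(
                "Input and hidden tensors are not at the same device, "
                f"found input tensor at {x.device} and hidden tensor at {self.hidden.device}"
            )
        return x

def data_parallel_forward(module, inputs):
    """Mimics nn.DataParallel: scatter inputs, replicate the module per device.
    The shallow copy shares plain attributes like `hidden` across replicas."""
    return [copy(module).forward(x) for x in inputs]

batch = [FakeTensor("cuda:0"), FakeTensor("cuda:1")]  # one chunk per GPU
try:
    data_parallel_forward(ToyRNN(), batch)
except RuntimeError as e:
    print(e)  # same mismatch as in my traceback
```

The replica handling the cuda:0 chunk works, but the cuda:1 replica hits exactly the error above.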
The only way I managed to get it running is by executing:

    with learn.distrib_ctx():
        learn.fine_tune(1)

as a Python script, launched with python -m fastai.launch train.py.
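For reference, the whole train.py is roughly the following sketch (it needs at least 2 GPUs and must be run via fastai.launch; building dls is elided and assumed to match the notebook):

```python
# train.py -- run with: python -m fastai.launch train.py
from fastai.text.all import *
from fastai.distributed import *  # provides learn.distrib_ctx()

dls = ...  # same DataLoaders as in the notebook

learn = language_model_learner(
    dls, AWD_LSTM, drop_mult=0.3,
    metrics=[accuracy]
).to_fp16()

# distrib_ctx wraps the model in DistributedDataParallel; each launched
# process trains on its own GPU, so the RNN hidden state stays on the
# same device as the inputs instead of being split like with DataParallel.
with learn.distrib_ctx():
    learn.fine_tune(1)
```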
Can someone help me with this?