Yes, it is a single machine with 8 GPUs. That was my initial approach, but then I got the following error:
```python
learn = cnn_learner(data=fold_data, base_arch=arch, metrics=[accuracy, auc],
                    lin_ftrs=[1024, 1024], ps=[0.7, 0.7, 0.7],
                    callbacks=learn_callbacks,
                    callback_fns=learn_callback_fns)
learn.to_distributed(cuda_id=0)
learn.fit(1)
```
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.
Most of the answers about data parallelism in the forums use nn.DataParallel, and I couldn't find a working solution in the PyTorch forums either.
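For completeness, the nn.DataParallel route from those answers does work on a single machine without any process group. Here is a minimal sketch (assuming fold_data, arch, and the metrics come from the snippet above), in case single-process data parallelism is enough for your use case:

```python
import torch.nn as nn

# Single-process data parallelism: each batch is split across all visible
# GPUs and gradients are gathered on GPU 0. No init_process_group required.
learn = cnn_learner(data=fold_data, base_arch=arch, metrics=[accuracy, auc],
                    lin_ftrs=[1024, 1024], ps=[0.7, 0.7, 0.7])
learn.model = nn.DataParallel(learn.model)
learn.fit(1)
```

One caveat: after wrapping, the weights live under learn.model.module, so you may need to unwrap (learn.model = learn.model.module) before saving or exporting the learner.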
Then, based on that error message, I set the following, but it keeps hanging:
```python
import os
import torch

os.environ['MASTER_ADDR'] = '127.0.0.1'
os.environ['MASTER_PORT'] = '29500'
os.environ['WORLD_SIZE'] = '4'
os.environ['RANK'] = '0'
torch.distributed.init_process_group(backend='nccl')
```
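The hang is expected with that setup: init_process_group blocks until all WORLD_SIZE processes have joined the group, and since only one process (rank 0) was started out of a declared world size of 4, it waits forever. The fastai v1 distributed docs instead put the training code in a script and start one process per GPU via the torch.distributed.launch helper, which sets MASTER_ADDR, MASTER_PORT, RANK, and WORLD_SIZE for each process. A sketch along those lines (the script name is made up, and fold_data, arch, and the callbacks are assumed to be built inside the script as in your snippet):

```python
# train_dist.py (hypothetical name) -- run with:
#   python -m torch.distributed.launch --nproc_per_node=8 train_dist.py
import argparse
import torch
from fastai.vision import *
from fastai.distributed import *

parser = argparse.ArgumentParser()
parser.add_argument('--local_rank', type=int, default=0)  # injected by the launcher
args = parser.parse_args()

# Pin this process to its GPU and join the process group; the launcher has
# already exported MASTER_ADDR/MASTER_PORT/RANK/WORLD_SIZE for us.
torch.cuda.set_device(args.local_rank)
torch.distributed.init_process_group(backend='nccl', init_method='env://')

# fold_data, arch, learn_callbacks, learn_callback_fns built as in the snippet above
learn = cnn_learner(data=fold_data, base_arch=arch, metrics=[accuracy, auc],
                    lin_ftrs=[1024, 1024], ps=[0.7, 0.7, 0.7],
                    callbacks=learn_callbacks,
                    callback_fns=learn_callback_fns)
learn = learn.to_distributed(args.local_rank)
learn.fit(1)
```

With this layout each of the 8 processes trains on its own shard of the data. Also note that WORLD_SIZE was 4 in the env-var attempt while the machine has 8 GPUs, so even a multi-process launch with those variables would only use half of them.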
This error is not fastai-specific, but someone here may have run into a similar issue.