Problem in Distributed Training (fastai - V1) for UNet Image Super Resolution

I am trying to run the training from lesson-7-superres.ipynb as a Python script so that I can use distributed training, as described in the documentation here.
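For context, here is a minimal sketch of the kind of setup the fastai v1 distributed-training documentation describes, assuming the script is started with `python -m torch.distributed.launch` (which passes `--local_rank` to every worker). The helper names `parse_local_rank` and `setup_distributed` are only illustrative and are not part of fastai or of my script.

```python
import argparse


def parse_local_rank(argv=None):
    """Read the --local_rank argument that torch.distributed.launch passes to each worker."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--local_rank", type=int, default=0)
    # parse_known_args tolerates any other script-specific arguments
    args, _unknown = parser.parse_known_args(argv)
    return args.local_rank


def setup_distributed(local_rank):
    """Bind this process to one GPU and join the process group (needs CUDA and NCCL)."""
    # Imported lazily so the argument parsing above can be tried without a GPU.
    import torch
    import torch.distributed as dist

    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl", init_method="env://")


# Intended use inside the training script (not executed here):
#   from fastai.distributed import *
#   local_rank = parse_local_rank()
#   setup_distributed(local_rank)
#   learn = unet_learner(...).to_distributed(local_rank)
```

The launcher starts one copy of the script per GPU, so everything in the script, including the data loading, runs once in every process.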

I created a training script from the notebook and added the snippets needed to distribute the training. When I run the script, I get the following error:

(Every worker process prints the same traceback, and the copies from two of them came out interleaved in the console, so I am showing a single clean copy followed by the launcher's own error.)

Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/fastai-1.0.60/lib/python3.7/site-packages/fastai/vision/learner.py", line 116, in unet_learner
    try:    size = data.train_ds[0][0].size
  File "/home/ec2-user/anaconda3/envs/fastai-1.0.60/lib/python3.7/site-packages/fastai/data_block.py", line 651, in __getitem__
    if self.item is None: x,y = self.x[idxs],self.y[idxs]
  File "/home/ec2-user/anaconda3/envs/fastai-1.0.60/lib/python3.7/site-packages/fastai/data_block.py", line 120, in __getitem__
    if isinstance(idxs, Integral): return self.get(idxs)
  File "/home/ec2-user/anaconda3/envs/fastai-1.0.60/lib/python3.7/site-packages/fastai/vision/data.py", line 270, in get
    fn = super().get(i)
  File "/home/ec2-user/anaconda3/envs/fastai-1.0.60/lib/python3.7/site-packages/fastai/data_block.py", line 75, in get
    return self.items[i]
IndexError: index 0 is out of bounds for axis 0 with size 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "image_super_resolution/SuperRes-SelfAttention.py", line 120, in <module>
    blur=True, self_attention=True,norm_type=NormType.Weight)
  File "/home/ec2-user/anaconda3/envs/fastai-1.0.60/lib/python3.7/site-packages/fastai/vision/learner.py", line 117, in unet_learner
    except: size = next(iter(data.train_dl))[0].shape[-2:]
StopIteration
Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/fastai-1.0.60/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/ec2-user/anaconda3/envs/fastai-1.0.60/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ec2-user/anaconda3/envs/fastai-1.0.60/lib/python3.7/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/home/ec2-user/anaconda3/envs/fastai-1.0.60/lib/python3.7/site-packages/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/ec2-user/anaconda3/envs/fastai-1.0.60/bin/python', '-u', 'image_super_resolution/SuperRes-SelfAttention.py', '--local_rank=2']' returned non-zero exit status 1.
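Reading the traceback, the `IndexError` is raised at `return self.items[i]` with "size 0", so the training item list appears to be empty in each worker, and the fallback `next(iter(data.train_dl))` then raises `StopIteration` because there are no batches. One possible cause (an assumption, I have not confirmed it) is that the image folder the script reads from is empty or missing when the script runs. A small check like the following, placed before the data is built, would make that visible. The helper names and the extension list are hypothetical.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match the dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def count_images(folder):
    """Count image files under `folder` (recursively); 0 if the folder does not exist."""
    folder = Path(folder)
    if not folder.is_dir():
        return 0
    return sum(
        1
        for p in folder.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def require_images(folder):
    """Fail early with a readable message instead of an IndexError inside unet_learner."""
    n = count_images(folder)
    if n == 0:
        raise FileNotFoundError(f"No image files found under {folder}")
    return n
```

Calling `require_images(...)` on the low-resolution and high-resolution folders in every process, before creating the data object, would show whether the folders are populated at that point.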

Any help would be appreciated.

Hi @arul_bharathi, were you able to figure out how to use distributed training here?