AttributeError: 'Sequential' object has no attribute 'distrib_ctx'
I’ve created the vision_learner and it works fine on a single GPU. However, as soon as I try to start it on multiple GPUs, it fails.
Could you please give me some tips about multi-GPU training in the fastai library? It would be great if I could keep training in fastai rather than switching to another trainer to scale up.
What is your experience? How do you train a Learner on multiple GPUs? Also, the version I use is 2.7.11.
I’m going to need a whole lot more information. What are your imports like? What’s your learner like? Are you doing from fastai.distributed import *? Please try to include as much information as possible so I can help you, as I currently can’t provide anything truly helpful yet.
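For reference, here is a minimal sketch of a script-based multi-GPU run. It assumes the AttributeError comes from a missing fastai.distributed import: importing that module is what adds distrib_ctx to Learner, and without it the attribute lookup can fall through to the underlying model. The dataset path, the is_cat labelling rule and the file name train.py are hypothetical placeholders, not taken from the original post.

```python
# train.py (hypothetical file name); one common way to start it on several GPUs
# is `accelerate launch train.py`.

def is_cat(fname):
    # Hypothetical labelling rule: file names starting with an upper-case letter are cats.
    return fname[0].isupper()


def main(path):
    # The imports sit inside main() only so this sketch can be imported anywhere;
    # in a real script they would normally go at the top of the file.
    from fastai.vision.all import (ImageDataLoaders, Resize, error_rate,
                                   get_image_files, resnet34, vision_learner)
    import fastai.distributed  # importing this module patches Learner.distrib_ctx

    dls = ImageDataLoaders.from_name_func(
        path, get_image_files(path), valid_pct=0.2,
        label_func=is_cat, item_tfms=Resize(224))
    learn = vision_learner(dls, resnet34, metrics=error_rate)
    # Training runs in distributed mode only inside this context manager.
    with learn.distrib_ctx():
        learn.fine_tune(4)


# In the real script, finish with a call such as:
# main("/path/to/images")  # hypothetical folder of image files
```

If the import is already present and the error persists, the full imports and learner definition would still be needed to narrow it down.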
Hi @muellerzr, I was trying to run my code on multiple GPUs on my local machine. The Python file (without the notebook launcher part) runs fine, but there is an issue when I try to run the code in a Jupyter notebook. I have added the code for reference, along with the error it produces. Also, I am using fastai 2.7.12.
ValueError Traceback (most recent call last)
Cell In[7], line 4
2 with learn.distrib_ctx():
3 learn.fine_tune(4)
----> 4 notebook_launcher(train, num_processes=2)
File ~/fastai_projects/fastai_env/lib/python3.10/site-packages/accelerate/launchers.py:123, in notebook_launcher(function, args, num_processes, mixed_precision, use_port)
116 raise ValueError(
117 "To launch a multi-GPU training from your notebook, the `Accelerator` should only be initialized "
118 "inside your training function. Restart your notebook and make sure no cells initializes an "
119 "`Accelerator`."
120 )
122 if torch.cuda.is_initialized():
--> 123 raise ValueError(
124 "To launch a multi-GPU training from your notebook, you need to avoid running any instruction "
125 "using `torch.cuda` in any cell. Restart your notebook and make sure no cells use any CUDA "
126 "function."
127 )
129 # torch.distributed will expect a few environment variable to be here. We set the ones common to each
130 # process here (the other ones will be set be the launcher).
131 with patch_environment(
132 world_size=num_processes, master_addr="127.0.01", master_port=use_port, mixed_precision=mixed_precision
133 ):
ValueError: To launch a multi-GPU training from your notebook, you need to avoid running any instruction using `torch.cuda` in any cell. Restart your notebook and make sure no cells use any CUDA function.
The error is quite clear: you can’t run any code that initializes CUDA in a notebook cell before calling notebook_launcher. Please see the docs for examples of how to do this. You’ll notice that we don’t create the learner or dataloaders outside the launched function, because doing so would initialize CUDA: fastai - Notebook Launcher examples
Thanks, that worked like a charm. So everything that could trigger .to('cuda') has to be moved into a separate function, which is then run via notebook_launcher.
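For anyone landing here later, the pattern described above can be sketched roughly as follows. This is an illustration under assumptions, not the exact code from the thread: the dataset path and the is_cat labelling rule are hypothetical, and the in_notebook=True and sync_bn=False arguments to distrib_ctx assume a fastai version that accepts them. The key point is that the dataloaders and the learner are built inside train, so no notebook cell touches CUDA before the launcher forks its processes.

```python
def is_cat(fname):
    # Hypothetical labelling rule: file names starting with an upper-case letter are cats.
    return fname[0].isupper()


def train(path):
    # Everything that may touch CUDA (dataloaders, learner, training) lives in here.
    from fastai.vision.all import (ImageDataLoaders, Resize, error_rate,
                                   get_image_files, resnet34, vision_learner)
    import fastai.distributed  # importing this module patches Learner.distrib_ctx

    dls = ImageDataLoaders.from_name_func(
        path, get_image_files(path), valid_pct=0.2,
        label_func=is_cat, item_tfms=Resize(224))
    learn = vision_learner(dls, resnet34, metrics=error_rate)
    # Assumed arguments: in_notebook / sync_bn may differ between fastai versions.
    with learn.distrib_ctx(in_notebook=True, sync_bn=False):
        learn.fine_tune(4)


def launch(path, num_processes=2):
    # Run this from a fresh kernel in which no earlier cell has used torch.cuda.
    from accelerate import notebook_launcher
    notebook_launcher(train, args=(path,), num_processes=num_processes)
```

In a notebook, the last cell would then be something like launch("/path/to/images"), where the path is again a placeholder. If the ValueError reappears, restarting the kernel before running that cell is usually necessary, since CUDA stays initialized for the life of the process.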