Hi,

I am trying to run a fastai trainer on multiple (4x) GPUs on the same machine. For this, I use the `distrib_ctx` context manager:
```python
with trainer.distrib_ctx():
    trainer.fine_tune(
        args.epochs,
        base_lr=args.base_lr,
        freeze_epochs=args.freeze_epochs,
    )
```
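For context, here is a stripped-down sketch of what `train.py` does. The dataset and hyperparameter values below are placeholders; the real script builds the `DataLoaders` from my own data and takes the values from `argparse`:

```python
from fastai.vision.all import *
from fastai.distributed import *  # distrib_ctx is added to Learner by this module

# Placeholder data: the real script uses my own dataset
path = untar_data(URLs.PETS)
files = get_image_files(path/"images")
dls = ImageDataLoaders.from_name_re(
    path, files, pat=r"(.+)_\d+.jpg$",
    item_tfms=Resize(224), bs=32,
)

trainer = vision_learner(dls, resnet34, metrics=error_rate)

with trainer.distrib_ctx():
    trainer.fine_tune(2, base_lr=2e-3, freeze_epochs=1)
```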
I’m launching the script using `accelerate`:

```bash
accelerate launch train.py
```
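For completeness, the equivalent launch with the configuration file passed explicitly would be (assuming the YAML below is saved as `accelerate_config.yaml`):

```bash
accelerate launch --config_file accelerate_config.yaml train.py
```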
Also, I’m using the following `accelerate` configuration:
```yaml
compute_environment: LOCAL_MACHINE
deepspeed_config: {}
distributed_type: MULTI_GPU
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
gpu_ids: all
machine_rank: 0
main_training_function: main
megatron_lm_config: {}
mixed_precision: fp16
num_machines: 1
num_processes: 4
rdzv_backend: static
same_network: true
use_cpu: false
```
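If it helps with debugging, I can also paste the output of `accelerate env`, which reports the installed versions together with this configuration:

```bash
accelerate env
```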
Unfortunately, the script fails with the following error:

```
AttributeError: 'Sequential' object has no attribute 'distrib_ctx'
```
I’ve created the `vision_learner` and it works fine on a single GPU. However, as soon as I try to run it on multiple GPUs, it fails.
Could you please give me some tips on multi-GPU training with the fastai library? It would be great if I could keep training in fastai and not have to switch to another trainer to scale up. What is your experience? How do you train a `Learner` on multiple GPUs? Also, the fastai version I’m using is 2.7.11.