Learner.distrib_ctx doesn't work on v2.7.11

Hi,

I am trying to run a fastai Learner on multiple (4x) GPUs on the same machine. To do this, I use the distrib_ctx context manager.

with trainer.distrib_ctx():
    trainer.fine_tune(
        args.epochs, 
        base_lr=args.base_lr, 
        freeze_epochs=args.freeze_epochs
    )

I’m launching the script using accelerate.

accelerate launch train.py

Also, I’m using the following accelerate configuration.

compute_environment: LOCAL_MACHINE
deepspeed_config: {}
distributed_type: MULTI_GPU
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
gpu_ids: all
machine_rank: 0
main_training_function: main
megatron_lm_config: {}
mixed_precision: fp16
num_machines: 1
num_processes: 4
rdzv_backend: static
same_network: true
use_cpu: false

Unfortunately, the script fails with the following error:

AttributeError: 'Sequential' object has no attribute 'distrib_ctx'

I've created the learner with vision_learner, and it trains fine on a single GPU. However, as soon as I try to run it on multiple GPUs, it fails with the error above.

Could you please give me some tips about multi-GPU training with the fastai library? It would be great if I could keep training in fastai and not have to switch to another trainer to scale up.

What is your experience? How do you train a Learner on multiple GPUs? The fastai version I'm using is 2.7.11.

I'm going to need a whole heck of a lot more information :slight_smile: What are your imports like? What does your learner look like? Are you doing from fastai.distributed import *? Please try to include as much information as possible, as I currently can't provide anything truly helpful yet.
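For reference, a working multi-GPU run usually looks roughly like the sketch below. This is only a minimal example, not your code: the PETS dataset and the resnet34 backbone are placeholders I picked, since you haven't shared how your learner is built. The key detail is that distrib_ctx is patched onto Learner by fastai.distributed, so that import has to be in the training script, and the context manager has to be called on the Learner object itself, not on the underlying model.

# train.py -- minimal sketch, launched with: accelerate launch train.py
from fastai.vision.all import *
from fastai.distributed import *   # patches distrib_ctx onto Learner

# placeholder data setup -- replace with your own DataLoaders
path = untar_data(URLs.PETS)
dls = ImageDataLoaders.from_name_re(
    path, get_image_files(path/'images'),
    pat=r'(.+)_\d+.jpg$', item_tfms=Resize(224), bs=32)

learn = vision_learner(dls, resnet34, metrics=error_rate)

# Call distrib_ctx on the Learner. Your error mentions 'Sequential',
# which is what you'd see if it were called on the model (learn.model)
# instead of the Learner, or if fastai.distributed weren't imported.
with learn.distrib_ctx():
    learn.fine_tune(1)

If something like this still fails for you, please post your imports, the code that creates your learner, and the full traceback.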
