To_parallel, to_distributed, and PeakMemMetric

Hi, I’m curious about the usage of these new methods (to_parallel / to_distributed) as well as their interaction with PeakMemMetric.

I first tried just calling learn.to_parallel() on an ml.p3.8xlarge EC2 instance (4 V100s). I assumed I should multiply my batch size by the number of GPUs, but that resulted in a CUDA OOM crash (the original batch size worked with 1 GPU). I then tested with and without to_parallel at the same batch size and ended up with identical epoch times and PeakMemMetric results. I’m running this through SageMaker, so it’s a bit awkward to query the GPUs with nvidia-smi, but it looks like only one GPU may be active. Anybody have thoughts/suggestions?

Also, what’s the expected behavior of PeakMemMetric when using parallel or distributed training?

Edit: BTW, I’m testing with a unet_learner. See the sketch below for roughly what I’m running.
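For context, here is my setup reduced to a self-contained sketch (CAMVID_TINY, resnet34, and the batch size are just stand-ins for my actual pipeline):

```python
from fastai.vision import *
from fastai.distributed import *            # patches Learner with to_parallel() / to_distributed()
from fastai.callbacks.mem import PeakMemMetric

# Small stand-in dataset so the snippet is self-contained.
path = untar_data(URLs.CAMVID_TINY)
codes = np.loadtxt(path/'codes.txt', dtype=str)
get_y_fn = lambda x: path/'labels'/f'{x.stem}_P{x.suffix}'

data = (SegmentationItemList.from_folder(path/'images')
        .split_by_rand_pct(0.2)
        .label_from_func(get_y_fn, classes=codes)
        .databunch(bs=8)
        .normalize(imagenet_stats))

learn = unet_learner(data, models.resnet34, callback_fns=PeakMemMetric)
learn.to_parallel()        # the call that seems to have no effect for me
learn.fit_one_cycle(1)
```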

unet_learner doesn’t work with DataParallel (as documented here); you need to use a training script and distributed training instead.
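Roughly, the pattern from the fastai distributed docs looks like this (a minimal sketch; the architecture is a placeholder, and the DataBunch is built the same way as in your snippet above), launched with `python -m torch.distributed.launch --nproc_per_node=4 train.py`:

```python
# train.py -- run with: python -m torch.distributed.launch --nproc_per_node=4 train.py
from fastai.vision import *
from fastai.distributed import *
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int)
args = parser.parse_args()

torch.cuda.set_device(args.local_rank)
torch.distributed.init_process_group(backend='nccl', init_method='env://')

# Build the DataBunch here exactly as you would in a notebook (omitted).
data = ...

learn = unet_learner(data, models.resnet34)
learn = learn.to_distributed(args.local_rank)   # wraps the model for distributed training
learn.fit_one_cycle(1)
```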


Ah, missed that note. Thanks!

```
learn.to_parallel()

AttributeError                            Traceback (most recent call last)
<ipython-input-...> in <module>
----> 1 learn.to_parallel()

AttributeError: 'Learner' object has no attribute 'to_parallel'
```

fastai 1.0.51 shows this error; any suggestions?

I am using resnet50.

Did you import everything from fastai.distributed?
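i.e., something like this (a minimal sketch; `data` stands in for whatever DataBunch you’re training with):

```python
from fastai.vision import *
from fastai.distributed import *   # importing this module is what attaches to_parallel() to Learner

learn = cnn_learner(data, models.resnet50, metrics=[accuracy])
learn.to_parallel()
```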


Thank you!
After importing fastai.distributed it’s working now. :slight_smile:

I came across an issue with to_distributed and unet_learner: with arbitrary input sizes for the images, the computation takes much longer than with a fixed image size. Is this a common behaviour for the module?