Multi-GPU training in fastai library

Hi,

I am trying to perform classification on pretty high resolution images (roughly 3000 x 2000). I have been able to achieve good results on downsampled versions of the same images, and there's a clear correlation between the size of the images used for training and the accuracy I am able to achieve.

Hence, I'm trying to train models on images at 1024x1024 resolution and above (more than 50,000 of them), and training takes quite a long time: more than 7 hours on a p3 machine. So I'm exploring the option of using multiple GPUs to accelerate training, and I wanted to check whether the fastai library supports the DataParallel approach, or whether I should just fall back to PyTorch DDP for this. The cifar-darknet notebook (lesson 12) contains the DataParallel piece. If this works, are there any specific dependencies we have to install apart from NCCL? I'd be happy to write up some documentation around this if that helps the community deal with datasets like this.

Really appreciate your time here.

Thanks.


I have tried training on multiple GPUs only once, and for me it was as simple as wrapping the model in nn.DataParallel. I didn't have to change anything else.
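
Roughly, all it took was something like this (a minimal sketch, not my exact code; resnet34 here is just for illustration):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet34

model = resnet34(pretrained=True).cuda()  # move the model to the GPU first
model = nn.DataParallel(model)            # then wrap it; uses all visible GPUs by default
# to restrict it: nn.DataParallel(model, device_ids=[0, 1])

# the forward pass splits each batch across the GPUs and gathers the outputs
# back on the first device, so the rest of the training code stays the same
x = torch.randn(16, 3, 224, 224).cuda()
out = model(x)
```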

If you do end up training on such high resolution images, could you please share whether the correlation between image size and model accuracy continued to hold, and what the maximum size you trained on was?


Thanks for your views @radek. If it is just a matter of wrapping the model object in nn.DataParallel, let me try that by spinning up a multi-GPU instance and get back to you. I assumed you would have to install a few other libraries.

The original images are quite big, greater than 2048x2048. I started with 96x96, then 128x128 and 256x256, and now I am at 512x512. The training time grew exponentially, from a few minutes to five and a half hours for a 30-epoch 1-cycle policy on 80% of the 50k images. As I gradually increased the size, the accuracy went up, indicating a positive correlation between image size and achievable accuracy.
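
For context, the size bumps follow the usual pattern from the course of swapping larger data into the same learner. A rough sketch of what I do (PATH and get_data are placeholders for my actual setup):

```python
from fastai.conv_learner import *  # fastai 0.7, as used in the course

PATH = 'data/my_dataset/'  # placeholder

def get_data(sz, bs):
    tfms = tfms_from_model(resnet34, sz, aug_tfms=transforms_side_on)
    return ImageClassifierData.from_paths(PATH, tfms=tfms, bs=bs)

learn = ConvLearner.pretrained(resnet34, get_data(96, bs=64))
# ... train at 96x96 ...

for sz, bs in [(128, 64), (256, 32), (512, 16)]:
    learn.set_data(get_data(sz, bs))  # keep the weights, swap in bigger images
    # ... train again at the new size ...
```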

Your words give me hope and I’ll report my findings here in a day. Thank you @radek.


@binga I have one more observation to add. Whenever I change the size of the input images, the overall memory occupied at the point of unfreezing the network increases in a non-linear fashion. I had assumed the memory occupied would be roughly (batch size × memory per image) + memory to store the network weights, and that only the (batch size × memory per image) term would change when I use larger images on the same network. But that is not what I see in my experiments; the overall memory occupied doesn't follow the equation I expected. Did you also face the same issue, or am I missing something?

Hey @radek, I am able to train the model on multiple GPUs when I wrap the model object in nn.DataParallel and use the ConvLearner.from_model_data method.

However, when I try to do an epoch of pretraining and then unfreeze, using the same wrapped model object with ConvLearner.pretrained bombs. It says:

TypeError: Broadcast function not implemented for CPU tensors

Have you tried pretraining and unfreezing layer by layer?
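
For reference, the path that does work for me looks roughly like this (simplified sketch; `data` stands in for my ModelData object and resnet34 is only for illustration):

```python
import torch.nn as nn
from fastai.conv_learner import *  # fastai 0.7

base = resnet34(pretrained=True)              # torchvision backbone; swap in your own head as needed
model = nn.DataParallel(base.cuda())          # move to the GPU, then wrap
learn = ConvLearner.from_model_data(model, data)  # this trains fine across GPUs
learn.fit(1e-2, 1)
```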

Are you seeing higher memory usage?

When you unfreeze the network, you want to update the weights of all the layers, which also means the model has to store gradients for every layer. So you would most likely need almost double the memory now, IMHO.
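
A quick way to sanity-check the gradient part is to watch torch.cuda.memory_allocated() right after a backward pass with most layers frozen vs. everything unfrozen. A toy sketch (not my actual training code):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet34

model = resnet34(pretrained=True).cuda()

def mem_after_step(model, bs=16, sz=224):
    model.zero_grad()
    x = torch.randn(bs, 3, sz, sz).cuda()
    model(x).sum().backward()
    torch.cuda.synchronize()
    # right after backward most activations have been freed, so this mostly
    # reflects weights + stored gradients + the input batch
    return torch.cuda.memory_allocated() / 1024 ** 2

# "frozen": only the final layer gets gradients
for p in model.parameters():
    p.requires_grad = False
for p in model.fc.parameters():
    p.requires_grad = True
print('frozen:   %.0f MB' % mem_after_step(model))

# "unfrozen": every layer now stores a gradient tensor the size of its weights
for p in model.parameters():
    p.requires_grad = True
print('unfrozen: %.0f MB' % mem_after_step(model))
```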

@binga I understand that the network has to remember the gradients for each layer, and that this part should be independent of batch size and depend only on the size of the network weights. To take an example: say that with a batch size of 4, memory consumption at the moment of unfreezing is 4x (where x is the memory required per image) + y (the memory required for storing all the gradients). If I now increase the batch size to 12, I expected the memory occupancy to be 12x + y, but it is 12x + p, and this p is way higher than the y we had before. I am not sure why this is the case.

Hi @binga - sorry, I have not tried freezing / unfreezing with multiple GPUs.

That’s alright. Thanks for all the help @radek.

Sorry, I do not know why @geetha.ai.

Some of the observations here could be helpful - https://medium.com/syncedreview/how-to-train-a-very-large-and-deep-model-on-one-gpu-7b7edfe2d072
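
One trick in that general direction that PyTorch supports out of the box is gradient checkpointing, which trades extra compute for lower activation memory. A rough sketch, assuming a plain sequential model:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# toy model: 20 conv blocks chained in an nn.Sequential
blocks = [nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU()) for _ in range(20)]
model = nn.Sequential(*blocks).cuda()

x = torch.randn(8, 64, 128, 128, device='cuda', requires_grad=True)

# only the activations at the 4 segment boundaries are kept; the rest are
# recomputed during backward, cutting peak memory at the cost of extra compute
out = checkpoint_sequential(model, 4, x)
out.sum().backward()
```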
