GPU Utilization jumps to zero often

I’m trying out the new v1 library on this places dataset.

The images are not very big (their sizes vary, but all are under 1000x1000). My GPU utilization keeps fluctuating between 0% and 99%. I assume this is because the CPU is the bottleneck. I have tried using fewer image transforms by using

get_transforms(do_flip=False, max_lighting=None, max_rotate=None, max_warp=None)

but still see the fluctuating GPU utilization. If it’s not the transforms causing the bottleneck, it might be the resize operations.

IIRC, in fastai v0.7 there was an option to resize images beforehand once and use those instead of resizing. I think this might be a solution, but couldn’t find an easy way to do this in the v1 library. Is there a way to do this?

You can always resize beforehand with

from PIL import Image
ns = 200  # desired new size
Image.open(filename).resize((ns, ns)).save(out_filename)

where ns is the desired new size. Note that resize() returns a new image rather than modifying it in place, so you need to save the result explicitly, then train on the saved copies.
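To apply this to a whole folder, here is a minimal sketch. The folder names, the `.jpg` extension, and the helper name `resize_folder` are assumptions for illustration, not part of fastai:

```python
from pathlib import Path
from PIL import Image

def resize_folder(src, dst, ns=200):
    """Resize every .jpg in src to ns x ns and save it under dst."""
    src, dst = Path(src), Path(dst)
    dst.mkdir(parents=True, exist_ok=True)
    for f in src.glob("*.jpg"):
        # resize() returns a new image; it must be saved explicitly
        Image.open(f).convert("RGB").resize((ns, ns)).save(dst / f.name)
```

Run it once before training, then point the data loader at the output folder so no per-batch resizing is needed.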


What platform are you on? It may be you need an optimized jpeg lib.

I’m on GCP’s Deep learning VM with an Intel CPU (8 cores) and a Nvidia GPU.

I tried to drop in Pillow-SIMD instead of Pillow but it broke something, so I reverted to standard Pillow.

I will try and see if I can replicate behaviour with other datasets.

@viraat How did you solve this? I am facing a similar problem. My GPU utilization switches between 0% and 50%, but my CPU utilization is also only ~10-15%, so I'm not sure where the problem is. Is the bottleneck in the data loader or in some internal image augmentation?


You could try increasing the num_workers to 16 when you’re creating a DataBunch object. I would also try increasing the batch_size.

Let me know if that helps.


My system has 8 CPU cores. Is there any reason you think increasing it to 16 will be better?

A general rule of thumb I follow is to use 2x the number of CPU cores. If you have more threads fetching data, it should help.
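As a rough illustration of that rule of thumb (the 2x multiplier is just a heuristic, and `suggested_num_workers` is a hypothetical helper, not a fastai function):

```python
import os

def suggested_num_workers(multiplier=2):
    """Heuristic: twice the number of CPU cores, so that data-loading
    workers can overlap disk I/O with image decoding/augmentation."""
    return (os.cpu_count() or 1) * multiplier

# On an 8-core machine this suggests num_workers=16.
```

The best value still depends on disk speed and transform cost, so it is worth trying a few settings while watching GPU utilization.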

If your images are large (> 500x500), I would consider resizing them to something reasonable, storing the resized copies on disk, and training on those instead.


Thanks Viraat! This did help. Another observation: my volatile GPU utilization fluctuates a lot between 0% and 80%, so I believe the data loader is the bottleneck. Optimizing data loading (by storing resized images and perhaps skipping augmentation) may lead to better GPU utilization.


What command did you use here?

You can type nvidia-smi on the command line to see GPU stats like temperature, memory utilization, power consumption and others. To have a view that refreshes every 2 seconds, you can type watch nvidia-smi.

Ah, great! Yeah, I already knew nvidia-smi, but not watch nvidia-smi.

Thanks a lot for your prompt reply :).

You’re welcome :slight_smile: You can use watch with other commands too; it’s a standard Linux utility.

Ahhhhhhh, didn’t notice it :slight_smile: . Thanks again.