"RuntimeError: CUDA error: out of memory" on google cloud platform

dag · October 29, 2018, 12:23am

When I’m running the following code on lesson1-pets using resnet50,

data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(), size=320, bs=bs//2)
data.normalize(imagenet_stats)
learn = ConvLearner(data, models.resnet50, metrics=error_rate)
learn.fit_one_cycle(8, max_lr=slice(1e-3))

I get the error: “RuntimeError: CUDA error: out of memory”

I have tried decreasing to size=100 and bs=10, but I still get the same error. I’ve also restarted the Jupyter notebook kernel and the Google Cloud Platform instance.

I believe I’m on Google Cloud Platform, as there’s a green check mark next to my-fastai-instance on GCP and a green jupyter@my-fastai-instance on my terminal command line. This is the link to my notebook:
http://localhost:8080/tree/tutorials/fastai/course-v3/nbs/dl1#notebooks

What could still be the issue?

Thanks much!

willismar · November 27, 2018, 12:22am

Hi… sorry to be late to see this…

You get this error when the model and the data together try to use more memory than is available on your GPU.
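To see how much memory each GPU has and how much the current process is already holding, something like the following may help. This is only a minimal sketch: the helper name `gpu_memory_report` is made up for illustration, and it assumes PyTorch is installed (it returns an empty list if PyTorch or CUDA is not available).

```python
def gpu_memory_report():
    """Return one dict per visible CUDA device; an empty list if there are none."""
    try:
        import torch
    except ImportError:
        return []  # PyTorch is not installed in this environment
    if not torch.cuda.is_available():
        return []  # no usable CUDA device
    report = []
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        report.append({
            "device": i,
            "name": props.name,
            "total_mib": props.total_memory / 2**20,
            # Memory held by tensors in this process only,
            # not by other processes using the same GPU.
            "allocated_mib": torch.cuda.memory_allocated(i) / 2**20,
        })
    return report


for entry in gpu_memory_report():
    print(entry)
```

Note that `allocated_mib` only covers the current Python process. If another notebook kernel is still holding GPU memory, it will not show up here, and running `nvidia-smi` in a terminal is the usual way to check for that.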

If you have more than one GPU on your instance, you can distribute the workload among them by wrapping your model with:

import torch
from torch import nn
if torch.cuda.device_count() > 1:
    learn.model = nn.DataParallel(learn.model)

If the model still does not fit after that, you may need to reduce the model size and the batch size until it fits comfortably in memory. This is a recurring problem: a bigger model always needs more GPU memory.
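As a rough illustration of how batch size and image size affect memory, here is a small back-of-the-envelope calculation for the input batch alone. It is a sketch under stated assumptions: the function name `input_batch_bytes` is hypothetical, the inputs are assumed to be 3-channel float32 images, and bs=32 is an assumed value, since the original value of bs is not shown in the question. The weights and intermediate activations of resnet50 take far more memory than the input batch, so this only shows how the cost scales, not the total.

```python
def input_batch_bytes(bs, size, channels=3, bytes_per_value=4):
    """Bytes needed for one float32 input batch of shape (bs, channels, size, size)."""
    return bs * channels * size * size * bytes_per_value


# Assumed settings: bs=32 is illustrative, size=320 is from the question.
big = input_batch_bytes(bs=32, size=320)
# The reduced settings mentioned in the question.
small = input_batch_bytes(bs=10, size=100)

print(f"bs=32, size=320: {big / 2**20:.1f} MiB")
print(f"bs=10, size=100: {small / 2**20:.1f} MiB")
```

Memory grows linearly with the batch size and quadratically with the image size, so halving size saves more than halving bs. Activations inside the network scale in roughly the same way.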