I am training an image-to-image GAN to colorize photos. I am using a free GPU on the Gradient platform, which automatically shuts down after six hours and has 16 GB of memory. My dataset contains more than 100k images. It seems the largest image size I can train on within the six-hour window is one epoch of 320x320 photos. I would like to train on larger photos because I think it would be more useful for inference, but at 384x384 the batch size has to be so small that training takes about eight hours.
Training time is driven much more by image size than by batch size. Moving from 320x320 to 384x384 increases the pixel count per image by about 44%, so each epoch will definitely take longer regardless of how you set the batch size.
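As a rough back-of-the-envelope check (assuming per-image compute scales roughly linearly with pixel count, which is only an approximation for a convolutional model):

```python
# Relative compute per image when going from 320x320 to 384x384,
# assuming cost scales roughly with the number of pixels.
pixels_320 = 320 * 320   # 102,400 pixels
pixels_384 = 384 * 384   # 147,456 pixels
ratio = pixels_384 / pixels_320
print(ratio)  # 1.44 -> ~44% more compute per image
```

So even with an identical batch size, a 384x384 epoch should take roughly 1.4x as long, which lines up with six hours stretching to about eight.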
If your batch size is too small, you could accumulate gradients, i.e., do a gradient update every nth batch rather than every batch. I know the callback exists in fastai v1; I'm not sure whether it's in v2 yet.
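The pattern itself is framework-agnostic. A minimal pure-Python sketch of the idea (the names `train_with_accumulation` and `accum_steps` are illustrative, not the fastai API; in a real training loop the marked lines would be `optimizer.step()` and `optimizer.zero_grad()`):

```python
def train_with_accumulation(batch_grads, accum_steps):
    """Simulate gradient accumulation: sum per-batch gradients and
    apply one optimizer update every `accum_steps` batches."""
    grad = 0.0      # accumulated gradient
    updates = []    # record of the updates actually applied
    for i, batch_grad in enumerate(batch_grads, start=1):
        # Scale each contribution so the accumulated update matches
        # what one large batch of size (batch * accum_steps) would give.
        grad += batch_grad / accum_steps
        if i % accum_steps == 0:
            updates.append(grad)   # optimizer.step() would go here
            grad = 0.0             # optimizer.zero_grad() would go here
    return updates

# Eight small batches accumulated over 4 behave like two large-batch steps:
print(train_with_accumulation([1.0] * 8, accum_steps=4))  # [1.0, 1.0]
```

This lets you keep the tiny per-batch memory footprint that 384x384 forces on you while still getting the optimization behavior of a larger effective batch.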
I'm paying the $8 a month to store the data there. Paying for GPU usage would cost far too much by the time I finished training the model, unfortunately.
I’ll look into accumulating gradients. Thanks. That’s a good idea.