I am training an image-to-image GAN to colorize photos. I am using a free GPU on the Gradient platform, which automatically shuts down after six hours and has 16 GB of memory. My dataset contains more than 100k images. It seems the largest image size I can train on within the six-hour window is one epoch of 320x320 photos. I would like to train on larger photos because I think it would be more useful for inference, but at 384x384 the batch size has to be so small that training takes about eight hours.
Training time is driven much more by image size than by batch size. Moving from 320x320 to 384x384 increases the pixel count per image by about 44%, so each epoch will definitely take longer regardless of how you set the batch size.
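As a rough back-of-the-envelope check (assuming per-image compute scales roughly linearly with pixel count, which is only an approximation for a convolutional model):

```python
# Relative compute per image when going from 320x320 to 384x384,
# assuming cost scales roughly with the number of pixels.
pixels_320 = 320 * 320   # 102,400 pixels
pixels_384 = 384 * 384   # 147,456 pixels
ratio = pixels_384 / pixels_320
print(ratio)  # 1.44 -> ~44% more compute per image
```

So even with an identical batch size, a 384x384 epoch should take roughly 1.4x as long, which lines up with six hours stretching to about eight.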
If your batch size is too small, you could accumulate gradients, i.e., do a gradient update every nth batch rather than every batch. I know the callback exists in fastai v1; I'm not sure whether it's in v2 yet.
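The pattern itself is framework-agnostic. A minimal pure-Python sketch of the idea (the names `train_with_accumulation` and `accum_steps` are illustrative, not the fastai API; in a real training loop the marked lines would be `optimizer.step()` and `optimizer.zero_grad()`):

```python
def train_with_accumulation(batch_grads, accum_steps):
    """Simulate gradient accumulation: sum per-batch gradients and
    apply one optimizer update every `accum_steps` batches."""
    grad = 0.0      # accumulated gradient
    updates = []    # record of the updates actually applied
    for i, batch_grad in enumerate(batch_grads, start=1):
        # Scale each contribution so the accumulated update matches
        # what one large batch of size (batch * accum_steps) would give.
        grad += batch_grad / accum_steps
        if i % accum_steps == 0:
            updates.append(grad)   # optimizer.step() would go here
            grad = 0.0             # optimizer.zero_grad() would go here
    return updates

# Eight small batches accumulated over 4 behave like two large-batch steps:
print(train_with_accumulation([1.0] * 8, accum_steps=4))  # [1.0, 1.0]
```

This lets you keep the tiny per-batch memory footprint that 384x384 forces on you while still getting the optimization behavior of a larger effective batch.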
I'm paying the $8 a month to store the data there. Paying for GPU usage would cost far too much by the time I finished training the model, unfortunately.
I’ll look into accumulating gradients. Thanks. That’s a good idea.