I have been playing with pretraining a Generator super-resolution model (to later use in a GAN) that takes images of resolution 64x64 and upscales them by a factor of x8 to 512x512.
Training on the DIV2K dataset took about 3.5 minutes per epoch, where the inputs to the network were the low-resolution images downscaled from the HR images.
I then tried reducing the input images (patches) to 8x8, i.e. I cut the spatial size of each input by a factor of x64 (note that these patches are all precomputed on my SSD), and I increased the batch_size by a factor of x64 to keep my GPU memory fully used.
However, the training time exploded to 2.5 hours per epoch. Why?
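To make the setup concrete, here is a minimal sketch of the before/after bookkeeping. The numbers are illustrative assumptions (800 training images, an original batch_size of 16), and `PatchDataset` is a hypothetical stand-in for my folder of precomputed patches:

```python
import torch
from torch.utils.data import Dataset, DataLoader

# Assumed numbers for illustration only.
N_IMAGES = 800            # DIV2K train split size
PATCHES_PER_IMAGE = 64    # one 64x64 input cut into 8x8 patches

class PatchDataset(Dataset):
    """Stands in for the precomputed 8x8 patches on the SSD.
    Each __getitem__ would normally be a separate small file read."""
    def __len__(self):
        return N_IMAGES * PATCHES_PER_IMAGE

    def __getitem__(self, idx):
        return torch.rand(3, 8, 8)  # placeholder for a decoded patch

loader_before = DataLoader(range(N_IMAGES), batch_size=16)      # 64x64 inputs
loader_after = DataLoader(PatchDataset(), batch_size=16 * 64)   # 8x8 patches

# The number of batches per epoch is unchanged...
print(len(loader_before), len(loader_after))            # 50 50
# ...but the number of __getitem__ calls (file reads) per epoch grows x64:
print(len(loader_before.dataset), len(loader_after.dataset))  # 800 51200
```

So even though the batch count per epoch is the same, each epoch now performs 64x as many individual dataset accesses.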
I was expecting some overhead from loading many 8x8 images from the SSD where previously there was a single 64x64 image, but shouldn't the multiprocess workers of the DataLoader object hide this latency by prefetching the data?
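For reference, the prefetching setup I have in mind looks roughly like this (the dataset here is a synthetic in-memory stand-in, and the parameter values are illustrative, not my exact configuration):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for the precomputed 8x8 patches on the SSD.
patches = torch.rand(4096, 3, 8, 8)
dataset = TensorDataset(patches)

loader = DataLoader(
    dataset,
    batch_size=1024,    # batch_size scaled up x64
    shuffle=True,
    num_workers=2,      # worker processes read/decode ahead of the GPU
    prefetch_factor=2,  # each worker keeps 2 batches queued
)

batch, = next(iter(loader))
print(batch.shape)  # torch.Size([1024, 3, 8, 8])
```

My expectation was that with `num_workers > 0` the workers would keep `num_workers * prefetch_factor` batches ready, so the per-file read latency should overlap with GPU compute rather than add 2+ hours to the epoch.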