Why would you want to do this on the GPU anyway? If your GPU is busy doing data augmentation, it can't do any training at the same time.
So now your GPU needs to wait for the CPU to send it the images (they are fairly large, so only a small batch can be sent at a time), then augment and resize those images, then train on that small batch, then send the results back to the CPU, and finally sit idle until the CPU delivers the next batch.
It is quicker to have the CPU do the data augmentation while the GPU is busy training, so that the two tasks happen in parallel. As long as the time the CPU spends on augmentation and the time the GPU spends on training are roughly equal, neither has to wait on the other.
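To make this overlap concrete, here is a minimal sketch of one common way to set it up, using PyTorch's `DataLoader` with worker processes. The framework, dataset, and transform choices are illustrative assumptions, not something prescribed above: the point is simply that the augmentations run on CPU workers while the GPU trains on the previous batch.

```python
# Sketch: CPU-side augmentation overlapping GPU training via DataLoader workers.
# The dataset, model, and transforms below are placeholders for illustration.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Augmentations run on the CPU, inside the DataLoader worker processes.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# A stand-in dataset of random images; swap in your real dataset here.
dataset = datasets.FakeData(size=512, image_size=(3, 256, 256), transform=augment)

# num_workers > 0 means CPU workers augment and resize the next batches
# while the GPU trains on the current one; pin_memory speeds up the
# CPU-to-GPU copy.
loader = DataLoader(dataset, batch_size=32, num_workers=4, pin_memory=True)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

for images, labels in loader:
    # By the time the GPU finishes the previous step, the CPU workers
    # have (ideally) already prepared the next batch.
    images = images.to(device, non_blocking=True)
    labels = labels.to(device)
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
```

If the CPU workers can't keep up (the GPU waits between batches), increasing `num_workers` is the usual first knob to turn; if the CPU is the faster side, the extra augmentation time is effectively free.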