I am using a 200 GB persistent disk on GCP. Each epoch takes 2.5 hours to train.
My GPU usage is intermittent (because data pre-processing happens on the fly) but reaches 100% at least once for every batch. I am using 16 CPUs. So, clearly, the bottleneck is I/O.
So I want to increase the disk size and choose a disk type that keeps latency low. How should the size (and type) be decided with respect to the data size?
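
To make the sizing question concrete, here is a minimal Python sketch, not a definitive method. It assumes the training data is a set of ordinary files on the mounted disk. `measure_read_throughput` and `required_throughput_mb_s` are hypothetical helper names, and the 150 GB dataset size used below is a placeholder, not a figure from my setup. The first function times raw reads to estimate what the disk currently delivers; the second computes the average read rate needed to stream the dataset once per epoch within a target time.

```python
import time


def measure_read_throughput(paths, chunk_size=1 << 20):
    """Read the given files sequentially with no pre-processing.

    Returns (total_bytes, elapsed_seconds, mb_per_second).
    """
    total_bytes = 0
    start = time.perf_counter()
    for path in paths:
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                total_bytes += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero-length timing on very small inputs.
    mb_per_s = (total_bytes / 1e6) / elapsed if elapsed > 0 else float("inf")
    return total_bytes, elapsed, mb_per_s


def required_throughput_mb_s(dataset_gb, target_epoch_hours):
    """Average read rate (MB/s) needed to read the dataset once per epoch.

    dataset_gb is an assumed input: the size of the training data,
    not the provisioned size of the disk.
    """
    total_mb = dataset_gb * 1e9 / 1e6
    seconds = target_epoch_hours * 3600
    return total_mb / seconds
```

For example, `required_throughput_mb_s(150, 2.5)` is about 16.7 MB/s. If the measured rate on a sample of training files is well above the required rate, the stall is more likely in the on-the-fly pre-processing than in the disk. Repeated reads of the same files may be served from the OS page cache, which overstates disk speed, so the measurement is best taken on files that have not been read recently. On GCP, persistent disk IOPS and throughput limits depend on the disk type and scale with the provisioned size, so the required rate is the number to compare against those limits.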