Tabular training is slow and only using 25% of my GPU

Hi, I was training the model from the Rossmann notebook on my Titan RTX on Ubuntu 19.04 with drivers 418.56 and CUDA 10.
Training was a bit slow, so I looked at GPU usage and it's only at 25%. I can get it to 100% on other training runs no problem.
PCI bandwidth is at 2%, GPU memory at 5%, CPU memory at 50%, and CPU usage is around 22% on each core. My drive is an NVMe 970 Pro.
I'm not sure what the bottleneck is that's starving the GPU of data. Could it be a Python bottleneck? Any ideas?

GPU usage will go up if you increase the batch size.
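In the Rossmann notebook that's the `bs` argument when you build the DataBunch. A minimal sketch, assuming the fastai v1 `TabularList` API from that notebook, with `df`, `path`, `cat_vars`, `cont_vars`, `dep_var` and `valid_idx` standing in for whatever you already have defined:

```python
from fastai.tabular import *  # fastai v1

# Same pipeline as the Rossmann notebook, only bs changes.
# df, path, cat_vars, cont_vars, dep_var and valid_idx are placeholders
# for the variables already defined in your notebook.
procs = [FillMissing, Categorify, Normalize]

data = (TabularList.from_df(df, path=path, cat_names=cat_vars,
                            cont_names=cont_vars, procs=procs)
        .split_by_idx(valid_idx)
        .label_from_df(cols=dep_var, label_cls=FloatList, log=True)
        .databunch(bs=4096))  # default is 64; raise this until the GPU stays busy

learn = tabular_learner(data, layers=[1000, 500], metrics=exp_rmspe)
learn.fit_one_cycle(1, 1e-3)
```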


Thanks! That did it! It was set to 64 by default; I had to increase it to 25,600 to get up to 71% usage.
The training dataset has 900K entries. Do you think setting the batch size this high could impact the quality of training?
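For a sense of scale, at that batch size each epoch over the 900K rows is only a few dozen optimizer steps (rough back-of-the-envelope below, ignoring the train/validation split):

```python
import math

n_rows = 900_000          # size of the training set
for bs in (64, 25_600):   # fastai default vs. what I'm using now
    steps = math.ceil(n_rows / bs)
    print(f"bs={bs}: ~{steps} optimizer steps per epoch")
# bs=64: ~14063 optimizer steps per epoch
# bs=25600: ~36 optimizer steps per epoch
```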

Generally, I think batch size and final performance are unrelated (BatchNorm layers would be an exception, but they're beyond the scope of this discussion, and there a larger batch size helps anyway). Crank it up as much as you can afford!

I've actually seen situations where batch size affected performance, sometimes negatively and sometimes positively, but I couldn't pin down exactly why. I think it mainly came down to how imbalanced my data was and the fact that I was using a custom loss function.

I’ve been reading a bit more and had to come back to correct my previous statement: batch size does affect generalisation performance! Check out Train longer, generalize better: closing the generalization gap in large batch training of neural networks, and more generally this thread.
I maintain it’s a more advanced/fine-tuning topic, but I thought it was important to come back and correct this!
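If anyone wants to experiment with batches this large, one practical takeaway from that paper is to adjust the learning rate as the batch size grows (it suggests a square-root scaling) and to train for more epochs. A minimal sketch of the scaling rule, assuming a base learning rate of 1e-3 that worked at bs=64 (those numbers are just illustrative):

```python
import math

def scaled_lr(base_lr: float, base_bs: int, new_bs: int) -> float:
    """Scale the learning rate by sqrt(new_bs / base_bs).

    Square-root scaling is the heuristic suggested in
    'Train longer, generalize better'; treat it as a starting
    point for experimentation, not a guarantee.
    """
    return base_lr * math.sqrt(new_bs / base_bs)

# Going from the default bs=64 to bs=25600:
lr = scaled_lr(base_lr=1e-3, base_bs=64, new_bs=25_600)  # -> 0.02
# learn.fit_one_cycle(5, lr)   # and consider training for more epochs
```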