I am playing around with the batch_size parameter of torch.utils.data.DataLoader and have noticed that training gets noticeably faster as the batch_size value is increased. The model is trained across 3 GPU devices.
For example, using a batch_size of 10000 cuts the training time by more than half compared to using a batch_size of 100. Even with a batch_size of 100000, only about 10% of the GPU memory was used.
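For reference, this is roughly what my setup looks like (the dataset and model below are placeholders, not my real ones; the only thing I change between runs is batch_size):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset/model just to illustrate the setup.
dataset = TensorDataset(torch.randn(1_000_000, 20),
                        torch.randint(0, 2, (1_000_000,)))
model = nn.DataParallel(
    nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
).cuda()

# Only batch_size is varied between runs (100 -> 10000 -> 100000).
loader = DataLoader(dataset, batch_size=10000, shuffle=True,
                    num_workers=4, pin_memory=True)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for inputs, targets in loader:
    inputs = inputs.cuda(non_blocking=True)
    targets = targets.cuda(non_blocking=True)
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()
```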
Does a large batch_size value hurt model performance (accuracy/convergence) in exchange for the faster training? How do you decide how large to make batch_size?