Using Large batch_size for DataLoader?

I am playing around with the batch_size parameter of torch.utils.data.DataLoader and have noticed that as batch_size is increased, training gets noticeably faster. The model is trained across 3 GPU devices.

For example, using a batch_size of 10000 reduces the training time to less than half of what it takes with a batch_size of 100. Even with a batch_size of 100000, only about 10% of the GPU memory was used.
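For context, this is roughly the kind of setup I mean (the dataset, model, and worker counts below are just placeholders, not my actual code):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-ins for the real dataset and model.
dataset = TensorDataset(torch.randn(1_000_000, 64), torch.randint(0, 10, (1_000_000,)))
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))

# Replicate the model across the 3 GPUs; DataParallel splits each batch between them.
model = nn.DataParallel(model, device_ids=[0, 1, 2]).cuda()

# A larger batch_size means fewer iterations per epoch and better GPU utilisation,
# which is where the speed-up seems to come from.
loader = DataLoader(dataset, batch_size=10000, shuffle=True, num_workers=4, pin_memory=True)
```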

Does using a large batch_size hurt model performance, i.e. am I trading accuracy for training speed? And how do you decide how large a batch_size to use?

Thank you

Hey Nyx - I was having similar questions recently and found some good insight from the Leslie Smith hyper-parameters paper.
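One practical point that comes up in that paper and related work: if you increase the batch size a lot, you usually also need to adjust the learning rate (a common rule of thumb is to scale it roughly in proportion to the batch size, often with a warm-up period). A minimal sketch of that idea, where the model and all the numbers are purely illustrative assumptions on my part:

```python
import torch
from torch import nn

model = nn.Linear(64, 10)  # hypothetical model, just to have parameters to optimise

base_batch_size = 100      # assumed reference configuration, not from the paper
base_lr = 0.01             # learning rate that worked at the small batch size

new_batch_size = 10000
# Linear scaling rule of thumb: grow the learning rate in proportion to the batch size,
# typically combined with a short warm-up at the start of training.
new_lr = base_lr * (new_batch_size / base_batch_size)

optimizer = torch.optim.SGD(model.parameters(), lr=new_lr, momentum=0.9)
```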

You can find a link to the paper and some discussion around it here:

Best of luck