Using Large batch_size for DataLoader?

I am playing around with the batch_size parameter of torch.utils.data.DataLoader and have noticed that as batch_size is increased, training gets noticeably faster. The model is trained across 3 GPU devices.

For example, using a batch_size of 10000 reduces the training time to less than half of what it takes with a batch_size of 100. Even with a batch_size of 100000, only about 10% of the GPU memory was used.
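For context, this is roughly the kind of setup I mean (the dataset, model, and worker counts below are just placeholders, not my actual code):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-ins for the real dataset and model.
dataset = TensorDataset(torch.randn(1_000_000, 64), torch.randint(0, 10, (1_000_000,)))
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))

# Replicate the model across the 3 GPUs; DataParallel splits each batch between them.
model = nn.DataParallel(model, device_ids=[0, 1, 2]).cuda()

# A larger batch_size means fewer iterations per epoch and better GPU utilisation,
# which is where the speed-up seems to come from.
loader = DataLoader(dataset, batch_size=10000, shuffle=True, num_workers=4, pin_memory=True)
```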

Does using a large batch_size hurt model performance, i.e. am I trading accuracy for training speed? And how do you decide how large a batch_size to use?

Thank you

Hey Nyx - I was having similar questions recently and found some good insight from the Leslie Smith hyper-parameters paper.
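One practical point that comes up in that paper and related work: if you increase the batch size a lot, you usually also need to adjust the learning rate (a common rule of thumb is to scale it roughly in proportion to the batch size, often with a warm-up period). A minimal sketch of that idea, where the model and all the numbers are purely illustrative assumptions on my part:

```python
import torch
from torch import nn

model = nn.Linear(64, 10)  # hypothetical model, just to have parameters to optimise

base_batch_size = 100      # assumed reference configuration, not from the paper
base_lr = 0.01             # learning rate that worked at the small batch size

new_batch_size = 10000
# Linear scaling rule of thumb: grow the learning rate in proportion to the batch size,
# typically combined with a short warm-up at the start of training.
new_lr = base_lr * (new_batch_size / base_batch_size)

optimizer = torch.optim.SGD(model.parameters(), lr=new_lr, momentum=0.9)
```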

You can find a link to the paper and some discussion around it here:

Best of luck