Different batch_size for train and valid data loaders

Yes, you don’t compute gradients for the validation set, so you have more GPU memory available and can use a larger batch size there. A common choice is bs*2 for the validation set.
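
As a rough illustration, here is a minimal sketch in plain PyTorch. The toy tensors, the tiny linear model, and the value `bs = 64` are all assumptions made up for the example, not something from the original question. It only shows that the two loaders can take different `batch_size` values and that the validation pass runs under `torch.no_grad()`.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

bs = 64  # hypothetical training batch size

# Toy datasets standing in for the real train/validation data (assumption).
train_ds = TensorDataset(torch.randn(512, 10), torch.randint(0, 2, (512,)))
valid_ds = TensorDataset(torch.randn(256, 10), torch.randint(0, 2, (256,)))

# The two loaders are independent, so their batch sizes may differ.
train_dl = DataLoader(train_ds, batch_size=bs, shuffle=True)
valid_dl = DataLoader(valid_ds, batch_size=bs * 2)

model = nn.Linear(10, 2)  # placeholder model (assumption)
loss_fn = nn.CrossEntropyLoss()

# Validation pass: no_grad() disables gradient tracking, so no graph is
# kept for a backward pass, which is what frees memory for bigger batches.
model.eval()
total_loss = 0.0
with torch.no_grad():
    for xb, yb in valid_dl:
        out = model(xb)
        total_loss += loss_fn(out, yb).item() * len(xb)
valid_loss = total_loss / len(valid_ds)
```

The factor of 2 is a rule of thumb rather than a limit; how far you can go depends on the model and the GPU.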
