Yes, you don’t compute gradients for the validation set, so the activations needed for a backward pass are not kept and you have more GPU memory available. Usually the validation batch size is even set to bs*2.
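As a rough illustration, here is a minimal sketch in plain PyTorch (an assumption, since the framework isn't named above). The toy data, the `nn.Linear` model, and `bs = 32` are all made up for the example; the point is only the `bs * 2` validation loader and the `torch.no_grad()` block.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

bs = 32  # training batch size (hypothetical value)

# Toy tensors standing in for a real dataset (assumption for illustration)
train_ds = TensorDataset(torch.randn(256, 10), torch.randint(0, 2, (256,)))
valid_ds = TensorDataset(torch.randn(128, 10), torch.randint(0, 2, (128,)))

train_dl = DataLoader(train_ds, batch_size=bs, shuffle=True)
# No gradients are computed during validation, so a larger batch usually fits
valid_dl = DataLoader(valid_ds, batch_size=bs * 2)

model = nn.Linear(10, 2)  # placeholder model
loss_fn = nn.CrossEntropyLoss()

model.eval()
valid_loss = 0.0
with torch.no_grad():  # disables autograd tracking, nothing is stored for backward
    for xb, yb in valid_dl:
        out = model(xb)
        # Weight each batch loss by its size so the final value is a per-sample mean
        valid_loss += loss_fn(out, yb).item() * len(xb)
valid_loss /= len(valid_ds)
```

How much larger the validation batch can be depends on the model and the GPU, so bs*2 is a starting point rather than a guarantee.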