Lesson 4: Why does the validation set need a DataLoader (mini batches)?

sambit · September 1, 2020, 10:51am

This is in reference to ‘my_04_mnist_basics.ipynb’.

I understand that the training set needs a DataLoader (for selecting mini-batches during training).

But the validation set is only used to calculate metrics. So why does it need mini-batches & a DataLoader?

Aren’t the metrics computed on the entire validation set?

PalaashAgrawal · September 1, 2020, 12:58pm

I think you’re misunderstanding how the model works.
The idea behind mini-batches is simply to break the data into chunks, because the CPU/GPU can’t hold all of the data in memory at once (the data we deal with is usually very large). However, that doesn’t mean we evaluate only part of the data: we do evaluate the entire validation set.
During validation, we feed the input data to the model and get a set of outputs, yes? We then compare the predictions with the ground truth and get the metric values (e.g. accuracy). Since you can’t feed all the validation data into the model at once, we feed it in mini-batches and later average the per-batch results (a weighted average, to be more precise, where the weights are the mini-batch sizes, which may not be the same for every batch).
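
To make the weighted-average point concrete, here is a minimal sketch in plain Python. The predictions, labels, batch size, and helper names (`accuracy`, `batches`) are all made up for illustration and are not taken from the notebook. It shows that weighting each batch's accuracy by its size recovers the accuracy over the whole validation set, while a plain mean of the batch accuracies does not when the last batch is smaller.

```python
# Toy illustration with hypothetical data (not from the notebook).

def accuracy(preds, targets):
    """Fraction of predictions that match the targets."""
    correct = sum(p == t for p, t in zip(preds, targets))
    return correct / len(targets)

def batches(items, batch_size):
    """Yield consecutive chunks; the last chunk may be smaller."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Ten made-up validation predictions and their labels.
preds   = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
targets = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

batch_size = 4  # produces batches of size 4, 4 and 2

batch_accs, batch_sizes = [], []
for pred_batch, target_batch in zip(batches(preds, batch_size),
                                    batches(targets, batch_size)):
    batch_accs.append(accuracy(pred_batch, target_batch))
    batch_sizes.append(len(target_batch))

# Weighted average: each batch counts in proportion to its size.
weighted = sum(a * n for a, n in zip(batch_accs, batch_sizes)) / sum(batch_sizes)

# Plain mean of batch accuracies: over-weights the small last batch.
unweighted = sum(batch_accs) / len(batch_accs)

# Accuracy computed on the whole validation set in one go.
full = accuracy(preds, targets)

print(f"weighted={weighted:.4f} unweighted={unweighted:.4f} full={full:.4f}")
```

With these toy numbers the weighted value matches the full-set accuracy, whereas the unweighted mean is pulled down by the two-item final batch.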

Hope this helps
Cheers

sambit · September 1, 2020, 1:13pm

Ah ok. I understand now. Cheers.