In each epoch, is it necessary to go through every sample in the training data at least once?

A general question about training epochs: I’m trying to create a custom training loop from scratch, and I’m not sure how an epoch should be handled.

Let’s say I have a function that randomly draws a batch of a given size from the training data, something like this:

import numpy as np

def random_batch(X, y, batch_size=32):
    # Draw batch_size indices uniformly at random (with replacement)
    idx = np.random.randint(len(X), size=batch_size)
    return X[idx], y[idx]
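In practice I’d call it once per training step, something like this (X_train and y_train are just placeholder names for my arrays):

X_batch, y_batch = random_batch(X_train, y_train, batch_size=32)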

Let’s say I train for 50 epochs, and within each epoch I call the above function 1,000 times, since the training set has 32,000 samples (1,000 × 32 = 32,000).

My question is: within each epoch, is it necessary to ensure that EVERY sample in the training data passes through the network at least once? Or is it okay to just keep drawing random batches of 32? In other words, do I need an additional step in the code that drops each batch of 32 samples from the training data after it has gone through the network, so those samples can’t be drawn again in later steps of the epoch? That would guarantee every sample is learned exactly once per epoch (something like the sketch below). Or is this not necessary?
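To make that concrete, the “every sample exactly once” version I have in mind would look roughly like this (just a sketch; train_step is a placeholder for whatever update function I end up writing):

import numpy as np

def epoch_without_replacement(X, y, batch_size=32):
    # Shuffle the indices once, then walk through them in order,
    # so every sample lands in exactly one batch per epoch
    idx = np.random.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]

for epoch in range(50):
    for X_batch, y_batch in epoch_without_replacement(X_train, y_train):
        train_step(X_batch, y_batch)  # placeholder for the actual update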

Thanks and sorry for the newbie question!

Hi, that’s a good question! It’s definitely not necessary to use every sample in each epoch, but my guess is that doing so would probably work better.

I would expect using every example once per epoch to work better, because then you’re making use of all the different examples you have. If you sample randomly at each step without removing already-used examples, you’ll end up training on some examples multiple times and on others not at all within a given epoch.
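You can even put a number on how much data gets skipped: drawing n samples with replacement from a pool of n, each sample is missed with probability (1 - 1/n)^n ≈ e^-1 ≈ 37%, so only about 63% of the data is seen in a given “epoch”. A quick simulation to check (a rough sketch, assuming NumPy and your 32,000-sample setup):

import numpy as np

n = 32_000
batch_size = 32
seen = np.zeros(n, dtype=bool)

# One "epoch" of 1,000 with-replacement batches, as in random_batch above
for _ in range(n // batch_size):
    seen[np.random.randint(n, size=batch_size)] = True

print(seen.mean())  # ~0.63: roughly a third of the samples are never visited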

This would be a fun thing to explore, though! It’d be worth trying both ways and seeing how it affects your model’s performance. :slight_smile:

Thanks. I will definitely try both. Upon further research, it seems someone has already studied this question and came to the following conclusion:

Stochastic gradient methods for machine learning and optimization problems are usually analyzed assuming data points are sampled with replacement. In practice, however, sampling without replacement is very common, easier to implement in many cases, and often performs better.
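To run the comparison fairly, I’m planning to use a single batch iterator with a replace flag, so the sampling scheme is the only thing that differs between the two runs (again just a sketch, with train_step standing in for the real update):

import numpy as np

def iter_batches(X, y, batch_size=32, replace=True):
    n = len(X)
    if replace:
        # With replacement: independent random batches, duplicates possible
        for _ in range(n // batch_size):
            idx = np.random.randint(n, size=batch_size)
            yield X[idx], y[idx]
    else:
        # Without replacement: shuffle once, each sample appears exactly once
        order = np.random.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            yield X[idx], y[idx]

for X_batch, y_batch in iter_batches(X_train, y_train, replace=False):
    train_step(X_batch, y_batch)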
