Hi, I have a dataset containing about 22000 images. I have set the iterations for the training to be 75. In each iteration, the algorithm selects about 4400 images for training. My question is, how is this data selected? And does training the model on each image more than once affect our model's reliability?
What do you mean by iterations? That is not a term commonly used in deep learning, or at least not in the way you seem to be using it. You don’t typically set iterations in deep learning. Perhaps check out https://towardsdatascience.com/epoch-vs-iterations-vs-batch-size-4dfb9c7ce9c9 for clarification.
It might also help to show at least the basic code you use, so we can better see what you are doing, in particular how you split the dataset and how you invoke training.
As to your question, it is perfectly normal to train a network on each image multiple times, perhaps a great many times. This can lead to overfitting, where the network just 'remembers' the output for particular inputs rather than learning generalisable knowledge, but there are various techniques to mitigate this, such as data augmentation and dropout.
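To make the dropout idea concrete, here is a toy sketch in plain Python (not fastai's actual implementation, which uses PyTorch's `nn.Dropout`): during training each activation is randomly zeroed with probability `p`, and the survivors are scaled up so the expected total is unchanged; at inference time the input passes through untouched.

```python
import random

def dropout(activations, p=0.5, training=True, rng=None):
    """Toy "inverted dropout": during training, zero each activation with
    probability p and scale survivors by 1/(1-p) so the expected sum is
    unchanged. At inference time, return the activations untouched."""
    if not training or p == 0:
        return list(activations)
    rng = rng or random
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

acts = [0.5, 1.2, -0.3, 0.8]
print(dropout(acts, p=0.5, training=False))  # → [0.5, 1.2, -0.3, 0.8]
```

Because the network cannot rely on any single activation always being present, it is pushed towards redundant, more general features instead of memorising individual inputs.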
Thanks for the reply. I have attached a screenshot of my code. I guess, according to the website you linked, the term for what I'm doing is "epochs", as Jeremy also uses this word for training the model.
My question is, how does the network decide to pick the images in each epoch?
OK, I see where you were getting the 4400 number from; that is what I wasn't sure of. That 4400 is not the number of images presented per epoch (or iteration). It is the number of images in your validation set. Each epoch, every image in both the training set and the validation set is put through the model; there is no selection of images for each epoch.
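You can sanity-check this with the numbers from your screenshot. Assuming a roughly 20% validation split was used (fastai's common default), the arithmetic works out:

```python
# Numbers taken from the screenshot in the thread
n_train, n_valid = 17773, 4400

total = n_train + n_valid
print(total, n_valid / total)  # 22173 images total; 4400 is roughly a 20% validation split
```

So the "about 22000 images" in the dataset were split once, up front, into a training set and a validation set; the 4400 is that fixed validation set, not a per-epoch sample.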
Each epoch, the model will first be run on all images in your training set, so those 17773 images will be processed 4 at a time, 4 being your batch size. During this phase, the model weights are updated after each batch.

After the entire training set has been processed, the validation set is run through, though this time the weights are not updated. This is done only to report the validation loss so you can see how training is going; it has no impact on the model.

This process of doing the whole training set and then the whole validation set, which is a single epoch, will be repeated 75 times. The order of the images is randomised each epoch, but every image is used in every epoch (by default, anyway; you can change the sampling process through options to the DataBunch, or through callbacks such as the OverSamplingCallback).
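The batching within one epoch can be sketched like this (a simplified stand-in for what a shuffled DataLoader does, not fastai's actual code): the dataset is reshuffled, then carved into consecutive batches, so every image appears exactly once per epoch.

```python
import random

def epoch_order(dataset, batch_size=4, rng=None):
    """One epoch's worth of batches: shuffle the whole dataset, then slice
    it into consecutive batches. Every item appears exactly once per epoch;
    only the order changes between epochs."""
    rng = rng or random
    order = list(dataset)
    rng.shuffle(order)
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]

images = list(range(12))  # stand-in for image ids
batches = epoch_order(images)
seen = sorted(x for batch in batches for x in batch)
print(seen == images)  # → True: every image is used, just in a shuffled order
```

With your numbers, that means each epoch has ceil(17773 / 4) = 4444 training batches followed by 4400 / 4 = 1100 validation batches, repeated for all 75 epochs.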
So there isn’t any deciding which images are in an epoch, all images are processed every epoch.
Thank you so much. Everything became clear with your thorough explanation.