Finally, let’s make this concrete with a small example.

Assume you have a dataset with 200 samples (rows of data) and you choose a batch size of 5 and 1,000 epochs.

This means that the dataset will be divided into 40 batches, each with five samples. The model weights will be updated after each batch of five samples.

This also means that one epoch will involve 40 batches or 40 updates to the model.

With 1,000 epochs, the model will be exposed to or pass through the whole dataset 1,000 times. That is a total of 40,000 batches during the entire training process.
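
The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name `training_schedule` is made up for illustration, and it assumes a final partial batch still counts as one update (hence the rounding up), which makes no difference here because 200 divides evenly by 5.

```python
import math


def training_schedule(n_samples, batch_size, n_epochs):
    """Return (batches per epoch, total weight updates) for a training run."""
    # Round up so a leftover partial batch still counts as one update
    # (an assumption; some setups drop the last incomplete batch instead).
    batches_per_epoch = math.ceil(n_samples / batch_size)
    total_updates = batches_per_epoch * n_epochs
    return batches_per_epoch, total_updates


batches_per_epoch, total_updates = training_schedule(200, 5, 1000)
print(batches_per_epoch, total_updates)  # 40 40000
```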

I am also slightly confused about the concept of an epoch. Say I train my network for 20 epochs the first time and find that I get the lowest error at the 8th epoch. Should I retrain my network for 8 epochs? And should that be done after unfreezing or without unfreezing?

Please advise, as I am slightly confused about the steps.

Hey, if your error gets consistently worse after the 8th epoch, then yes, you're overfitting. So you should stop somewhere around the 8th epoch.
When you freeze your network and train, only the last two layers get trained. When you unfreeze, all of them get trained. So just train for a few epochs with the network frozen, then unfreeze and train some more for as long as your validation loss keeps decreasing.
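
To make the "stop around the 8th epoch" advice concrete, here is a rough sketch of how you could pick the stopping point from a list of per-epoch validation losses. Everything in it is an assumption for illustration: the `best_epoch` helper, the `patience` parameter, and the loss values are made up, and this is not any particular library's early-stopping API.

```python
def best_epoch(val_losses, patience=3):
    """Return (epoch, loss) of the lowest validation loss seen before
    `patience` consecutive epochs pass without improvement."""
    best_loss = float("inf")
    best_idx = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss = loss
            best_idx = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # loss has stopped improving; likely overfitting
    return best_idx, best_loss


# Hypothetical validation losses: improving until epoch 8, then rising.
losses = [0.90, 0.70, 0.55, 0.45, 0.40, 0.36, 0.33, 0.31, 0.32, 0.34, 0.37, 0.40]
epoch, loss = best_epoch(losses)
print(epoch, loss)  # 8 0.31
```

In practice you would usually save the model weights at the best epoch rather than retrain from scratch for exactly 8 epochs, since a second run won't necessarily bottom out at the same point.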