I’ve noticed that whenever you train for a single epoch, several things happen underneath the cell.
- Training runs for a single epoch to fit our new, randomly initialised head to the body of the pre-trained model, for whatever task we’re doing.
- Then it trains for the number of epochs we specified, updating the whole model on our training data.
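To check I’m describing this right, here’s my mental model of that two-stage schedule written out as plain Python. The names are my own invention, not anything from the actual library:

```python
def two_stage_schedule(epochs):
    """My guess at the sequence of training stages I'm seeing."""
    stages = []
    # Stage 1: the pre-trained body is held fixed; only the new
    # random head is updated so it learns to use those features.
    stages.append("frozen: fit new head for 1 epoch")
    # Stage 2: the number of epochs we asked for, updating everything.
    for e in range(1, epochs + 1):
        stages.append(f"unfrozen: train epoch {e}")
    return stages

print(two_stage_schedule(1))
```

So even asking for a single epoch produces two passes of training, which matches what I see in the notebook output.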
But within each epoch, there seem to be two separate stages. One is slow (what I think of in my head as the ‘real training’), where the progress bar completes one cycle from 0–100%; then the progress bar runs through 0–100% again, quite a bit faster.
What are those two separate processes going on underneath? (It’s possible we’ll find out at a later stage, in which case feel free to tell me just to wait until a later lesson.) Should I think of them as two separate processes? Is it some kind of consolidation or calculation that’s being represented there?