You mention that you can see two sets of training output. I guess these are the ones you refer to as epoch 0 (2x), and it is not clear why this separation is there.
When you call .fine_tune, the following actually happens:
1. one (or more) epochs of training where the pretrained body of the model is frozen and only the last layers (the head) are trained, i.e. only the weights and biases of those last layers are updated;
2. n epochs of training on the full model, i.e. all parameters (weights and biases) are updated.
This is why you see two sets of epochs.
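Schematically, it looks roughly like the sketch below. This is a simplified approximation of the two steps, not the exact fastai source; the learning-rate values and the helper name fine_tune_sketch are just placeholders.

```python
def fine_tune_sketch(learn, epochs, base_lr=2e-3, freeze_epochs=1):
    """Rough approximation of what learn.fine_tune(epochs) does (learn is a fastai Learner)."""
    # Step 1: freeze the pretrained body, train only the newly added head
    learn.freeze()
    learn.fit_one_cycle(freeze_epochs, base_lr)

    # Step 2: unfreeze everything and train the full model,
    # typically with lower (discriminative) learning rates
    learn.unfreeze()
    learn.fit_one_cycle(epochs, slice(base_lr / 100, base_lr / 2))
```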
If you check the docs for .fine_tune you will see that you can also define how many epochs to use for the first (frozen) step.
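For example (dls is assumed to be a DataLoaders object you have already built; resnet34 and the metric are just placeholders):

```python
from fastai.vision.all import *

# dls: an existing DataLoaders object (assumed to be defined elsewhere)
learn = vision_learner(dls, resnet34, metrics=accuracy)

# 3 frozen epochs training only the head, then 5 epochs on the full model
learn.fine_tune(5, freeze_epochs=3)
```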
This is the basic principle of transfer learning.