Fine-tuning model: why do we need to save and load the same model before unfreeze

In Lesson 4, I saw this pattern a lot:


  1. Why do we need to load the model immediately after we saved the model?
  2. Why do we unfreeze the model for the first round of fine-tuning, but freeze it for the second round?

Could someone explain and give me some guidelines on the procedure for fine-tuning models?


The model is saved in case we have to roll back to an earlier trained version. Suppose you choose a learning rate that is too high, the model does not converge, and the loss blows up. It is nearly impossible to repair a damaged model. If you have saved the previous version, you can revert to it and try again. In practice, the code to revert to each saved version is already there in the notebook. If you simply run straight through, line by line, it has no effect, because it only reloads what was just saved.
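To make the "save, then revert if training blows up" idea concrete, here is a minimal toy sketch in plain Python. It is not fastai code: the quadratic loss, the single weight `w`, and the helper names `train`, `save`, and `load` are all made up for illustration, and a pickle file stands in for a real checkpoint.

```python
import math
import os
import pickle
import tempfile

def loss(w):
    # Toy quadratic loss with its minimum at w = 3 (an assumption for illustration).
    return (w - 3.0) ** 2

def train(w, lr, steps):
    # Plain gradient descent; the gradient of (w - 3)^2 is 2 * (w - 3).
    for _ in range(steps):
        w = w - lr * 2.0 * (w - 3.0)
    return w

def save(w, path):
    # Hypothetical checkpoint helper: write the current state to disk.
    with open(path, "wb") as f:
        pickle.dump(w, f)

def load(path):
    # Hypothetical checkpoint helper: read the saved state back.
    with open(path, "rb") as f:
        return pickle.load(f)

ckpt = os.path.join(tempfile.mkdtemp(), "stage-1.pkl")

w = train(0.0, lr=0.1, steps=20)   # sensible learning rate: moves toward 3
save(w, ckpt)                      # checkpoint the good state
good_loss = loss(w)

w = train(w, lr=1.5, steps=20)     # too high: each step overshoots and the error grows
if not math.isfinite(loss(w)) or loss(w) > good_loss:
    w = load(ckpt)                 # revert to the checkpoint instead of repairing

w = train(w, lr=0.1, steps=20)     # try again with the sensible learning rate
```

If the second training run had gone well, the `load` branch would simply never fire, which is the sense in which the revert code "has no effect" when everything works.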

Why do we unfreeze the model for the first round of fine-tuning, but freeze it for the second round?

freeze_to(n) freezes the layer groups before index n and unfreezes all the others. It is used to progressively unfreeze the later layers. unfreeze() unfreezes all layers. Please see the docs to understand exactly how these functions work. The reasons for progressively unfreezing a pretrained model are explained in the video lessons.
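As a rough illustration of those semantics, here is a toy sketch in plain Python. It is not the library's implementation: the three group names are invented, and it ignores details such as how batch-norm layers are treated. It only mirrors, as I understand fastai v1, the rule that groups before index n are frozen, with freeze() behaving like freeze_to(-1) and unfreeze() like freeze_to(0).

```python
# Hypothetical layer groups, ordered from the input side to the output side.
GROUPS = ["early_body", "late_body", "head"]

def freeze_to(n, groups=GROUPS):
    """Return {group: is_trainable}: groups[:n] are frozen, groups[n:] are trainable."""
    frozen = groups[:n]
    return {g: g not in frozen for g in groups}

def freeze(groups=GROUPS):
    # Everything except the last group is frozen.
    return freeze_to(-1, groups)

def unfreeze(groups=GROUPS):
    # Nothing is frozen.
    return freeze_to(0, groups)

# Progressive unfreezing: open up one more group, from the head backwards.
stages = [freeze_to(-1), freeze_to(-2), unfreeze()]
```

Each entry in `stages` makes one more group trainable than the previous one, which is the pattern of training the head first and the earlier layers later.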

It may help you to understand that the video lessons demonstrate an effective training method for a class of example problems. They show good methods as a starting point to try and to learn from. In general, training a model is not an exact science with specific rules and reasons at each step. Rather, during training, you may try unfreezing or freezing certain layers, varying the learning rate and momentum, etc. in order to improve training speed and accuracy. With time and experimentation, you will develop your own intuitions and best practices.

HTH, Malcolm

First of all, thanks for helping me in many posts :slight_smile:

Okay, back to my question:

I was asking why we need to load the trained model. You answered why we need to save it.

The rest of my questions have been answered. Once again, thank you for your help :+1:

In the case of language models, this can be because we will run out of memory unless we have a very large GPU (this happens even with image models), so we may need to restart our kernels. This way we save our work and can easily load it back in.

thanks @muellerzr

So may I conclude that once we load the trained model, it frees up the previously occupied resources, e.g. memory? Is that the reason why we load the trained model immediately after saving it?

Also, can we use this approach when training image models?

If we restart the instance or deal with the memory in some way (learn.destroy(), etc) then yes :slight_smile:

And absolutely! It applies to any model that you notice takes up too much RAM (see the segmentation model example).

But that has nothing to do with loading the trained model itself.
So I think I finally get the point.

The sequence is: 1. save the model -> 2. restart the kernel/instance -> 3. load the model.

Step 2 cannot be seen in the code, so that's why I only saw…)

but in between these two operations, if necessary, the kernel/instance has been restarted.

Am I correct?


Perfectly correct :slight_smile:
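For anyone reading later, the three steps can be sketched with a toy stand-in in plain Python. `ToyLearner` is a hypothetical class written for this post, not a fastai object, and the `del` line only imitates the kernel restart, which in a real notebook is a manual action and never appears as a line of code.

```python
import json
import os
import tempfile

class ToyLearner:
    """Hypothetical stand-in for a learner: holds weights and can save/load them."""

    def __init__(self, model_dir):
        self.model_dir = model_dir
        self.weights = {"encoder": [0.0, 0.0], "head": [0.0]}  # untrained state

    def _path(self, name):
        return os.path.join(self.model_dir, name + ".json")

    def save(self, name):
        with open(self._path(name), "w") as f:
            json.dump(self.weights, f)

    def load(self, name):
        with open(self._path(name)) as f:
            self.weights = json.load(f)
        return self

model_dir = tempfile.mkdtemp()

# 1. Save after training.
learn = ToyLearner(model_dir)
learn.weights = {"encoder": [0.12, -0.5], "head": [1.25]}  # pretend these were trained
learn.save("fine_tuned")

# 2. "Restart": drop the in-memory object (stands in for restarting the kernel).
del learn

# 3. Load the saved weights into a freshly built learner.
learn = ToyLearner(model_dir).load("fine_tuned")
```

If no restart happens between steps 1 and 3, the load just puts back the same weights that were saved a moment earlier, which is why the pair looks redundant when you read the notebook top to bottom.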

oh my god, I finally get something correct :rofl:

I am so happy now~! hahahaha

thank you ~!!!