The model is saved so that we can back up to an earlier trained model. Suppose you choose a learning rate that is too high: the model does not converge and the loss blows up. It is nearly impossible to repair a damaged model, but if you have saved the previous version, you can revert to it and try again. In practice, the code to revert to each version is already there in the notebook; it has no effect if you simply run straight through, line by line.
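The checkpoint-and-revert idea can be sketched in plain Python. This is a toy illustration of the workflow only; the `save`/`load` names mirror fastai's `learn.save()`/`learn.load()`, but the class and its internals here are invented for the example, not the library API:

```python
import copy

class ToyLearner:
    """Minimal stand-in for a trainable model with named checkpoints."""
    def __init__(self):
        self.weights = {"layer1": 0.0, "layer2": 0.0}
        self._checkpoints = {}

    def save(self, name):
        # Snapshot the current weights under a name, like learn.save('stage-1').
        self._checkpoints[name] = copy.deepcopy(self.weights)

    def load(self, name):
        # Revert to an earlier snapshot, e.g. after a bad run damaged the model.
        self.weights = copy.deepcopy(self._checkpoints[name])

learner = ToyLearner()
learner.weights["layer1"] = 0.5           # pretend this was a good training run
learner.save("stage-1")
learner.weights["layer1"] = float("nan")  # a too-high learning rate "blows up" the weights
learner.load("stage-1")                   # revert to the saved version and try again
print(learner.weights["layer1"])          # → 0.5
```

Running the load cell has no effect if the weights already match the checkpoint, which is why the revert code can sit in the notebook and be run straight through harmlessly.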
Why do we need to unfreeze the model for the first round of fine-tuning, but then freeze it again for the second round?
freeze_to freezes the layer groups up to the specified index and leaves the rest trainable; it is used to progressively unfreeze the later layers. unfreeze() makes all layers trainable. Please see the docs to understand exactly how these functions work. The reasons for progressive unfreezing of a pretrained model are explained in the video lessons.
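A toy sketch of the behavior described above. The semantics assumed here (freeze_to(n) freezes every layer group before index n, with negative n counting from the end, and unfreeze() making everything trainable) are based on the fastai docs, but the class itself is invented for illustration and is not the library implementation:

```python
class ToyGroupedModel:
    """Parameters arranged in layer groups, each with a trainable flag."""
    def __init__(self, n_groups):
        self.trainable = [True] * n_groups

    def freeze_to(self, n):
        # Freeze every group before index n; groups from n onward stay trainable.
        if n < 0:
            n += len(self.trainable)
        for i in range(len(self.trainable)):
            self.trainable[i] = (i >= n)

    def unfreeze(self):
        # Make every layer group trainable again.
        self.trainable = [True] * len(self.trainable)

model = ToyGroupedModel(4)
model.freeze_to(-1)     # first, train only the last group
print(model.trainable)  # → [False, False, False, True]
model.freeze_to(-2)     # progressively unfreeze one more group
print(model.trainable)  # → [False, False, True, True]
model.unfreeze()        # finally, train everything
print(model.trainable)  # → [True, True, True, True]
```

The progression shown (last group first, then gradually more, then all) is the "progressive unfreezing" schedule the video lessons describe for fine-tuning pretrained models.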
It may help to understand that the video lessons demonstrate an effective training method for a class of example problems. They show good methods as a starting point to try and to learn from. In general, training a model is not an exact science with specific rules and reasons at each step. Rather, during training you may try unfreezing/freezing certain layers, varying the learning rate and momentum, etc., in order to improve training speed and accuracy. With time and experimentation, you will develop your own intuitions and best practices.
In the case of language models, this is often because we will run out of memory unless we have a very large GPU (this can happen with image models too), so we may need to restart our kernels. Saving the model means we preserve our work and can load it back in easily.
So may I conclude that once we load the trained model, it frees up the previously occupied resources (e.g., memory)? Is that the reason we load the trained model immediately after saving it?
Also, can we use this approach for training images?