Two types of Transfer Learning

(mattobrien415) #1

I was reading a PyTorch tutorial on Transfer Learning, where they state:

The two major transfer-learning scenarios look as follows:

1. Finetuning the convnet:
Instead of random initialization, we initialize the network with a pretrained network, like one trained on the ImageNet 1000 dataset. The rest of the training looks as usual.
2. ConvNet as fixed feature extractor:
Here, we will freeze the weights for all of the network except that of the final fully connected layer. This last fully connected layer is replaced with a new one with random weights and only this layer is trained.

From what I’ve learned, I would have thought that #2 above would have been called Finetuning.
As for #1, it sounds a bit unfamiliar. Would the only reason to use this first type be to speed up learning times by starting with non-random weights? If I have this correct, then I have to wonder how useful this is.
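For concreteness, the two scenarios can be sketched in PyTorch roughly like this (a minimal sketch; the tiny `backbone`/`head` model here is a stand-in I made up, in practice you would load e.g. a pretrained torchvision model):

```python
import torch
import torch.nn as nn

# Tiny stand-in for a pretrained convnet (an assumption for
# illustration; normally this would come with pretrained weights).
backbone = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
head = nn.Linear(8, 10)  # new, randomly initialized classifier

# Scenario 2: ConvNet as fixed feature extractor --
# freeze every backbone weight and train only the new head.
for p in backbone.parameters():
    p.requires_grad = False

model = nn.Sequential(backbone, head)
trainable = [p for p in model.parameters() if p.requires_grad]
# Only the head's weight and bias remain trainable.
optimizer = torch.optim.SGD(trainable, lr=1e-3)

# Scenario 1 (finetuning) would instead leave requires_grad=True
# everywhere and pass model.parameters() to the optimizer,
# typically with a smaller learning rate than training from scratch.
```

The only structural difference between the two scenarios is which parameters keep `requires_grad=True` and get handed to the optimizer.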

(Matthijs) #2

There is an “in between” scenario as well where you keep only a certain number of layers fixed and retrain the others. You can even change this during training: for example, start by retraining just the topmost layer, then the two topmost layers, then the three topmost layers, and so on, until you reach the first layer. By that point you’ll be retraining the entire network.
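This gradual-unfreezing schedule can be sketched as follows (a toy sketch; the 4-block `ModuleList` and the `unfreeze_top` helper are assumptions for illustration, not a real API):

```python
import torch.nn as nn

# Hypothetical 4-block network standing in for a pretrained convnet.
blocks = nn.ModuleList([nn.Linear(16, 16) for _ in range(4)])

def unfreeze_top(blocks, n):
    """Freeze everything, then unfreeze only the n topmost blocks."""
    for i, block in enumerate(blocks):
        for p in block.parameters():
            p.requires_grad = i >= len(blocks) - n

# Schedule: retrain one more block in each phase.
unfreeze_top(blocks, 1)  # phase 1: only the topmost block trains
# ... train for a while ...
unfreeze_top(blocks, 2)  # phase 2: the two topmost blocks train
# ... and so on, until unfreeze_top(blocks, len(blocks))
# retrains the entire network.
```

After each call, only the parameters with `requires_grad=True` should be passed to (or left active in) the optimizer.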

If your training data is similar in nature to the data that the pretrained network was trained on, then just training a new classifier on top of a fixed feature extractor might be good enough (and definitely is the fastest approach). But if the data is quite different – for example the conv net was trained on photos while you’re trying to fine-tune on paintings or illustrations – then it might make sense to tweak the earlier layers too.

(Gokkul Nath T S) #3
  1. Finetuning the convnet: (Similar to Domain Adaptation)

From my understanding, this is commonly used when the data you plan to train on is different from the data the pretrained network was trained on. In that scenario you ideally unfreeze all the layers and tune them on your target data; consequently, the lower layers will also be modified.

Would the only reason to use this first type be to speed up learning times by starting with non-random weights?
Ans: Yes. It reduces the time required, since the network has already been trained for a while on a large dataset, and it is far better than random or other initializations. When you train deeper networks there are many pitfalls, such as getting stuck in local minima or failing to converge, so with pretrained weights you don't have to worry as much about these.

(Bhabani) #4

Yes, finetuning does increase the speed of learning, and it is often needed to achieve proper convergence where training from scratch would not. If the features present in your data are similar to the features the already-trained ConvNet has learned, then freezing the first two layers helps. The reason is that the early layers detect the more generalized features, and the deeper the layers go, the more data-specific the features they capture. If you are training on different data with more or fewer classes, freeze the entire ConvNet apart from the last softmax layer, then remove that layer, add a new one, and train it on the new data.
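The "freeze the early, generic layers" idea above might look like this (a toy sketch; the three-layer model and the 5-class head are assumptions I chose for illustration):

```python
import torch.nn as nn

# Toy network standing in for a pretrained ConvNet (an assumption).
conv1 = nn.Conv2d(3, 8, 3)     # early layer: generic features (edges, textures)
conv2 = nn.Conv2d(8, 16, 3)    # early layer: still fairly generic
classifier = nn.Linear(16, 5)  # replaced head for the new 5-class task

# Freeze the two early conv layers; only the new classifier
# (and any unfrozen deeper layers) will receive gradient updates.
for layer in (conv1, conv2):
    for p in layer.parameters():
        p.requires_grad = False
```

The replaced classifier starts from random weights, which is why it, and not the frozen feature layers, needs training on the new data.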