Two types of Transfer Learning

(mattobrien415) #1

I was reading a PyTorch tutorial on Transfer Learning, where they state:

The two major transfer-learning scenarios look as follows:

1. Finetuning the convnet:
Instead of random initialization, we initialize the network with a pretrained network, like one trained on the ImageNet 1000 dataset. The rest of the training looks as usual.
2. ConvNet as fixed feature extractor:
Here, we will freeze the weights for all of the network except that of the final fully connected layer. This last fully connected layer is replaced with a new one with random weights and only this layer is trained.

From what I’ve learned, I would have thought that #2 above would have been called Finetuning.
As for #1, it sounds a bit unfamiliar. Would the only reason to use this first type be to speed up learning times by starting with non-random weights? If I have this correct, then I have to wonder how useful this is.
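For concreteness, the two scenarios can be sketched in PyTorch roughly like this (a minimal sketch; the tiny `backbone`/`head` model here is a stand-in I made up, in practice you would load e.g. a pretrained torchvision model):

```python
import torch
import torch.nn as nn

# Tiny stand-in for a pretrained convnet (an assumption for
# illustration; normally this would come with pretrained weights).
backbone = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
head = nn.Linear(8, 10)  # new, randomly initialized classifier

# Scenario 2: ConvNet as fixed feature extractor --
# freeze every backbone weight and train only the new head.
for p in backbone.parameters():
    p.requires_grad = False

model = nn.Sequential(backbone, head)
trainable = [p for p in model.parameters() if p.requires_grad]
# Only the head's weight and bias remain trainable.
optimizer = torch.optim.SGD(trainable, lr=1e-3)

# Scenario 1 (finetuning) would instead leave requires_grad=True
# everywhere and pass model.parameters() to the optimizer,
# typically with a smaller learning rate than training from scratch.
```

The only structural difference between the two scenarios is which parameters keep `requires_grad=True` and get handed to the optimizer.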

(Matthijs) #2

There is an “in between” scenario as well where you keep only a certain number of layers fixed and retrain the others. You can even change this during training: for example, start by retraining just the topmost layer, then the two topmost layers, then the three topmost layers, and so on, until you reach the first layer. By that point you’ll be retraining the entire network.
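This gradual-unfreezing schedule can be sketched as follows (a toy sketch; the 4-block `ModuleList` and the `unfreeze_top` helper are assumptions for illustration, not a real API):

```python
import torch.nn as nn

# Hypothetical 4-block network standing in for a pretrained convnet.
blocks = nn.ModuleList([nn.Linear(16, 16) for _ in range(4)])

def unfreeze_top(blocks, n):
    """Freeze everything, then unfreeze only the n topmost blocks."""
    for i, block in enumerate(blocks):
        for p in block.parameters():
            p.requires_grad = i >= len(blocks) - n

# Schedule: retrain one more block in each phase.
unfreeze_top(blocks, 1)  # phase 1: only the topmost block trains
# ... train for a while ...
unfreeze_top(blocks, 2)  # phase 2: the two topmost blocks train
# ... and so on, until unfreeze_top(blocks, len(blocks))
# retrains the entire network.
```

After each call, only the parameters with `requires_grad=True` should be passed to (or left active in) the optimizer.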

If your training data is similar in nature to the data that the pretrained network was trained on, then just training a new classifier on top of a fixed feature extractor might be good enough (and definitely is the fastest approach). But if the data is quite different – for example the conv net was trained on photos while you’re trying to fine-tune on paintings or illustrations – then it might make sense to tweak the earlier layers too.

(Gokkul Nath T S) #3
  1. Finetuning the convnet: (Similar to Domain Adaptation)

From my understanding, this is commonly used when the data you plan to train on is different from the data the pretrained network was trained on. In that scenario you ideally unfreeze all the layers and tune them on your target data; consequently, the lower layers will also be modified.

Would the only reason to use this first type be to speed up learning times by starting with non-random weights?
Ans: Yes. It reduces the time required, since the network has already been trained for a while on a large dataset, and it is far better than random or other initializations. When you train deeper networks there are many pitfalls, such as getting stuck in local minima or failing to converge, so with pretrained weights you don't have to worry as much about these.

(Bhabani) #4

Yes, finetuning does increase the speed of learning, and it is often needed to achieve proper convergence where training from scratch would not. If the features present in your data are similar to the features the already-trained ConvNet has learned, then freezing the first two layers helps. The reason is that the early layers detect the more generalized features, and the deeper the layers go, the more data-specific the features they capture. If you are training on different data with more or fewer classes, freeze the entire ConvNet apart from the last softmax layer, then remove that layer, add a new one, and train it on the new data.
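The "freeze the early, generic layers" idea above might look like this (a toy sketch; the three-layer model and the 5-class head are assumptions I chose for illustration):

```python
import torch.nn as nn

# Toy network standing in for a pretrained ConvNet (an assumption).
conv1 = nn.Conv2d(3, 8, 3)     # early layer: generic features (edges, textures)
conv2 = nn.Conv2d(8, 16, 3)    # early layer: still fairly generic
classifier = nn.Linear(16, 5)  # replaced head for the new 5-class task

# Freeze the two early conv layers; only the new classifier
# (and any unfrozen deeper layers) will receive gradient updates.
for layer in (conv1, conv2):
    for p in layer.parameters():
        p.requires_grad = False
```

The replaced classifier starts from random weights, which is why it, and not the frozen feature layers, needs training on the new data.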