Transfer Learning Approaches

I am trying to understand the two common approaches to transfer learning. I am reading this: Transfer Learning for Computer Vision Tutorial — PyTorch Tutorials 1.9.0+cu102 documentation.

I am wondering what the role of pre-trained weights (like ImageNet) is in the “ConvNet as fixed feature extractor” approach. Say I am using resnet-18 (pre-trained on ImageNet), freezing all the convolutional layers in the network and keeping the last fully connected layer, which is trained from random weight initialization (as stated in the given link). Since the convolutional layers are frozen, their weights are not being updated. Meanwhile, the last fully connected layer is trained from random initialization on the target dataset. So, in this scenario, what is the role of the pre-trained weights from ImageNet? Kindly reply.

As mentioned by Jeremy in the course, and as per the research in the paper Visualizing and Understanding Convolutional Networks, the earlier layers (having been trained on huge datasets such as ImageNet) are very good at detecting basic features like lines, edges, etc. For most computer vision tasks these features remain important: they are the foundation for detecting more complex, data-specific shapes, such as the eye of a dog. That is why we freeze the initial layers and train only the last few layers, which then learn to recognize the more complex data features. Once that is done, we unfreeze the earlier layers and train them with a slower learning rate than the last layers (the last few layers require more learning, while the earlier layers are already good at detecting basic features).


What Aadil said is substantially correct. Maybe I can add the following.
In a CNN, successive layers consist of many “modules”, sometimes called filters, which each “look” at the previous layer’s output to detect a specific feature. These features are learned, not defined by the designer, but it appears that they are simple features like lines, textures, and colors in the layers close to the input, and then become more and more abstract or complex as you move toward the output layer. Essentially, the CNN converts an image (an unstructured and highly redundant set of data) into a compressed representation of relevant features, which the machine can then use to do something else, like classification, regression, or any other ML task.

What you do not want to lose is this conversion from image to machine representation, and this is what is called transfer learning. What you do with the representation of each image will be project dependent. Most natural images will display similar features, so unless your images are very special (e.g. medical imaging or synthetic images like radar, …), you do not need to change much of that “image representation” part of the model. You just need to design and train a classifier, regressor, or other ML model for the last few layers.

Hope it helps.