What's going on with fine-tuning again?

It is explained in the cs231n post:
"This is motivated by the observation that the earlier features of a ConvNet contain more generic features (e.g. edge detectors or color blob detectors) that should be useful to many tasks, but later layers of the ConvNet become progressively more specific to the details of the classes contained in the original dataset."

Should you always retrain all the layers after the highest one you are training? Specific example: let's say you want to fine-tune the last convolutional layer of a CNN. Should you also train the dense layers after it? I could just train that convolutional layer, holding the dense layers as untrainable. Does that make sense?

More specifically, let's say I retrain all the dense layers (fine-tuning for a dataset). Then I want to retrain the last convolutional layer. Should I ONLY train that layer, and would that make my training converge faster? I am going to run the experiment, but I wanted to see what others thought of this.
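(For what it's worth, here is a minimal Keras sketch of that setup, assuming `model` is an already-loaded VGG16-style Sequential model; the optimizer and loss are placeholders, not from the course code:)

    from keras.layers import Convolution2D  # Conv2D in later Keras versions

    # freeze every layer first
    for layer in model.layers:
        layer.trainable = False

    # make only the last convolutional layer trainable, leaving the
    # dense layers after it frozen
    conv_layers = [l for l in model.layers if isinstance(l, Convolution2D)]
    conv_layers[-1].trainable = True

    # recompile so the trainable flags take effect
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])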

@resdntalien I'm not an expert (yet ;-)) but my understanding is that, at least in your first specific example, you always want to train all of the layers that you've added after popping other layers off the pre-trained model. Those "dense layers after it" that you talk about should be new layers with random weights before retraining. I'm not aware of any method where you insert a layer into the middle of a network and don't train the layers above or below it, but I could be wrong. Intuitively it doesn't make sense to me, though, because the optimal function should already have been found.
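To make that pop-and-replace pattern concrete, it looks roughly like this in Keras (a sketch assuming a Sequential `model` and a `num_classes` for the new dataset, not the exact course code):

    from keras.layers import Dense

    model.pop()                  # drop the old 1000-way ImageNet classifier
    for layer in model.layers:
        layer.trainable = False  # keep the pre-trained weights fixed
    # the replacement layer starts with random weights, so it must be trained
    model.add(Dense(num_classes, activation='softmax'))
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])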

With regards to your second example, though, you don't always want to only pop and retrain a single new layer; it really depends on how well the pre-trained model fits your problem. @jeremy covers this in lesson 3 when he talks about tackling the State Farm distracted driver problem vs the cats and dogs classifier. The ImageNet model that serves as the starting point is much better suited to cats and dogs because it's already finding them, so you can pop a single layer and get good results. The distracted driver problem requires you to pop more layers and retrain the model so that it's better suited to finding elements that indicate distraction.
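When the new task is further from ImageNet like that, the same idea extends to unfreezing everything from some cutoff layer onward, e.g. all the dense layers (again just a sketch; picking the cutoff is the judgment call):

    from keras.layers import Dense

    # unfreeze every layer from the first Dense layer onward,
    # keeping the convolutional layers below it frozen
    first_dense = [i for i, l in enumerate(model.layers)
                   if isinstance(l, Dense)][0]
    for idx, layer in enumerate(model.layers):
        layer.trainable = idx >= first_dense
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])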

Does that answer your question?


@ben.bowles, where is your finetune? In vgg16.py? Mine in vgg16.py is quite different from yours; I'm confused here.
def finetune(self, batches):
    """
    Modifies the original VGG16 network architecture and updates
    self.classes for new training data.

    Args:
        batches : A keras.preprocessing.image.ImageDataGenerator object.
                  See definition for get_batches().
    """
    self.ft(batches.nb_class)

    # batches.class_indices is a dict with the class name as key and an
    # index as value, e.g. {'cats': 0, 'dogs': 1}
    # get a list of all the class labels
    classes = list(iter(batches.class_indices))

    # sort the class labels by index according to batches.class_indices
    # and update self.classes
    for c in batches.class_indices:
        classes[batches.class_indices[c]] = c
    self.classes = classes
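For context, in the course notebooks finetune() is typically called right after building the batches, something like this (the paths and batch size are illustrative):

    from vgg16 import Vgg16

    vgg = Vgg16()
    batches = vgg.get_batches(path + 'train', batch_size=64)
    val_batches = vgg.get_batches(path + 'valid', batch_size=64)
    vgg.finetune(batches)
    vgg.fit(batches, val_batches, nb_epoch=1)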