- You can think of the model as consisting of two parts:
a) a few convolutional layers that take raw pixels as input and produce a vector of size 1024.
b) two dense layers. The first takes 1024 features as input and outputs 512; the second takes those 512 features and produces 2 outputs. These 2 outputs (after applying softmax) are the probabilities for cat vs. dog.
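In code, the split might look like this (a minimal PyTorch sketch; the single conv layer is just a stand-in for the pretrained convolutional body, and the names are mine, not fastai's):

```python
import torch.nn as nn

conv_part = nn.Sequential(
    # stand-in for the pretrained convolutional layers
    nn.Conv2d(3, 1024, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),            # -> vector of size 1024 per image
)

head = nn.Sequential(
    nn.Linear(1024, 512),    # dense layer 1: 1024 -> 512
    nn.ReLU(),
    nn.Linear(512, 2),       # dense layer 2: 512 -> 2 (cat vs. dog logits)
)

model = nn.Sequential(conv_part, head)
# applying softmax to the 2 outputs gives the class probabilities
```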
If you initialize the learner with the precompute=True parameter, it performs a smart computational optimization: it evaluates the first part of the model (the convolutional layers) once for every image in your dataset. As a result, it gets a vector of 1024 numbers for each image. These are the "precomputed activations".
Then when you train the model on your dogs-vs-cats dataset, the learner doesn't run the convolutional layers again. It just looks up the precomputed activations for each image and trains only the second part of the model, the two dense layers. This speeds up training a lot: the two dense layers are a tiny fraction of the entire model, so they take very little time to execute. A sketch of the idea follows.
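Roughly, precomputing amounts to something like this (a sketch reusing conv_part and head from above; the dataloader yielding (images, labels) batches is hypothetical):

```python
import torch

# Pass 1: run the expensive convolutional part once over the whole
# dataset and cache the 1024-d activation vectors.
conv_part.eval()
features, labels = [], []
with torch.no_grad():                  # no gradients needed here
    for x, y in dataloader:            # hypothetical dataloader
        features.append(conv_part(x))  # precomputed activations
        labels.append(y)
features, labels = torch.cat(features), torch.cat(labels)

# Pass 2 (repeated every epoch): train only the head on the cached
# activations; the convolutional layers are never evaluated again.
opt = torch.optim.SGD(head.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()
for epoch in range(3):
    opt.zero_grad()
    loss = loss_fn(head(features), labels)
    loss.backward()
    opt.step()
```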
Of course, precomputing activations helps only if you don't want to retrain the convolutional layers.
- The general idea of freezing lower layers is that you want to preserve the information gained when the original model was trained on the large dataset. Unfreezing all layers would likely lead to forgetting some important low-level filters.
Above I described how precompute optimizes training when you want to train only the last two dense layers and "freeze" all the convolutional layers.
You can decide to do something else, e.g. freeze only the first few convolutional layers and train the last few convolutional layers together with the dense layers.
When you train the model, the forward pass still goes through all the layers. But when the error is computed and backpropagated, only the weights of "unfrozen" layers are updated; the weights in "frozen" layers stay unchanged.
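In plain PyTorch terms, freezing usually looks like this (a sketch reusing the model from the first snippet; fastai handles this for you, this just shows the mechanism):

```python
# Freeze the convolutional part: the forward pass still runs through
# these layers, but their parameters receive no gradient updates.
for param in conv_part.parameters():
    param.requires_grad = False

# Give the optimizer only the trainable ("unfrozen") parameters.
opt = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=1e-2
)
```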
With the fastai library, you have fine-grained control over which layers are "frozen" (untrainable) and which are "unfrozen" (trainable).
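For example (pre-1.0 fastai API as used in this thread, from memory; assumes a learn object already exists, and you should check the docs for your version):

```python
learn.freeze()      # freeze everything except the final layer group (the head)
learn.freeze_to(6)  # freeze layer groups up to index 6, train the rest
learn.unfreeze()    # make all layers trainable
```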