Precalculating conv features vs. freezing conv layers

(jerry liu) #1

Hi everyone,

In the Statefarm notebook, the convolutional features are precalculated using the pretrained VGG weights and saved to disk for convenience. These features are then fed into a model of dense layers for classification.

Would this be the same as combining the conv layers and the dense layers into one model, and freezing the conv layers by setting trainable = False?

My motivation for this is that I run out of memory when I use a large amount of data augmentation and concatenate the resulting conv features.

Would combining the conv layers and dense layers into a single model, and feeding in training data in batches help?

Or would training then recalculate the conv features at every epoch?




I am not fully sure I understand. In general, you are free to use the whole model. I don’t think you gain anything from precomputing the output of the conv layers apart from a significant speed-up.

With data augmentation you really have no other option than to feed the data directly into the full model. Unless you precomputed the conv layer outputs for some massive amount of augmented data and saved the labels alongside them… but I am not sure that is really a viable option :wink:

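All that setting trainable to False does is that, during training (when you run fit or fit_generator), those layers do not ‘learn’: their weights are not updated. They still have to do the work of computing their output for every batch as the data passes through them. Gradients only have to flow back through frozen layers when there are trainable layers before them; with the frozen conv layers sitting at the very start of the model, as here, no backward pass through them is needed.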

Not sure if that answers your question - if not and you would like to provide additional details, I can take another stab at helping you out.

(Jeremy Howard) #3

Yes. Precomputing is simply an approach to making things (much!) faster, but it’s not required if you don’t have enough disk space.

(Robert William Whelan) #4

I had the same question so I’ll give you my understanding of things. “Pre-calculating” gives you a fixed set of inputs to a new model.

In this case, the new model is a Sequential object with, first, a max-pooling layer that takes the precalculated output of the convolutional layers as its input, then a couple of new dense layers with dropout set to 0, followed by a final dense ‘softmax’ layer which gives us the cat vs. dog categories.
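As a rough illustration of how the two setups relate, the sketch below runs the same inputs through both routes and compares the predictions. It assumes tf.keras; the small conv_base and head models are hypothetical stand-ins with random weights, not the VGG layers or the notebook's dense model.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical stand-in for the pretrained conv layers.
conv_base = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    layers.Conv2D(8, 3, activation="relu"),
    layers.MaxPooling2D(),
])
conv_base.trainable = False

# Random data in place of real images.
x = np.random.rand(16, 32, 32, 3).astype("float32")

# Route 1: precalculate the conv features once, then feed them to a
# separate model whose input shape matches the saved features.
features = conv_base.predict(x, verbose=0)
head = keras.Sequential([
    keras.Input(shape=features.shape[1:]),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])
preds_precomputed = head.predict(features, verbose=0)

# Route 2: one combined model; the images pass through the frozen
# conv layers every time predictions (or training batches) are computed.
full_model = keras.Sequential([conv_base, head])
preds_full = full_model.predict(x, verbose=0)

same = np.allclose(preds_precomputed, preds_full, atol=1e-5)
print("same predictions:", same)
```

For a fixed set of images the two routes give the same numbers; the difference is that route 1 pays for the conv pass once, while route 2 pays for it on every batch.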

If we don’t precalculate, but instead make the convolutional layers non-trainable, the images get passed through all the convolutional layers to produce the input to the dense layers, but the weights in the convolutional layers are not updated.

This is important because with data augmentation you are basically creating new images, and each of them has to be passed through the convolutional layers (though not necessarily used to train those layers).
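A sketch of what that could look like in practice, under some loud assumptions: tf.keras, a tiny hypothetical conv stack instead of VGG, random data, and a hand-written generator (augmented_batches is a made-up helper) that applies only a random horizontal flip as a stand-in for real augmentation. Batches are produced one at a time, so no large array of augmented images or features is ever held in memory.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def augmented_batches(x, y, batch_size, rng):
    """Yield endless batches, flipping each image left-right with probability 0.5."""
    n = len(x)
    while True:
        idx = rng.choice(n, size=batch_size, replace=False)
        batch = x[idx].copy()
        flip = rng.random(batch_size) < 0.5
        batch[flip] = batch[flip, :, ::-1, :]  # reverse the width axis
        yield batch, y[idx]

# Hypothetical stand-in for the pretrained conv layers, frozen.
conv_base = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    layers.Conv2D(8, 3, activation="relu"),
    layers.MaxPooling2D(),
])
conv_base.trainable = False

model = keras.Sequential([
    conv_base,
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Random data in place of real images and labels.
x = np.random.rand(64, 32, 32, 3).astype("float32")
y = np.random.randint(0, 10, size=64)

conv_before = [w.copy() for w in conv_base.get_weights()]
rng = np.random.default_rng(0)
# Each step pulls a freshly augmented batch through the frozen conv layers.
history = model.fit(augmented_batches(x, y, 16, rng),
                    steps_per_epoch=4, epochs=2, verbose=0)
conv_unchanged = all(np.array_equal(a, b)
                     for a, b in zip(conv_before, conv_base.get_weights()))
print("conv weights unchanged:", conv_unchanged)
```

The cost is the one already mentioned in the thread: the conv features are recomputed for every batch in every epoch, since each augmented image is new.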