conv_model: how do the conv layers get their weights in train_dense_layers() in dogscats-ensemble.ipynb?

In the function definition, I can see that the new weights from training the fc_layers get transferred to the stitched-together conv_model. The name “conv_model” is somewhat confusing here: it starts out as just the conv layers, and then, once the fc layers are stitched on, it becomes the full model.

My question is: how do the conv layers in conv_model get their weights? They don’t appear to be randomly initialized, but I can’t find the code that sets them. For the fc layers there are lines that explicitly set the weights; for the conv layers there is nothing equivalent.

The weight setting for the fc layers is in the following code:

for l1,l2 in zip(conv_model.layers[last_conv_idx+1:], fc_model.layers):
    l1.set_weights(l2.get_weights())

But where do the conv layers’ weights come from?

One possibility is that they are carried over by the get_conv_model function, but that function looks like it only copies the model’s layer structure, not the weights. Does it? (I sketch a quick check right after the function below.)

def get_conv_model(model):
    layers = model.layers
    last_conv_idx = [index for index,layer in enumerate(layers) 
                         if type(layer) is Convolution2D][-1]

    conv_layers = layers[:last_conv_idx+1]
    conv_model = Sequential(conv_layers)
    fc_layers = layers[last_conv_idx+1:]
    return conv_model, fc_layers, last_conv_idx
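
One quick way to probe this (a minimal sketch, assuming `model` is the VGG-style Keras 1.x model passed into train_dense_layers, and numpy is imported as np) is to test whether conv_model holds the very same layer objects as model:

conv_model, fc_layers, last_conv_idx = get_conv_model(model)

# If Sequential(conv_layers) wraps the existing layer objects instead of
# copying the structure, object identity should hold...
print(conv_model.layers[last_conv_idx] is model.layers[last_conv_idx])

# ...and the conv weights should already be the pretrained values:
w_orig = model.layers[last_conv_idx].get_weights()[0]
w_conv = conv_model.layers[last_conv_idx].get_weights()[0]
print(np.allclose(w_orig, w_conv))

If both print True, that would explain everything: the conv weights are never re-initialized, they just travel with the shared layer objects.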

For ease of reference, the full definition of train_dense_layers is here:

def train_dense_layers(i, model):
    conv_model, fc_layers, last_conv_idx = get_conv_model(model)
    conv_shape = conv_model.output_shape[1:]
    fc_model = Sequential(get_fc_layers(0.5, conv_shape))
    for l1,l2 in zip(fc_model.layers, fc_layers): 
        weights = l2.get_weights()
        l1.set_weights(weights)
    fc_model.compile(optimizer=Adam(1e-5), loss='categorical_crossentropy', 
                     metrics=['accuracy'])
    fc_model.fit(trn_features, trn_labels, nb_epoch=2, 
         batch_size=batch_size, validation_data=(val_features, val_labels))

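    # augment the raw training images for fine-tuning the stitched model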
    gen = image.ImageDataGenerator(rotation_range=10, width_shift_range=0.05,
       zoom_range=0.05, channel_shift_range=10, height_shift_range=0.05,
       shear_range=0.05, horizontal_flip=True)
    batches = gen.flow(trn, trn_labels, batch_size=batch_size)
    val_batches = image.ImageDataGenerator().flow(val, val_labels, 
                      shuffle=False, batch_size=batch_size)

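    # freeze the conv layers, stitch fresh fc layers on, and copy in the
    # trained fc_model weights -- note that nothing here sets conv weights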
    for layer in conv_model.layers: layer.trainable = False
    for layer in get_fc_layers(0.5, conv_shape): conv_model.add(layer)
    for l1,l2 in zip(conv_model.layers[last_conv_idx+1:], fc_model.layers): 
        l1.set_weights(l2.get_weights())

    conv_model.compile(optimizer=Adam(1e-5), loss='categorical_crossentropy', 
                       metrics=['accuracy'])
    conv_model.save_weights(model_path+'no_dropout_bn' + i + '.h5')
    conv_model.fit_generator(batches, samples_per_epoch=batches.N, nb_epoch=1, 
                            validation_data=val_batches, nb_val_samples=val_batches.N)
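    # unfreeze everything from layer index 16 onward (presumably partway
    # into the conv stack) before further fine-tuning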
    for layer in conv_model.layers[16:]: layer.trainable = True
    conv_model.fit_generator(batches, samples_per_epoch=batches.N, nb_epoch=8, 
                            validation_data=val_batches, nb_val_samples=val_batches.N)

    conv_model.optimizer.lr = 1e-7
    conv_model.fit_generator(batches, samples_per_epoch=batches.N, nb_epoch=10, 
                            validation_data=val_batches, nb_val_samples=val_batches.N)
    conv_model.save_weights(model_path + 'aug' + i + '.h5')
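
To double-check my reading of the mechanics, here is a self-contained toy sketch (a hypothetical two-layer model in Keras 1.x syntax, channels-first input shape as in the course's Theano setup):

import numpy as np
from keras.models import Sequential
from keras.layers import Convolution2D, Flatten, Dense

# tiny stand-in for VGG: one conv layer followed by one dense layer
toy = Sequential([
    Convolution2D(4, 3, 3, input_shape=(3, 8, 8)),
    Flatten(),
    Dense(2),
])

conv_model, fc_layers, last_conv_idx = get_conv_model(toy)

# same layer object, hence the same already-initialized weight arrays
assert conv_model.layers[last_conv_idx] is toy.layers[last_conv_idx]
assert np.allclose(conv_model.layers[last_conv_idx].get_weights()[0],
                   toy.layers[last_conv_idx].get_weights()[0])

If those assertions pass, then Sequential(conv_layers) shares layers rather than copying structure, and the conv weights in conv_model were simply inherited from the original model. Is that the right reading?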