Getting an error trying to remove Dropout from a finetuned Vgg16BN model

I’ve been working on this for a few hours and can’t figure out why I’m getting a MemoryError exception when I try to run:

fc_model.fit(train_features_conv, train_labels, nb_epoch=8,
             batch_size=batch_size, validation_data=(val_features_conv, val_labels))

at the bottom of the code below.

I don’t think it’s a legitimate memory error (batch_size=4); rather, I suspect it has something to do with how I’m setting the weights for the FC model after changing Dropout from 0.5 to 0.0. Either way, I can’t figure out what I’m doing wrong.
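
For context, the rescaling follows the usual dropout-compensation arithmetic: going from old_p to new_p, you multiply the copied weights by (1-old_p)/(1-new_p), so dropping from 0.5 to 0.0 halves them. A toy numpy sketch of what I mean (the numbers are made up):

import numpy as np

old_p, new_p = 0.5, 0.0
scale = (1 - old_p) / (1 - new_p)  # 0.5 here: halve the weights when dropout goes away

w = np.array([0.2, -0.4, 1.0])     # stand-in for one array from layer.get_weights()
print(w * scale)                   # [ 0.1 -0.2  0.5]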

Any help would be much appreciated; my sanity depends on it.

Here is the relevant code:

model.load_weights(best_weights_f)
model.summary()

> ____________________________________________________________________________________________________
> Layer (type)                     Output Shape          Param #     Connected to                     
> ====================================================================================================
> lambda_1 (Lambda)                (None, 3, 224, 224)   0           lambda_input_1[0][0]             
> ____________________________________________________________________________________________________
> zeropadding2d_1 (ZeroPadding2D)  (None, 3, 226, 226)   0           lambda_1[0][0]                   
> ____________________________________________________________________________________________________
> convolution2d_1 (Convolution2D)  (None, 64, 224, 224)  1792        zeropadding2d_1[0][0]            
> ____________________________________________________________________________________________________
> zeropadding2d_2 (ZeroPadding2D)  (None, 64, 226, 226)  0           convolution2d_1[0][0]            
> ____________________________________________________________________________________________________
> convolution2d_2 (Convolution2D)  (None, 64, 224, 224)  36928       zeropadding2d_2[0][0]            
> ____________________________________________________________________________________________________
> maxpooling2d_1 (MaxPooling2D)    (None, 64, 112, 112)  0           convolution2d_2[0][0]            
> ____________________________________________________________________________________________________
> zeropadding2d_3 (ZeroPadding2D)  (None, 64, 114, 114)  0           maxpooling2d_1[0][0]             
> ____________________________________________________________________________________________________
> convolution2d_3 (Convolution2D)  (None, 128, 112, 112) 73856       zeropadding2d_3[0][0]            
> ____________________________________________________________________________________________________
> zeropadding2d_4 (ZeroPadding2D)  (None, 128, 114, 114) 0           convolution2d_3[0][0]            
> ____________________________________________________________________________________________________
> convolution2d_4 (Convolution2D)  (None, 128, 112, 112) 147584      zeropadding2d_4[0][0]            
> ____________________________________________________________________________________________________
> maxpooling2d_2 (MaxPooling2D)    (None, 128, 56, 56)   0           convolution2d_4[0][0]            
> ____________________________________________________________________________________________________
> zeropadding2d_5 (ZeroPadding2D)  (None, 128, 58, 58)   0           maxpooling2d_2[0][0]             
> ____________________________________________________________________________________________________
> convolution2d_5 (Convolution2D)  (None, 256, 56, 56)   295168      zeropadding2d_5[0][0]            
> ____________________________________________________________________________________________________
> zeropadding2d_6 (ZeroPadding2D)  (None, 256, 58, 58)   0           convolution2d_5[0][0]            
> ____________________________________________________________________________________________________
> convolution2d_6 (Convolution2D)  (None, 256, 56, 56)   590080      zeropadding2d_6[0][0]            
> ____________________________________________________________________________________________________
> zeropadding2d_7 (ZeroPadding2D)  (None, 256, 58, 58)   0           convolution2d_6[0][0]            
> ____________________________________________________________________________________________________
> convolution2d_7 (Convolution2D)  (None, 256, 56, 56)   590080      zeropadding2d_7[0][0]            
> ____________________________________________________________________________________________________
> maxpooling2d_3 (MaxPooling2D)    (None, 256, 28, 28)   0           convolution2d_7[0][0]            
> ____________________________________________________________________________________________________
> zeropadding2d_8 (ZeroPadding2D)  (None, 256, 30, 30)   0           maxpooling2d_3[0][0]             
> ____________________________________________________________________________________________________
> convolution2d_8 (Convolution2D)  (None, 512, 28, 28)   1180160     zeropadding2d_8[0][0]            
> ____________________________________________________________________________________________________
> zeropadding2d_9 (ZeroPadding2D)  (None, 512, 30, 30)   0           convolution2d_8[0][0]            
> ____________________________________________________________________________________________________
> convolution2d_9 (Convolution2D)  (None, 512, 28, 28)   2359808     zeropadding2d_9[0][0]            
> ____________________________________________________________________________________________________
> zeropadding2d_10 (ZeroPadding2D) (None, 512, 30, 30)   0           convolution2d_9[0][0]            
> ____________________________________________________________________________________________________
> convolution2d_10 (Convolution2D) (None, 512, 28, 28)   2359808     zeropadding2d_10[0][0]           
> ____________________________________________________________________________________________________
> maxpooling2d_4 (MaxPooling2D)    (None, 512, 14, 14)   0           convolution2d_10[0][0]           
> ____________________________________________________________________________________________________
> zeropadding2d_11 (ZeroPadding2D) (None, 512, 16, 16)   0           maxpooling2d_4[0][0]             
> ____________________________________________________________________________________________________
> convolution2d_11 (Convolution2D) (None, 512, 14, 14)   2359808     zeropadding2d_11[0][0]           
> ____________________________________________________________________________________________________
> zeropadding2d_12 (ZeroPadding2D) (None, 512, 16, 16)   0           convolution2d_11[0][0]           
> ____________________________________________________________________________________________________
> convolution2d_12 (Convolution2D) (None, 512, 14, 14)   2359808     zeropadding2d_12[0][0]           
> ____________________________________________________________________________________________________
> zeropadding2d_13 (ZeroPadding2D) (None, 512, 16, 16)   0           convolution2d_12[0][0]           
> ____________________________________________________________________________________________________
> convolution2d_13 (Convolution2D) (None, 512, 14, 14)   2359808     zeropadding2d_13[0][0]           
> ____________________________________________________________________________________________________
> maxpooling2d_5 (MaxPooling2D)    (None, 512, 7, 7)     0           convolution2d_13[0][0]           
> ____________________________________________________________________________________________________
> flatten_1 (Flatten)              (None, 25088)         0           maxpooling2d_5[0][0]             
> ____________________________________________________________________________________________________
> dense_1 (Dense)                  (None, 4096)          102764544   flatten_1[0][0]                  
> ____________________________________________________________________________________________________
> batchnormalization_1 (BatchNorma (None, 4096)          16384       dense_1[0][0]                    
> ____________________________________________________________________________________________________
> dropout_1 (Dropout)              (None, 4096)          0           batchnormalization_1[0][0]       
> ____________________________________________________________________________________________________
> dense_2 (Dense)                  (None, 4096)          16781312    dropout_1[0][0]                  
> ____________________________________________________________________________________________________
> batchnormalization_2 (BatchNorma (None, 4096)          16384       dense_2[0][0]                    
> ____________________________________________________________________________________________________
> dropout_2 (Dropout)              (None, 4096)          0           batchnormalization_2[0][0]       
> ____________________________________________________________________________________________________
> dense_4 (Dense)                  (None, 2)             8194        dropout_2[0][0]                  
> ====================================================================================================
> Total params: 134,301,506
> Trainable params: 8,194
> Non-trainable params: 134,293,312
> ____________________________________________________________________________________________________
# Split the loaded model at the last convolutional layer: everything up to and
# including it becomes the conv model; the rest are the FC layers whose weights
# we want to copy over.
last_conv_idx = [i for i, l in enumerate(model.layers) if isinstance(l, Convolution2D)][-1]
conv_layers = model.layers[:last_conv_idx+1]
conv_model = Sequential(conv_layers)
fc_layers = model.layers[last_conv_idx+1:]

# Precompute the conv features once so the FC model can be trained on them directly.
train_features_conv = conv_model.predict(train_data, batch_size=batch_size)
val_features_conv = conv_model.predict(val_data, batch_size=batch_size*2)

print(train_features_conv.shape, val_features_conv.shape)

# Cache the features to disk and read them back (save_array/load_array are the
# course utils' bcolz helpers).
save_array(cache_path+'train_features_conv.dat', train_features_conv)
save_array(cache_path+'val_features_conv.dat', val_features_conv)

train_features_conv = load_array(cache_path+'train_features_conv.dat')
val_features_conv = load_array(cache_path+'val_features_conv.dat')

def set_layer_weights(layer, new_p, old_p):
    # Rescale the copied weights to compensate for the change in dropout rate.
    # Note this scales every array returned by get_weights(), i.e. biases and
    # BatchNormalization parameters as well as the Dense weight matrices.
    scale = (1-old_p)/(1-new_p)
    return [o * scale for o in layer.get_weights()]

def build_fc_model(new_p, old_p):
    # Rebuild the FC half of the network with the new dropout rate. The conv
    # model stops at the last Convolution2D, so its following MaxPooling2D has
    # to be re-added here.
    model = Sequential([
            MaxPooling2D((2, 2), strides=(2, 2), input_shape=conv_layers[-1].output_shape[1:]),
            Flatten(),
            Dense(4096, activation='relu'),
            BatchNormalization(),
            Dropout(new_p),
            Dense(4096, activation='relu'),
            BatchNormalization(),
            Dropout(new_p),
            Dense(2, activation='softmax')
        ])

    # Copy the finetuned weights across layer by layer, rescaled for the new p.
    for l1, l2 in zip(model.layers, fc_layers):
        l1.set_weights(set_layer_weights(l2, new_p, old_p))

    opt = Adam(lr=0.00001)
    model.compile(opt, loss='categorical_crossentropy', metrics=['accuracy'])
    return model

fc_model = build_fc_model(0.0, 0.5)
fc_model.summary()
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
maxpooling2d_6 (MaxPooling2D)    (None, 512, 7, 7)     0           maxpooling2d_input_1[0][0]       
____________________________________________________________________________________________________
flatten_2 (Flatten)              (None, 25088)         0           maxpooling2d_6[0][0]             
____________________________________________________________________________________________________
dense_5 (Dense)                  (None, 4096)          102764544   flatten_2[0][0]                  
____________________________________________________________________________________________________
batchnormalization_3 (BatchNorma (None, 4096)          16384       dense_5[0][0]                    
____________________________________________________________________________________________________
dropout_3 (Dropout)              (None, 4096)          0           batchnormalization_3[0][0]       
____________________________________________________________________________________________________
dense_6 (Dense)                  (None, 4096)          16781312    dropout_3[0][0]                  
____________________________________________________________________________________________________
batchnormalization_4 (BatchNorma (None, 4096)          16384       dense_6[0][0]                    
____________________________________________________________________________________________________
dropout_4 (Dropout)              (None, 4096)          0           batchnormalization_4[0][0]       
____________________________________________________________________________________________________
dense_7 (Dense)                  (None, 2)             8194        dropout_4[0][0]                  
====================================================================================================
Total params: 119,586,818
Trainable params: 119,570,434
Non-trainable params: 16,384
____________________________________________________________________________________________________

fc_model.fit(train_features_conv, train_labels, nb_epoch=8, 
             batch_size=batch_size, validation_data=(val_features_conv, val_labels))

What does the output from nvidia-smi look like?
In my experience, this sometimes happens when a training run crashes: if you keep running instead of restarting and clearing output, GPU memory use can keep climbing until it hits the maximum. Even with a batch size of 4 you could still get this error. It’s probably something else, but it’s worth checking just in case.
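
If you want to keep an eye on it from inside the notebook, something like this works (assuming nvidia-smi is on your PATH):

import subprocess

# Print used vs. total GPU memory; run it between training attempts to see
# whether usage keeps creeping up.
print(subprocess.check_output(
    ['nvidia-smi', '--query-gpu=memory.used,memory.total', '--format=csv']
).decode())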

Nah, the GPU is fine … that’s always my first check.

I’m able to finetune and fit other models, but for some reason it isn’t working with what I have above. I’ve had this happen before when I tried to load weights into a finetuned model that were originally saved from the default model, which is why I think it has something to do with the weight copying.
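
In case it helps anyone spot the issue, here’s the quick sanity check I’m going to run to make sure the copied weight shapes actually line up (just my own sketch; it compares the numpy arrays from get_weights() layer by layer):

for l1, l2 in zip(fc_model.layers, fc_layers):
    old_shapes = [w.shape for w in l2.get_weights()]
    new_shapes = [w.shape for w in l1.get_weights()]
    print(l1.name, old_shapes == new_shapes, old_shapes)

If every line prints True, the set_weights calls themselves line up and the problem is somewhere else.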