Why is my model taking ~200 seconds per epoch?

This is strange, as my partner-in-crime @prateek2686 and I have almost exactly the same architecture, yet my epochs are roughly 5 times as long as his!

I’ve tried fiddling with the layers, the learning rate, the optimizer type, and the batch size. Nothing I have tried changes the length of the epochs significantly!

Where else can I look for the source of the slowness? Yes, I am using the p2.xlarge (Tesla K80) on AWS…

The code is very straightforward:

# Imports assumed from the course notebook (Keras 1.x API plus the course's utils.py / vgg16.py helpers);
# path and batch_size are also assumed to be defined earlier in the notebook
from vgg16 import Vgg16
from utils import *   # assumed to provide split_at, load_array, get_classes, ...
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, BatchNormalization, Flatten, Dense, Dropout
from keras.optimizers import RMSprop

model = Vgg16().model
# Keep only the convolutional part of VGG16; the fully connected layers are rebuilt below
conv_layers, fc_layers = split_at(model, Convolution2D)
del fc_layers
conv_model = Sequential(conv_layers)

# Using the name 'pafs' -- predictions / activations / features -- however you want to look at it
conv_pafs = load_array(path + 'conv_pafs.bc')
val_pafs = load_array(path + 'val_pafs.bc')

(val_classes, trn_classes, val_labels, trn_labels, 
    val_filenames, filenames, test_filenames) = get_classes(path)

def get_fc_model():
    model = Sequential([
        BatchNormalization(input_shape=conv_layers[-1].output_shape[1:]),
        Flatten(),
        Dense(4096, activation='relu'),
        Dropout(0.5),
        BatchNormalization(),
        Dense(4096, activation='relu'),
        Dropout(0.5),
        BatchNormalization(),
        Dense(2, activation='softmax')
        ])

    model.compile(optimizer=RMSprop(lr=0.0001, rho=0.7), loss='categorical_crossentropy', metrics=['accuracy'])
    return model

fc_model = get_fc_model()

fc_model.fit(conv_pafs, trn_labels, nb_epoch=8, 
             batch_size=batch_size, validation_data=(val_pafs, val_labels))

Figured it out – it turns out that if you throw a max-pooling layer in at the beginning of the fully connected model, such as
MaxPooling2D(input_shape=conv_layers[-1].output_shape[1:]),
the model runs a lot quicker.
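For reference, here is a minimal sketch of the adjusted get_fc_model() (assuming the same conv_layers and imports as above; only the first layer is new):

def get_fc_model():
    model = Sequential([
        # Pool the conv features first, then normalize / flatten / classify as before
        MaxPooling2D(input_shape=conv_layers[-1].output_shape[1:]),
        BatchNormalization(),
        Flatten(),
        Dense(4096, activation='relu'),
        Dropout(0.5),
        BatchNormalization(),
        Dense(4096, activation='relu'),
        Dropout(0.5),
        BatchNormalization(),
        Dense(2, activation='softmax')
        ])

    model.compile(optimizer=RMSprop(lr=0.0001, rho=0.7),
                  loss='categorical_crossentropy', metrics=['accuracy'])
    return model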

This hammers home the idea that, aside from promoting translation invariance, max pooling really reduces computational demands (each spatial dimension of the feature maps is halved, so there is far less data to contend with).


Your main problem is that you have a batchnorm layer at the start that’s operating on the output of a convolutional layer - but you forgot to add the ‘axis=1’ parameter! If you add that, you’ll find it runs faster, is more accurate, and your max pooling layer isn’t as necessary.
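In code, that is just a change to the first layer of get_fc_model() (a sketch, assuming the Theano ‘channels first’ ordering the course uses, where the channel dimension is axis 1):

# axis=1 makes BatchNormalization normalize over the channel axis of the 4D conv output,
# rather than the default last axis
BatchNormalization(axis=1, input_shape=conv_layers[-1].output_shape[1:]),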


I hadn’t questioned the speed of my GPU instance until I read this thread. What ballpark time should I expect one epoch to take with the lesson 1 example “out of the box”? My Tesla K80 appears to be up and running on my p2 instance. I double-checked that my .theanorc is set to GPU, and one epoch on the training set is taking > 600 seconds for me. Should I be concerned about my configuration?

@icanseeformiles That time seems reasonable to me. Note that for most of the course we’ll be using pre-computed features, so the epoch times will usually be much faster (5-10 seconds).

I was facing similar issues while training and realised that cnmem was disabled by default in the setup that (I think) most of the class is using. cnmem roughly controls what percentage of GPU memory your script can use while running.
I modified the .theanorc file and added the following:
[lib]
cnmem = 0.9

If you’re getting memory errors, you can always reduce it to a lower number. With this change I’m seeing some improvement in my training times. You can try this and let us know how it goes.
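If you want to confirm from inside the notebook that the GPU (and your .theanorc) is actually being picked up, a quick sanity check using the standard Theano config options:

import theano
print(theano.config.device)   # expect 'gpu' (or 'gpu0') if the GPU backend is active
print(theano.config.floatX)   # should be 'float32' for GPU training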


I am using the onboard NVIDIA GPU of my laptop. Theano and Keras were checked for switching from CPU to GPU, and they work perfectly when used from bash. However, with the lesson 1 notebook a training epoch is taking approximately 6 hours, so I think the GPU is not being invoked. How do I invoke the local GPU when using the notebook?

Make sure you started your notebook from git bash @tapashettisr

I am doing that. Should I first set THEANO_FLAGS = THEANO_FLAGS_GPU before opening the notebook?

Further, will there be any improvement if we use THEANO_FLAGS = THEANO_FLAGS_GPU_DNN?

I’ve figured out what each argument to Theano means, and now run a more or less custom command, but I started with the DNN variant from the guide you used. Watch the cnmem argument; you might have to reduce the value if you get memory errors.
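If the notebook keeps falling back to the CPU, one workaround (a sketch; adjust the flag values to your own setup) is to set THEANO_FLAGS in the very first cell, before anything imports theano:

import os
# This must run before theano (or keras) is imported anywhere in the notebook
os.environ['THEANO_FLAGS'] = 'device=gpu,floatX=float32,lib.cnmem=0.9'
import theano
print(theano.config.device)   # should now report the GPU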

Show me your notebook.

The GPU is working now with the notebook. I was setting THEANO_FLAGS in one bash terminal and starting the notebook from a different one, so the flags weren’t being picked up.
