Precomputing conv layers WITH data augmentation

jp_beaudry · August 17, 2017, 2:48am

Hi all,

I am interested in combining two things that seemingly don’t go together: pre-calculating the output of the convolutional layers of a CNN and still benefiting from data augmentation. The training-time savings are too large not to pursue this. I think there is a way, but I would like your review and thoughts.

In the early classes, we are warned not to combine precalculation of conv layers with data augmentation (or batch shuffling) because both are random. But what if that randomness were controlled?

The place where randomness happens is in the custom ‘get_batches()’. Looking at the vgg16.py source code, we see that ‘get_batches’ is merely a call to:

gen.flow_from_directory(path, target_size=(224,224), class_mode=class_mode, shuffle=shuffle, batch_size=batch_size)

Here ‘gen’ is an image.ImageDataGenerator(). The generator can be coerced to always spit out the same modified images in the same order thanks to the ‘seed’ parameter of flow_from_directory():

seed: optional random seed for shuffling and transformations (doc link)

Couldn’t we then pull as many batches as desired from the generator and pre-compute the conv layers’ output for them, then either reset the generator or create a new one with the same seed for fitting/training the dense layers? I’m thinking 5 or 10 epochs’ worth would probably strike the right balance between good data augmentation and acceptable disk space usage. Once training has used up those 5 or 10 epochs of data, you’d simply restart from the head of the list by resetting the generator again for more epochs.
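A minimal sketch of the idea, using NumPy only so that it runs without Keras. Everything in it is a stand-in and an assumption, not the course code: `augmented_batches` is a hypothetical toy replacement for a seeded `flow_from_directory()` generator, and `fake_conv_features` is a placeholder for running the conv layers. It only illustrates that a fixed seed makes the augmented stream repeatable, and that the labels can be saved in the same order as the features.

```python
import numpy as np

def augmented_batches(images, labels, batch_size, seed):
    """Toy stand-in for a seeded augmenting generator (hypothetical helper).

    Reshuffles every epoch and randomly flips images left-right,
    drawing all randomness from one seeded RNG.
    """
    rng = np.random.RandomState(seed)
    n = len(images)
    while True:
        order = rng.permutation(n)                 # per-epoch shuffle
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            batch = images[idx].copy()
            flip = rng.rand(len(idx)) < 0.5        # which images to flip
            batch[flip] = batch[flip][:, :, ::-1]  # flip along the width axis
            yield batch, labels[idx]

def fake_conv_features(batch):
    """Placeholder for the conv layers: average the left half of each image."""
    half = batch.shape[2] // 2
    return batch[:, :, :half].mean(axis=(1, 2))    # shape (batch, channels)

def precompute(images, labels, batch_size, seed, epochs):
    """Pull `epochs` worth of augmented batches and store features + labels."""
    gen = augmented_batches(images, labels, batch_size, seed)
    steps = epochs * int(np.ceil(len(images) / batch_size))
    feats, labs = [], []
    for _ in range(steps):
        x, y = next(gen)
        feats.append(fake_conv_features(x))
        labs.append(y)
    return np.concatenate(feats), np.concatenate(labs)

# Made-up data: 10 tiny "images" with the image index used as the label.
data_rng = np.random.RandomState(0)
images = data_rng.rand(10, 8, 8, 3)
labels = np.arange(10)

f1, y1 = precompute(images, labels, batch_size=4, seed=42, epochs=5)
f2, y2 = precompute(images, labels, batch_size=4, seed=42, epochs=5)
f3, y3 = precompute(images, labels, batch_size=4, seed=43, epochs=5)

print("same seed reproduces the stream:", np.array_equal(f1, f2))
print("different seed gives a different order:", not np.array_equal(y1, y3))
```

One design note: if the labels are stored next to the precomputed features, as above, the dense layers can be trained directly from the two saved arrays, and replaying the generator with the same seed is only needed to regenerate or verify the cache.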

I’m still a novice at Python, so it’ll take me a little bit to experiment with it. But on the face of it, what holes do you see?

Thanks

msp · August 17, 2017, 7:49am

That sounds reasonable to me. How many activations (filters x height x width) do you have at the final conv layer? If it’s small (e.g. 512 x 7 x 7), then you can save many more than 5-10 epochs’ worth.
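To put rough numbers on that, here is a back-of-the-envelope estimate of the cache size. The 512 x 7 x 7 shape comes from the example above. The 20,000-image training set, the 10 epochs, and float32 storage are assumptions chosen purely for illustration, and `feature_cache_bytes` is a hypothetical helper.

```python
import math

def feature_cache_bytes(n_images, n_epochs, feature_shape, bytes_per_value=4):
    """Uncompressed size of the cached conv features (4 bytes = float32)."""
    per_image = math.prod(feature_shape) * bytes_per_value
    return per_image * n_images * n_epochs

per_image = feature_cache_bytes(1, 1, (512, 7, 7))
total = feature_cache_bytes(20000, 10, (512, 7, 7))  # assumed dataset size

print("bytes per augmented image:", per_image)
print("GiB for 10 epochs of 20,000 images:", round(total / 2**30, 2))
```

Storing the features as float16 or compressing them on disk would shrink this further, at some cost in precision or load time.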

torkku · August 17, 2017, 7:58am

@jp_beaudry Using a static seed sounds to me like it should work!

If you are up to it, please test it out and share the results…