Blur as data augmentation

Could someone explain to me why blurring a picture is not a good idea for data augmentation? (I assume it isn’t, because otherwise Keras’ ImageDataGenerator would implement it.)

Intuitively, it seems to make sense: blurring would reduce the “noise” in the data and let the model focus on the really important features.

CNNs are responsible for extracting the important features from the low-level information they are given. For example, think about max pooling: you take the maximum value within each pooling window, effectively discarding the noise. The goal of data augmentation is to “twist and turn” the information, not to remove it.
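To make that pooling intuition concrete, here is a minimal NumPy sketch (my own illustration, not from the original post) of 2x2 max pooling: only the largest value in each window survives, so small noisy activations elsewhere in the window never reach the output.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Naive 2x2 max pooling with stride 2 over a 2D feature map."""
    h, w = feature_map.shape
    cropped = feature_map[:h - h % 2, :w - w % 2]       # crop to even size
    windows = cropped.reshape(h // 2, 2, w // 2, 2)     # group into 2x2 windows
    return windows.max(axis=(1, 3))                     # keep the max per window

fm = np.array([[1.0, 0.2, 0.1, 0.0],
               [0.3, 5.0, 0.4, 2.0],
               [0.1, 0.0, 3.0, 0.2],
               [0.2, 0.1, 0.3, 0.4]])

print(max_pool_2x2(fm))
# [[5.  2. ]
#  [0.2 3. ]]
# The small "noise" values in the non-maximal positions are discarded.
```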


Blurring might be a useful preprocessing step. There’s also something cool called anisotropic diffusion which could be useful too. I’ve used both in the past as preprocessing for more classical CV approaches. They’ll probably be less helpful for neural networks, but you could always create your own version of the Keras generator and experiment (see the sketch below).
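A minimal sketch of that idea, assuming TensorFlow’s bundled Keras and SciPy: ImageDataGenerator accepts a preprocessing_function that runs on each image after the standard augmentations, so you can hook a Gaussian blur in there without writing a whole new generator. The sigma value and the use of scipy.ndimage are my own assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def blur_image(img):
    """Blur a single HxWxC image; blur spatial dims only, not channels."""
    return gaussian_filter(img, sigma=(1.0, 1.0, 0.0))

datagen = ImageDataGenerator(
    rotation_range=15,
    horizontal_flip=True,
    preprocessing_function=blur_image,  # applied to every augmented image
)

# Example usage with a dummy batch of images:
x = np.random.rand(8, 64, 64, 3).astype("float32")
batch = next(datagen.flow(x, batch_size=8))
```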

Just ran into this and remembered your question:

“Understanding How Image Quality Affects Deep Neural Networks”


Actually, I think training with blur might be a good idea. If I’m reading that paper correctly, it focuses on the ability of existing networks, like VGG16, to classify images with “quality issues.” They found that things like JPEG compression don’t affect performance as much as blur does.

I think that would be because those networks were trained on JPEG-compressed images and not on blurred ones. It seems that, with enough examples, the model should learn the cat-ness of even a blurry cat, as long as it isn’t blurred out too much.
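If you wanted to try that, one way is to make the blur stochastic rather than a fixed preprocessing step, so the network sees both sharp and blurry versions of each class. This is just a sketch; the probability and sigma range are assumptions, not values from the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def random_blur(img, p=0.5, max_sigma=2.0):
    """With probability p, apply a Gaussian blur of random strength to an HxWxC image."""
    if np.random.rand() < p:
        sigma = np.random.uniform(0.5, max_sigma)
        return gaussian_filter(img, sigma=(sigma, sigma, 0.0))
    return img

# Pass random_blur as the preprocessing_function to ImageDataGenerator,
# as in the earlier snippet, to turn blur into an augmentation.
```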