Proposal to add Gaussian Blur to data augmentation

Update: I have some code available. Gaussian blur is now implemented as an F.conv2d op on a PyTorch tensor. Adapt it for your use case.

See https://github.com/fastai/fastai/issues/1661

Motivation:
I found this a useful data augmentation, especially if your dataset comes from out-of-focus end-user photos. This is even more frequent if you are making inferences continuously from a video stream. I find Gaussian Blur + Shear is also a good approximation of motion blur.

It is actually hard to curate naturally occurring out-of-focus or motion-blurred photos because users usually don't post them. Many benchmarks and competitions actually use clean photos (even at low resolution), so the need for this in an open-source library is smaller. But I do suspect companies implement this internally to deal with the scenario I mentioned.

I would like to hear feedback, and whether you have experience with this.


It would definitely be convenient if Gaussian Blur were incorporated as a default fastai transform.

For those interested in this, the relevant code is:

import math
import torch
import torch.nn.functional as F
from fastai.vision import *  # fastai v1: provides TfmPixel and uniform_int

def gaussian_kernel(size, sigma=2., dim=2, channels=3):
    # The gaussian kernel is the product of the gaussian function of each dimension.
    # kernel_size is kept odd so the kernel has a well-defined centre.
    kernel_size = 2*size + 1

    kernel_size = [kernel_size] * dim
    sigma = [sigma] * dim
    kernel = 1
    meshgrids = torch.meshgrid([torch.arange(ks, dtype=torch.float32) for ks in kernel_size])

    for ks, std, mgrid in zip(kernel_size, sigma, meshgrids):
        mean = (ks - 1) / 2
        kernel *= 1 / (std * math.sqrt(2 * math.pi)) * torch.exp(-((mgrid - mean) / (2 * std)) ** 2)

    # Make sure the values in the gaussian kernel sum to 1.
    kernel = kernel / torch.sum(kernel)

    # Reshape to a depthwise convolutional weight of shape (channels, 1, k, k).
    kernel = kernel.view(1, 1, *kernel.size())
    kernel = kernel.repeat(channels, *[1] * (kernel.dim() - 1))

    return kernel

def _gaussian_blur(x, size:uniform_int):
    kernel = gaussian_kernel(size=size)
    kernel_size = 2*size + 1

    x = x[None,...]  # add a batch dimension for conv2d
    padding = int((kernel_size - 1) / 2)  # pad so the output keeps the input size
    x = F.pad(x, (padding, padding, padding, padding), mode='reflect')
    x = torch.squeeze(F.conv2d(x, kernel, groups=3))  # depthwise conv, then drop the batch dim

    return x

gaussian_blur = TfmPixel(_gaussian_blur)
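
For completeness, here is my guess at how you would wire it in with fastai v1's transform API (treat this as a sketch; the (1, 3) range should be resolved to a random int by the uniform_int annotation):

tfms = get_transforms(xtra_tfms=[gaussian_blur(size=(1, 3), p=0.5)])
data = ImageDataBunch.from_folder(path, ds_tfms=tfms, size=224)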

The default values for sigma were a little high for my purposes, but when I played around with them I got satisfactory results.

I think you would need to expose sigma in _gaussian_blur.


Glad you like it. I don't think I will get around to doing a PR any time soon, so I don't mind if you go ahead and start one (assuming you urgently want it in fastai) and let the maintainers review it. And of course, add the sigma parameter or whatever else you think will help.

Note: there may be a small improvement over this. gaussian_kernel should probably be cached, or even made static locally (if you stick with one parameterization).

I think the gaussian kernel is recomputed on each and every call, which is a waste.
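
E.g., a minimal caching sketch using functools.lru_cache (a hypothetical wrapper, which works here because all the arguments are hashable ints/floats):

from functools import lru_cache

@lru_cache(maxsize=None)
def cached_gaussian_kernel(size, sigma=2., dim=2, channels=3):
    # identical to gaussian_kernel, but the tensor is built only once per parameterization
    return gaussian_kernel(size, sigma, dim, channels)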

Alternatively, a far cheaper blur is a plain average, and you don't have to think about sigma at all. So if your image blurs nicely with this, go with it instead. As usual, you can cache the kernel.

E.g.
1/9 1/9 1/9
1/9 1/9 1/9
1/9 1/9 1/9
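
For instance, a minimal sketch of that box filter with the same F.conv2d machinery as above, assuming a 3-channel image tensor x of shape (3, H, W):

box = torch.full((3, 1, 3, 3), 1. / 9)         # one 3x3 averaging kernel per channel
x = x[None, ...]                               # add a batch dimension
x = F.pad(x, (1, 1, 1, 1), mode='reflect')     # pad so the output keeps the input size
x = torch.squeeze(F.conv2d(x, box, groups=3))  # depthwise convolution, drop the batch dim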

Also, see approximate Gaussian and other basic image filters from:

My latest thought is that for most cases these are simpler and faster, if you are not picky about the sigma and the size.

For motion blur, you can combine this with "Shear" (see my notebook for an example).
An alternative is "differential blur": just blur one direction more than the other. If you think about it, this is what motion blur is; you are moving in one direction, smearing it, while the other direction may be fine. Ultimately, you want to visualize the results. A rough sketch follows.
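
E.g., a rough sketch under my own assumptions (hypothetical helper names; single-channel input of shape (1, 1, H, W); heavier sigma along x):

import torch
import torch.nn.functional as F

def gaussian_1d(size, sigma):
    # 1-D gaussian of length 2*size + 1, normalized to sum to 1
    xs = torch.arange(2 * size + 1, dtype=torch.float32) - size
    k = torch.exp(-xs ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def differential_blur(x, sigma_x=3., sigma_y=0.5, size=4):
    kx = gaussian_1d(size, sigma_x).view(1, 1, 1, -1)  # horizontal kernel
    ky = gaussian_1d(size, sigma_y).view(1, 1, -1, 1)  # vertical kernel
    x = F.conv2d(x, kx, padding=(0, size))             # heavy blur along x (the "motion" axis)
    x = F.conv2d(x, ky, padding=(size, 0))             # light blur along y
    return x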

@JoshVarty
Due to this, I don't think it's a P1 for this to be part of fastai, since there are many ways to do it with different speed vs. customization trade-offs. Also, @sgugger mentioned he doubts many people will want this (please feel free to correct me if this isn't true).

I think you could gain a lot of speed by

  1. storing your kernel somewhere
  2. doing this directly on a batch of images (you're making x a batch anyway) on the GPU

Then you can implement it as a batch transform.

You might further look into separating your gaussian kernels (this also works with the box filter you mentioned).
This should also speed things up, especially for larger kernels.
The idea is that filtering an image with [1 2 1], then with [1 2 1]^T (transposed), is the same as filtering with

1 2 1
2 4 2
1 2 1

This reduces the number of computations further.

Edit: computer vision libraries usually also do another trick for convolutions, which is computing them in the frequency domain (a convolution is just a multiplication there). However, this usually scales very differently on GPUs vs. CPUs, and is probably already implemented in PyTorch's convolution function (at least the CUDA path should do it at some point, I believe).
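
E.g., a sketch of the frequency-domain route, assuming a newer PyTorch with the torch.fft module; since the gaussian kernel is symmetric, circular correlation and convolution coincide here:

def fft_blur(x, kernel):
    # x: (B, C, H, W); kernel: (kH, kW), already normalized to sum to 1
    B, C, H, W = x.shape
    kH, kW = kernel.shape
    # Embed the kernel in an H x W canvas and roll its centre to (0, 0)
    # so the pointwise product below is a centred circular convolution.
    canvas = torch.zeros(H, W, dtype=x.dtype, device=x.device)
    canvas[:kH, :kW] = kernel
    canvas = torch.roll(canvas, shifts=(-(kH // 2), -(kW // 2)), dims=(0, 1))
    # Convolution theorem: multiply in the frequency domain.
    return torch.fft.irfft2(torch.fft.rfft2(x) * torch.fft.rfft2(canvas), s=(H, W))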

Not entirely sure what you mean in (1). In a lot of contexts with a weak GPU, I thought it might be better to do data augmentation on the CPU instead? If you can provide a code snippet or even pseudocode, that would be great.

This thread is generating good background discussion; I will link it from my notebook. Whoever finds the blur useful can consult it on a case-by-case basis.

Thanks. If you can show briefly how to achieve this separation/factoring of the kernel using F.conv*** or whatever PyTorch provides, that would be great. The constraint is to do this on PyTorch tensors, ideally with the math functions PyTorch provides.

For most cases it's just 3x3 or 5x5, so I am not sure whether we are getting into micro/premature-optimization territory...

Well, separated kernels should scale the number of computations by 2K / K² for a K x K kernel. This does not take into consideration that linearly separated kernels might scale worse (or differently) with other kinds of optimizations...
What you need to do is just

vert_filtered = F.conv2d(im,            Kx1 kernel)
blurred =       F.conv2d(vert_filtered, 1xK kernel)

This 2K / K² ratio is already
2/3 for a 3x3,
0.4 for a 5x5,
~0.28 for a 7x7.


Thanks. I tried this:

kernel.shape is torch.Size([1, 1, 3, 1])

X = F.conv2d(X, kernel)  # do the reflect padding first if you want
X = F.conv2d(X, torch.transpose(kernel, 2, 3))

for future reference. (For a 3-channel image, repeat the kernel to shape [3, 1, 3, 1] and pass groups=3, as in the 2-D version above.)

Actually, a quick note on the padding: I know reflection is the default approach in fastai data augmentation. That is, however, because it usually produces the most sensible images, given that it is usually small strips (from rotation and the like) that need to be filled.
When blurring images you usually want padding that does not introduce large gradients at the border (image pixel gradients, i.e. contrast, not SGD gradients). At least from my computer vision background, you will almost certainly want to use replicate padding (just repeat the border values), although it should not matter too much with smaller filters.
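
In PyTorch terms that would just swap the padding mode in the snippet above:

x = F.pad(x, (padding, padding, padding, padding), mode='replicate')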

Hi,

I think there is a small mistake in the gaussian implementation: it should be (np.sqrt(2) * std) and not (2 * std) in the exponential, otherwise you get / (4 * std^2) in the gaussian exponent after the square.
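
In code, that fix would make the kernel line:

kernel *= 1 / (std * math.sqrt(2 * math.pi)) * torch.exp(-((mgrid - mean) / (math.sqrt(2) * std)) ** 2)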

Also, why did you use reflect padding and not circular padding for the convolution? I think circular would make more sense, so that the result matches an FFT-based convolution, which by the way I think would be faster for large kernels (approximately the size of the image), as the complexity would be O(N log N) vs. O(N^2).

Cheers,
Anne

@kechan I'm doing image similarity on images captured from video frames, and I can confirm that both blur and shear help quite a bit, so thanks! :slight_smile: