Proposal to add Gaussian Blur to data augmentation

Update: I have some code available. Gaussian blur is now implemented as an F.conv2d op on a PyTorch tensor. Adapt it for your use case.

See https://github.com/fastai/fastai/issues/1661

Motivation:
I found this a useful data augmentation, especially if your dataset comes from out-of-focus end-user photos. This is even more frequent if you are making inferences continuously from a video stream. I find Gaussian Blur + Shear is also a good approximation of motion blur.

It is actually hard to curate naturally occurring out-of-focus or motion-blurred photos because users usually don't post them. Many benchmarks and competitions actually use clean photos (even at low resolution), so the need for this in an open-source library is smaller. But I do suspect companies implement this internally to deal with the scenario I mentioned.

I would like to hear feedback, and whether you have experience with this.


It would definitely be convenient if Gaussian Blur were incorporated as a default fastai transform.

For those interested in this, the relevant code is:

import math
import torch
import torch.nn.functional as F
from fastai.vision import *  # fastai v1: provides TfmPixel and uniform_int

def gaussian_kernel(size, sigma=2., dim=2, channels=3):
    # The gaussian kernel is the product of the gaussian function of each dimension.
    # kernel_size is kept odd so the kernel has a well-defined centre.
    kernel_size = 2*size + 1

    kernel_size = [kernel_size] * dim
    sigma = [sigma] * dim
    kernel = 1
    meshgrids = torch.meshgrid([torch.arange(ks, dtype=torch.float32) for ks in kernel_size])

    for ks, std, mgrid in zip(kernel_size, sigma, meshgrids):
        mean = (ks - 1) / 2
        kernel *= 1 / (std * math.sqrt(2 * math.pi)) * torch.exp(-((mgrid - mean) / (2 * std)) ** 2)

    # Make sure the values in the gaussian kernel sum to 1.
    kernel = kernel / torch.sum(kernel)

    # Reshape to a depthwise convolutional weight of shape (channels, 1, k, k).
    kernel = kernel.view(1, 1, *kernel.size())
    kernel = kernel.repeat(channels, *[1] * (kernel.dim() - 1))

    return kernel

def _gaussian_blur(x, size:uniform_int):
    kernel = gaussian_kernel(size=size)
    kernel_size = 2*size + 1

    x = x[None,...]  # add a batch dimension for conv2d
    padding = int((kernel_size - 1) / 2)  # pad so the output keeps the input size
    x = F.pad(x, (padding, padding, padding, padding), mode='reflect')
    x = torch.squeeze(F.conv2d(x, kernel, groups=3))  # depthwise conv, then drop the batch dim

    return x

gaussian_blur = TfmPixel(_gaussian_blur)
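
For completeness, here is my guess at how you would wire it in with fastai v1's transform API (treat this as a sketch; the (1, 3) range should be resolved to a random int by the uniform_int annotation):

tfms = get_transforms(xtra_tfms=[gaussian_blur(size=(1, 3), p=0.5)])
data = ImageDataBunch.from_folder(path, ds_tfms=tfms, size=224)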

The default values for sigma were a little high for my purposes, but when I played around with them I got satisfactory results.

I think you would need to expose sigma in _gaussian_blur.


Glad you like it. I don't think I will get around to doing a PR any time soon, so I don't mind if you go ahead and start one (assuming you urgently want it in fastai) and let the maintainers review it. And of course, add the sigma parameter or whatever else you think will help.

Note: there may be a small improvement over this. gaussian_kernel should probably be cached, or even made static locally (if you stick with one parameterization).

I think the gaussian kernel is recomputed on each and every call, which is a waste.
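
E.g., a minimal caching sketch using functools.lru_cache (a hypothetical wrapper, which works here because all the arguments are hashable ints/floats):

from functools import lru_cache

@lru_cache(maxsize=None)
def cached_gaussian_kernel(size, sigma=2., dim=2, channels=3):
    # identical to gaussian_kernel, but the tensor is built only once per parameterization
    return gaussian_kernel(size, sigma, dim, channels)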

Alternatively, a far cheaper blur is a plain average, and you don't have to think about sigma at all. So if your image blurs nicely with this, go with it instead. As usual, you can cache the kernel.

E.g.
1/9 1/9 1/9
1/9 1/9 1/9
1/9 1/9 1/9
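
For instance, a minimal sketch of that box filter with the same F.conv2d machinery as above, assuming a 3-channel image tensor x of shape (3, H, W):

box = torch.full((3, 1, 3, 3), 1. / 9)         # one 3x3 averaging kernel per channel
x = x[None, ...]                               # add a batch dimension
x = F.pad(x, (1, 1, 1, 1), mode='reflect')     # pad so the output keeps the input size
x = torch.squeeze(F.conv2d(x, box, groups=3))  # depthwise convolution, drop the batch dim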

Also, see approximate Gaussian and other basic image filters from:

My latest thought is that for most cases these are simpler and faster, if you are not picky about the sigma and the size.

For motion blur, you can combine this with "Shear" (see my notebook for an example).
An alternative is "differential blur": just blur one direction more than the other. If you think about it, this is what motion blur is; you are moving in one direction, smearing it, while the other direction may be fine. Ultimately, you want to visualize the results. A rough sketch follows.
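
E.g., a rough sketch under my own assumptions (hypothetical helper names; single-channel input of shape (1, 1, H, W); heavier sigma along x):

import torch
import torch.nn.functional as F

def gaussian_1d(size, sigma):
    # 1-D gaussian of length 2*size + 1, normalized to sum to 1
    xs = torch.arange(2 * size + 1, dtype=torch.float32) - size
    k = torch.exp(-xs ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def differential_blur(x, sigma_x=3., sigma_y=0.5, size=4):
    kx = gaussian_1d(size, sigma_x).view(1, 1, 1, -1)  # horizontal kernel
    ky = gaussian_1d(size, sigma_y).view(1, 1, -1, 1)  # vertical kernel
    x = F.conv2d(x, kx, padding=(0, size))             # heavy blur along x (the "motion" axis)
    x = F.conv2d(x, ky, padding=(size, 0))             # light blur along y
    return x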

@JoshVarty
Due to this, I don't think it's a P1 for this to be part of fastai, since there are many ways to do it with different speed vs. customization trade-offs. Also, @sgugger mentioned he doubts many people will want this (please feel free to correct me if this isn't true).

I think you could gain a lot of speed by

  1. storing your kernel somewhere
  2. doing this directly on a batch of images (you're making x a batch anyway) on the GPU

Then you can implement it as a batch transform.

You might further look into separating your gaussian kernels (this also works with the box filter you mentioned).
This should also speed things up, especially for larger kernels.
The idea is that filtering an image with [1 2 1], then with [1 2 1]^T (transposed), is the same as filtering with

1 2 1
2 4 2
1 2 1

This reduces the number of computations further.

Edit: computer vision libraries usually also do another trick for convolutions, which is computing them in the frequency domain (a convolution is just a multiplication there). However, this usually scales very differently on GPUs vs. CPUs, and is probably already implemented in PyTorch's convolution function (at least the CUDA path should do it at some point, I believe).
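
E.g., a sketch of the frequency-domain route, assuming a newer PyTorch with the torch.fft module; since the gaussian kernel is symmetric, circular correlation and convolution coincide here:

def fft_blur(x, kernel):
    # x: (B, C, H, W); kernel: (kH, kW), already normalized to sum to 1
    B, C, H, W = x.shape
    kH, kW = kernel.shape
    # Embed the kernel in an H x W canvas and roll its centre to (0, 0)
    # so the pointwise product below is a centred circular convolution.
    canvas = torch.zeros(H, W, dtype=x.dtype, device=x.device)
    canvas[:kH, :kW] = kernel
    canvas = torch.roll(canvas, shifts=(-(kH // 2), -(kW // 2)), dims=(0, 1))
    # Convolution theorem: multiply in the frequency domain.
    return torch.fft.irfft2(torch.fft.rfft2(x) * torch.fft.rfft2(canvas), s=(H, W))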

Not entirely sure what you mean in (1). In a lot of contexts with a weak GPU, I thought it might be better to do data augmentation on the CPU instead? If you can provide a code snippet or even pseudocode, that would be great.

This thread is generating good background discussion; I will link it from my notebook. Whoever finds the blur useful can consult it on a case-by-case basis.

Thanks. If you can show briefly how to achieve this separation/factoring of the kernel using F.conv*** or whatever PyTorch provides, that would be great. The constraint is to do this on PyTorch tensors, ideally with the math functions PyTorch provides.

For most cases it's just 3x3 or 5x5, so I am not sure whether we are getting into micro/premature-optimization territory...

Well, separated kernels should scale the number of computations by 2K / K² for a K x K kernel. This does not take into consideration that linearly separated kernels might scale worse (or differently) with other kinds of optimizations...
What you need to do is just

vert_filtered = F.conv2d(im,            Kx1 kernel)
blurred =       F.conv2d(vert_filtered, 1xK kernel)

This 2K / K² ratio is already
2/3 for a 3x3,
0.4 for a 5x5,
~0.28 for a 7x7.


Thanks. I tried this:

kernel.shape is torch.Size([1, 1, 3, 1])

X = F.conv2d(X, kernel)  # do the reflect padding first if you want
X = F.conv2d(X, torch.transpose(kernel, 2, 3))

for future reference. (For a 3-channel image, repeat the kernel to shape [3, 1, 3, 1] and pass groups=3, as in the 2-D version above.)

Actually, a quick note on the padding: I know reflection is the default approach in fastai data augmentation. That is, however, because it usually produces the most sensible images, given that it is usually small strips (from rotation and the like) that need to be filled.
When blurring images you usually want padding that does not introduce large gradients at the border (image pixel gradients, i.e. contrast, not SGD gradients). At least from my computer vision background, you will almost certainly want to use replicate padding (just repeat the border values), although it should not matter too much with smaller filters.
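
In PyTorch terms that would just swap the padding mode in the snippet above:

x = F.pad(x, (padding, padding, padding, padding), mode='replicate')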

Hi,

I think there is a small mistake in the gaussian implementation: it should be (np.sqrt(2) * std) and not (2 * std) in the exponential, otherwise you get / (4 * std^2) in the gaussian exponent after the square.
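
In code, that fix would make the kernel line:

kernel *= 1 / (std * math.sqrt(2 * math.pi)) * torch.exp(-((mgrid - mean) / (math.sqrt(2) * std)) ** 2)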

Also, why did you use reflect padding and not circular padding for the convolution? I think circular would make more sense, so that the result matches an FFT-based convolution, which by the way I think would be faster for large kernels (approximately the size of the image), as the complexity would be O(N log N) vs. O(N^2).

Cheers,
Anne

@kechan I'm doing image similarity on images captured from video frames, and I can confirm that both blur and shear help quite a bit, so thanks! :slight_smile: