RGB Transformations for Data Augmentation

When you look at the source code for cutout in fastai.vision.transform:

def _cutout(x, n_holes:uniform_int=1, length:uniform_int=40):
    "Cut out `n_holes` number of square holes of size `length` in image at random locations."
    h,w = x.shape[1:]
    for n in range(n_holes):
        h_y = np.random.randint(0, h)
        h_x = np.random.randint(0, w)
        y1 = int(np.clip(h_y - length / 2, 0, h))
        y2 = int(np.clip(h_y + length / 2, 0, h))
        x1 = int(np.clip(h_x - length / 2, 0, w))
        x2 = int(np.clip(h_x + length / 2, 0, w))
        x[:, y1:y2, x1:x2] = 0
    return x

It doesn’t say what type `x` is. It seems obvious to me that this is an image, but I’m not sure how to find out whether it has 4 dimensions (batch size, RGB channel, height, width) or just 3 (RGB channel, height, width).

The reason I ask is that I’d like to perform a transformation on only one or two of the three channels. I think that removing one or two channels of color would be a good addition to the list of data transformations. This is very much in sync with the ethos of fastai transformations:

… that don’t change what’s inside the image (for the human eye) but change its pixel values. Models trained with data augmentation will then generalize better.
(Source: https://docs.fast.ai/vision.transform.html#_cutout)

To get a better idea of what I’m saying, consider the following images:

[images: the same photo with one or two of its RGB channels zeroed out]

Sorry if I’m repeating an already existing discussion (to the best of my knowledge, this is a new topic :slight_smile:)

cutout takes a fastai Image, as all transforms do, so it’s something with (channels, height, width) and no batch dimension.
As you remove channels, think about what you’ll replace them with, since your model will expect an image with 3 channels.
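
If you want to check for yourself, something along these lines should work (the path here is just a placeholder):

from fastai.vision import open_image

img = open_image('path/to/some_image.jpg')   # a fastai Image
print(img.shape)       # e.g. torch.Size([3, 375, 500]) -> (channels, height, width), no batch dimension
print(img.data.shape)  # the underlying pixel tensor has the same shape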

@sgugger thanks for the response, and my sincere apologies for the delayed reply.

I wasn’t thinking of removing channels, but rather a couple of other things:

  1. Replacing one or two channels with zero tensors of the same width and height, giving us images like the ones posted earlier. This lets us convey a 3-channel input’s information with only one or two channels (assuming color doesn’t matter), though I’m not sure how useful this would be in practice. (A rough sketch of both ideas as transforms is at the end of this post.)

  2. Replacing one of the three channels with a torch.rand tensor of the same width and height. Here’s what that does:

Before (all 3 channels):
[image]

After (Channel 0 randomised):

img_raw    = img.data.clone()               # copy of the image's (3, h, w) pixel tensor
img_raw[0] = torch.rand(img_raw.shape[1:])  # replace channel 0 with uniform random noise
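
For context, here’s a rough sketch of how both ideas could be written in the same style as _cutout, wrapped with TfmPixel (the same wrapper cutout uses). The names _rgb_zero and _rgb_noise are just placeholders, not anything that exists in the library:

import torch
from fastai.vision import TfmPixel

def _rgb_zero(x, channel:int=0):
    "Zero out one channel of the pixel tensor `x` (channels, height, width)."
    x[channel] = 0
    return x

def _rgb_noise(x, channel:int=0):
    "Replace one channel of `x` with uniform random noise."
    x[channel] = torch.rand(x.shape[1:])
    return x

rgb_zero  = TfmPixel(_rgb_zero)
rgb_noise = TfmPixel(_rgb_noise)

Either one could then be applied with a probability like any other fastai transform, e.g. rgb_noise(channel=0, p=0.5).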


I played around with setting a single channel to zeros as well as setting a channel to random noise, and the latter increased the accuracy on my validation set by 1-2%. I definitely think it’s a good idea to add this to the fastai library!

@rsomani95 have you submitted a PR yet? If not, I might try to submit one over the weekend…

There are some parameters that need closer attention. Simply randomising one channel might do more harm than good in some instances. I’m working on a notebook that showcases this and will post it here as soon as it’s ready.

That’s great! Unfortunately I haven’t been able to test this, as I don’t have (and won’t have) access to a GPU for a few weeks.
How robust were the experiments you did? Did you see the 1-2% increase just the one time you tried it, or consistently across multiple attempts?

Not yet, but I’d like to be the one to do this once the aforementioned issues are sorted out.
I was under the impression that before adding a new feature to the library, there’s supposed to be some discussion/approval with contributors such as @sgugger and @stas ?

You can always submit a PR :slight_smile:
If we feel it should be hosted on a separate repo because it’s too specific to be in the core fastai, we’ll tell you, but a new transform that sounds useful is always welcome!

Awesome. I just submitted a PR. It’s my first one ever! :smiley:


Congrats @rsomani95 on your first PR, pretty cool!

I hadn’t done much on this for a while, but thanks for the comments above. I did some testing of the performance of different RGB transforms using the Stanford Cars dataset; the results are in the notebook linked below:

  • rgb_tweak - randomly set a random channel to zeros, with a configurable probability that the transformation will occur
  • rgb_tweak_rand - randomly set a random channel to noise, with a configurable probability that the transformation will occur
  • rgb_randomize - your implementation in fastai.vision.transform

These were the validation accuracy results (averaged over 5 runs, 40 epochs per run):

It looked like randomly zeroing out a random channel gave the largest boost in performance, a solid 1% increase in accuracy. Adding noise seemed to hurt performance compared to the baseline. However, there is a chance that the noise thresholds I used were too high; e.g. for your transform maybe I should have tested thresh=0.15 as well.
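
In case anyone wants to rerun this kind of comparison, the transform should slot into the standard augmentation pipeline via xtra_tfms; the channel, thresh and p values below are just examples, assuming the rgb_randomize signature documented in fastai.vision.transform:

from fastai.vision import get_transforms, rgb_randomize

# add channel randomisation on top of the default augmentations,
# applied to roughly half of the training images
tfms = get_transforms(xtra_tfms=[rgb_randomize(channel=0, thresh=0.3, p=0.5)])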

Do you have any results from testing? I’m curious whether these results are specific to the dataset I tested on, or whether my implementation could have been better.
