Context Augmentation (transparent imgs + backgrounds imgs)

I’m writing this, to propose a new kind of data augmentation to the library:

Context Augmentation

The combination of foreground images that has transparent background (pngs with alpha channel) with other background images.

This data augmentation is described here and here.

Let me illustrate the necessity of this robust augmentation with a story:

In 2016 a paper called “Why Should I Trust You?” trained a CNN to distinguish Huskies vs Wolves, the model looks promising with good accuracy results. But then they applied the LIME interpretability method to show which parts the CNN were focusing on. They realised the key part for distinguishing both animals was the background: Green parks for huskies and snowy backgrounds for the wolves.


If they had the opportunity to apply Context Augmentation, they probably would had overcome this problem.

Extra: How can I get images with transparent background?

This is the boring and difficult task. No one wants to manually label and segment its entire dataset. You can train a segmentation CNN but you still have to label some images.

Exist some non-deep-learning techniques in the literature. For example in OpenCV you can extract the background from a static video (like an IP camera):

There is an even simpler trick to extract the background from a static video: Take the median of each pixel. Very clever aswell.


I’m a fan of this idea. In fact I tried a similar strategy earlier this year (for object detection but without the auto segmentation) and posted the code and results: Synthetic Object Detection Dataset Generation Code and Tutorial

Would be interested to have others give it a try and see if their results are intriguing enough to continue development into something that could be fully automated.

An example of a simple model trained on only generated images:

I was thinking this could be a good way to bootstrap a dataset (and speed up labeling tasks by giving you a starting point to correct instead of having to annotate from scratch).


This is an interesting problem I looked at briefly for weeds. I had started with weeds in spring wheat and then switched to grass. Also I took a load of photos from the web but they were on a variety of backgrounds including pavements. Finally, I wanted weeds at different life stages. Here’s a first stab I made with cycleGAN (~5k of photos input and ~2k images of desired out).

Aging a dock weed:

Or going from scrubby grass:

I didn’t get too far yet (lack of time) but think it could work well… although it does go wrong sometimes :smiley:

Anyway, on transparency I think @jerevon actually has a paper using weed masks created from “green on brown” images with (IIRC) green thresholds. This shows that the “simple tricks” you mention can be effective in particular settings.

For now I am more excited by self supervised learning ideas- I guess only applicable if you can loads of unlabelled data (-> robot to trundle round fields in my case)

Do you have a situation in mind?