I’m writing this to propose a new kind of data augmentation for the fast.ai library:
Compositing foreground images that have a transparent background (PNGs with an alpha channel) onto other background images.
This data augmentation is described here and here.
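For concreteness, here is a minimal sketch of the proposed transform in plain NumPy. The function name and the straight alpha-blend are my own illustration, not an existing fast.ai API:

```python
import numpy as np

def composite(foreground_rgba: np.ndarray, background_rgb: np.ndarray) -> np.ndarray:
    """Alpha-blend an RGBA foreground onto an RGB background of the same size.

    foreground_rgba: (H, W, 4) uint8, background_rgb: (H, W, 3) uint8.
    """
    alpha = foreground_rgba[..., 3:4].astype(np.float32) / 255.0
    fg = foreground_rgba[..., :3].astype(np.float32)
    bg = background_rgb.astype(np.float32)
    # Standard "over" compositing: out = alpha * fg + (1 - alpha) * bg
    return (alpha * fg + (1.0 - alpha) * bg).astype(np.uint8)
```

In a real augmentation pipeline you would also randomize the foreground’s scale and position before blending, so the network cannot rely on placement.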
Let me illustrate the necessity of this robust augmentation with a story:
In 2016 a paper called “Why Should I Trust You?” trained a CNN to distinguish huskies from wolves, and the model looked promising, with good accuracy results. But then the authors applied their LIME interpretability method to show which parts of the image the CNN was focusing on. They realised the key feature for distinguishing the two animals was the background: green parks for the huskies and snowy backgrounds for the wolves.
If they had had the opportunity to apply context augmentation, they probably would have overcome this problem.
Extra: how can I get images with a transparent background?
This is the boring and difficult part. No one wants to manually label and segment their entire dataset. You can train a segmentation CNN, but you still have to label some images first.
Some non-deep-learning techniques exist in the literature. For example, with OpenCV you can extract the background from a static video (like an IP camera feed):
There is an even simpler trick to extract the background from a static video: take the median of each pixel over time. Very clever as well.
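The median trick is essentially a one-liner in NumPy, assuming the frames are stacked along a time axis:

```python
import numpy as np

def median_background(frames: np.ndarray) -> np.ndarray:
    """Estimate the static background of a fixed-camera video.

    frames: (T, H, W, C) uint8 stack. As long as each pixel shows the
    background in more than half of the frames, the per-pixel median over
    time recovers the background and discards transient foreground objects.
    """
    return np.median(frames, axis=0).astype(np.uint8)
```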
I’m a fan of this idea. In fact I tried a similar strategy earlier this year (for object detection but without the auto segmentation) and posted the code and results: Synthetic Object Detection Dataset Generation Code and Tutorial
I’d be interested to have others give it a try and see if their results are intriguing enough to continue developing this into something fully automated.
An example of a simple model trained on only generated images:
I was thinking this could be a good way to bootstrap a dataset (and speed up labeling tasks by giving you a starting point to correct instead of having to annotate from scratch).
This is an interesting problem I looked at briefly for weeds. I started with weeds in spring wheat and then switched to grass. I also took a load of photos from the web, but they were on a variety of backgrounds, including pavements. Finally, I wanted weeds at different life stages. Here’s a first stab I made with CycleGAN (~5k photos in and ~2k images of the desired output).
Aging a dock weed:
Or going from scrubby grass:
I haven’t got too far yet (lack of time) but I think it could work well… although it does go wrong sometimes.
Anyway, on transparency: I think @jerevon actually has a paper using weed masks created from “green on brown” images with (IIRC) green thresholds. This shows that the “simple tricks” you mention can be effective in particular settings.
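A toy version of such a green threshold might look like this; the margin value is purely illustrative, not taken from the paper:

```python
import numpy as np

def green_mask(rgb: np.ndarray, margin: int = 20) -> np.ndarray:
    """Crude 'green on brown' plant mask.

    A pixel counts as plant if its green channel exceeds both red and blue
    by `margin`. Works only when the background (soil) is reliably non-green.
    """
    r = rgb[..., 0].astype(int)
    g = rgb[..., 1].astype(int)
    b = rgb[..., 2].astype(int)
    return (g > r + margin) & (g > b + margin)
```

The resulting boolean mask can serve directly as the alpha channel for the compositing augmentation discussed above.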
For now I am more excited by self-supervised learning ideas, though I guess they are only applicable if you can get loads of unlabelled data (-> a robot to trundle round fields, in my case).
Do you have a situation in mind?