Synthetic data generation for Image Segmentation

I am working on semantic image segmentation, using Google’s DeepLab on the PASCAL-Parts dataset.

One approach we considered was to create synthetic data, i.e. images with varied backgrounds. But the question is: how do I do it?

Since this dataset contains parts of objects, should I place the individual parts on varied backgrounds? That doesn’t seem intuitive to me, but one of my colleagues argued that it would “take the part out of context” and help us detect variations in the part, e.g. putting the image of a bird’s head on a desert background :stuck_out_tongue:. It seems completely counter-intuitive to me, but I do not have a good theoretical explanation of why it would not work.
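To make the question concrete, here is roughly what I mean by “placing a part on a varied background” (a minimal NumPy sketch; the function and variable names are mine, and I am assuming the dataset provides a binary mask for each part):

```python
import numpy as np

def composite(foreground, mask, background):
    """Paste a masked foreground part onto a background of the same size.

    foreground, background: H x W x 3 uint8 arrays; mask: H x W boolean array
    where True marks the part's pixels.
    """
    mask3 = mask[..., None]  # broadcast the mask over the colour channels
    return np.where(mask3, foreground, background)

# Toy example: a white 2x2 "part" pasted onto a black 4x4 background.
fg = np.full((4, 4, 3), 255, dtype=np.uint8)
bg = np.zeros((4, 4, 3), dtype=np.uint8)
m = np.zeros((4, 4), dtype=bool)
m[1:3, 1:3] = True  # the part occupies the centre of the image
out = composite(fg, m, bg)
```

The same mask would then serve as the ground-truth label for the synthesised image.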

Can anyone help me understand why such a process of synthetic data generation won’t be helpful for semantic segmentation?

My guess is that the model requires the “context of the whole object” to better detect the parts. Can anyone point me to relevant material that has explored this, or provide evidence for/against this hypothesis?

Have a look here: