How to obtain ground truth for semantic image segmentation of satellite imagery?
Hello everyone!

I have been playing around with the DSTL Satellite Kaggle Competition and have found the problem both very challenging and exciting so far. Unfortunately, the provided training data poses multiple challenges, e.g. severe class imbalance and very diverse scenes. If you're interested and haven't seen it before, you can learn more about these problems by watching this talk by the 3rd-place winner.

What I am trying to do now is create my own training set using different satellite images. These images, however, are not labeled for segmentation, and I am trying to think of ways to create the ground truth for them (semi-)automatically. My initial approach is to use something like OpenStreetMap: download the vector data for the region covered by each unlabeled satellite image, matching its zoom level, longitude, and latitude. Afterwards, I want to extract buildings, vegetation, waterways, etc. as segmentation masks.
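To make the idea concrete, here is a minimal, pure-Python sketch of the geometric part of that pipeline: projecting OSM lon/lat coordinates into Web Mercator pixel space (standard slippy-map tile math, assuming 256-pixel tiles) and then burning a footprint polygon into a binary mask via ray-casting. All function names here are my own, and a real pipeline would more likely use libraries such as osmnx and rasterio instead of this hand-rolled rasterizer.

```python
import math

TILE_SIZE = 256  # standard slippy-map tile size; adjust if your imagery differs


def lonlat_to_pixel(lon, lat, zoom):
    """Project a WGS84 lon/lat point to global Web Mercator pixel coordinates."""
    scale = TILE_SIZE * 2 ** zoom
    x = (lon + 180.0) / 360.0 * scale
    lat_rad = math.radians(lat)
    y = (1.0 - math.log(math.tan(lat_rad) + 1.0 / math.cos(lat_rad)) / math.pi) / 2.0 * scale
    return x, y


def point_in_polygon(px, py, poly):
    """Ray-casting test: is the point (px, py) inside the polygon (list of (x, y))?"""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Count edge crossings of a horizontal ray extending to the right.
        if (y1 > py) != (y2 > py):
            x_int = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_int:
                inside = not inside
    return inside


def rasterize_footprint(polygon_lonlat, zoom, origin_x, origin_y, width, height):
    """Burn one lon/lat footprint polygon (e.g. an OSM building way) into a
    binary mask aligned with an image whose top-left global pixel is
    (origin_x, origin_y)."""
    poly_px = [lonlat_to_pixel(lon, lat, zoom) for lon, lat in polygon_lonlat]
    mask = [[0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            gx = origin_x + col + 0.5  # sample at the pixel centre
            gy = origin_y + row + 0.5
            if point_in_polygon(gx, gy, poly_px):
                mask[row][col] = 1
    return mask
```

For each class (buildings, water, etc.) you would rasterize all matching OSM geometries into one channel of the label image; the main practical headaches are georeferencing errors between OSM and the imagery and the fact that OSM coverage is incomplete in many regions.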

Can you think of a better solution or do you know of any tool that would be helpful in this process?