I’ve got a satellite dataset (Sentinel 2) that contains data of various spectral bands for a specific, large area. Within this area are polygons, defined using a shapefile, for which I have the classes.
I am able to mask out the polygons from the various band rasters into numpy arrays using rasterio and geopandas. However, what I’m stuck with right now is how to convert these arrays into a proper format for a multi-class supervised learning problem. The main issue is the variable size and orientation of the polygons (imagine shapes like / | \ _ - ~).
Using a minimum bounding box is certainly an option, but there is a very high chance of capturing adjacent polygons with this method, which would likely produce bad performance.
Looking forward to hearing what you guys think.
P.S. First post on the forums, so I welcome any critique.
Like mnpinto says, it’ll be more helpful to see an example of your polygons and underlying imagery.
Based on your description alone, sounds like you’re doing something like land use land cover segmentation? You could burn all the polygons (assuming they’re not overlapping) into a 1-channel raster mask with different values (from 0 to 255) representing different classes and use that as your label, akin to the camvid multi-class segmentation masks in part1 lesson 3.
If your class polygons overlap, you could create multiple mask channels, one per overlapping class. This would require more customization of the fastai dataloader, model architecture, and loss functions to accommodate multi-channel targets and predictions.
Hi @daveluo and @mnpinto, thanks for the input and apologies for the delay - this project was put on the backburner for me but picking it up again.
So, let me describe the problem a bit more concretely.
It is a multi-class classification problem.
I have polygon shapes of farms, defined by bounding point coordinates in a shapefile and I want to predict the type of crop being grown on that farm from the satellite data.
The task at hand is thus to mask out the polygons from the satellite raster and then transform the masked rasters into an applicable format for image classification.
Thus far, what I’ve done is to mask out the farms and zero-pad each mask to the size of the largest farm, with the farm data in the center. I’ll upload some examples to show what I mean.
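Roughly, the padding step looks like this (a sketch with a made-up farm array; `pad_to_center` is just an illustrative helper, and the target size would be the largest farm’s dimensions):

```python
import numpy as np

def pad_to_center(arr, target_h, target_w):
    """Zero-pad a (C, H, W) masked farm array so the farm sits in the
    center of a (C, target_h, target_w) array."""
    c, h, w = arr.shape
    top = (target_h - h) // 2
    left = (target_w - w) // 2
    out = np.zeros((c, target_h, target_w), dtype=arr.dtype)
    out[:, top:top + h, left:left + w] = arr
    return out

# An elongated farm shape, e.g. a "/" or "-" style polygon mask
farm = np.ones((3, 5, 9), dtype=np.float32)
padded = pad_to_center(farm, 32, 32)
```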
I started by stacking the RGB channels (of the 14 bands at various spectral resolutions) to create PNG images and fed those into an ImageNet-pretrained ResNet, but I’m getting pretty bad performance, i.e. worse than what I’m getting with much simpler models.
Perhaps there are simpler, more effective ways of transforming this problem and that’s what I need advice on.
Again, thanks for the help and looking forward to hearing what you guys think.
Example of masked, zero-padded farms with RGB channels stacked:
I would try to give the model unmasked images and add the mask as an additional input channel. And I would say that using multiple images for each farm (like monthly or seasonal composites, covering at least one year) should help to identify the type of crop.
Hi @mnpinto, thanks for the reply.
Can you explain what you mean by giving the masks as an additional input channel?
What I understand is that I give the full raster to the model, let’s say an array X of shape (10980, 10980, K), and then another array Y of shape (10980, 10980, N), where K is the number of channels in the raster and N is the number of classes. Then Y is 1 at pixels where a farm of that class exists in the raster?
Also, about the time component: I do have multiple timestamps. My current idea is to use the final, flattened layer of the CNN (before softmax) as input features to another fully connected DNN, with each timestamp combined, plus decomposed time features such as day of week and week of year as additional inputs.
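Something like this for combining the features (all values are placeholders: the CNN features are random stand-ins, and `encode_time` is a made-up helper showing a cyclical sin/cos encoding of the acquisition date):

```python
import numpy as np

def encode_time(day_of_year, period=365.25):
    """Cyclical encoding of a time feature so e.g. day 365 and day 1
    end up close together in feature space."""
    angle = 2 * np.pi * day_of_year / period
    return np.array([np.sin(angle), np.cos(angle)])

# Hypothetical flattened CNN features for 4 timestamps of one farm
n_timestamps, n_features = 4, 512
cnn_feats = np.random.rand(n_timestamps, n_features)
days = np.array([15, 105, 196, 288])  # acquisition day-of-year

# One row per timestamp: [cnn features | sin(doy) | cos(doy)],
# then concatenated into a single flat input for the downstream DNN
rows = [np.concatenate([cnn_feats[t], encode_time(days[t])])
        for t in range(n_timestamps)]
x = np.concatenate(rows)
```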
As you can tell I’m just making things up as I go, so I really appreciate your input as a geospatial specialist.
I wouldn’t give the full 10980x10980 raster (input size is too large). Chip or crop your raster to 256x256 squares or so. Stack RGB+1 channels for each chip so the input dimensions would be (256,256,4) where the last channel is the binary mask of each farm to be classified. I believe this is what @mnpinto is suggesting.
The idea being that it’s useful context for the model to see the surrounding areas while “guiding” it to the farm via the 4th channel mask. Otherwise all that zero-padded space is wasted as input and there’s not enough information in the masked pixels of the farm alone to do a good job of classifying, especially at the resolution you’re working at.
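A minimal numpy sketch of this chip-plus-mask input (the scene, farm mask, and chip origin are synthetic placeholders; `make_chip` is a hypothetical helper, and in practice you’d center the window on each farm):

```python
import numpy as np

def make_chip(raster, farm_mask, row, col, size=256):
    """Crop a (H, W, 3) RGB raster and a (H, W) binary farm mask to a
    size x size window and stack them into a (size, size, 4) input."""
    rgb = raster[row:row + size, col:col + size, :]
    m = farm_mask[row:row + size, col:col + size, np.newaxis]
    return np.concatenate([rgb, m.astype(rgb.dtype)], axis=-1)

# Synthetic stand-ins for the full scene and one farm's binary mask
scene = np.random.rand(1024, 1024, 3).astype(np.float32)
farm_mask = np.zeros((1024, 1024), dtype=np.uint8)
farm_mask[300:380, 420:560] = 1  # the farm to classify

chip = make_chip(scene, farm_mask, row=256, col=384, size=256)
```

The model sees the full surrounding context in the RGB channels while the 4th channel tells it which farm the label refers to, so one chip location can be reused with different masks for different farms.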
Wow, never thought about it that way but it makes sense (still wrapping my head around it).
Can you perhaps point me in a direction where I can read up on similar approaches? I’m struggling to imagine, for example, how I would go from these squares back to predicting the class of the original farms…