Image segmentation: leaving some pixels unlabeled?

Short version: Is it possible to leave some pixels unlabeled when training a segmentation model? If so, how would I specify “unlabeled” when creating the mask?

Long version: I am hoping to train a segmentation model to classify vegetation cover types in aerial imagery. I will need to create my own labeled masks to train the model, and I am thinking about ways to do this quickly and efficiently. One option would be to only label some of the pixels in the image, such as the ones that are easiest to draw polygons around, or even just a few “representative examples” as opposed to every single pixel. Here is an example of how I would like to be able to label an image:

Is there a way to assign pixels a “no data” value so they are not considered in the fitting of the model? I don’t think it would work to create a new class called “unclassified”, as this would cause the model to get confused between the labeled features and the unlabeled features that are actually of the same class as some of the labeled features.

Thanks in advance for any help!

When you do segmentation you will usually use a pixel-wise loss function such as MSE.
If you generate masks with this “Unlabeled” class you can adjust your loss functions as follows:
Assuming you have your predictions p (prediction) and segmentation mask t (target) you can find
t != UNLABELED (whatever value this unlabeled class has) which is a tensor containing booleans.
You can then either index your p and t tensors using this resulting tensor as a mask for both tensors or just do

hasLabel = (t != UNLABELED).float()
loss = mse(p * hasLabel, t * hasLabel)

which will basically replace every unlabeled pixel in your prediction and target with 0, so those pixels make no contribution to the loss. Using indexes is probably faster, but you will have to look for the appropriate function in the PyTorch documentation.

Note that you do NOT want to remove stuff from your input because of that, because your convolutions will need to take the unlabeled parts into consideration.
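
For illustration, here is a minimal sketch of the indexing variant mentioned above, next to the multiply-by-mask variant. It assumes UNLABELED is 0 and that p and t have the same shape; the function name masked_mse is made up for this example. Note that the two variants differ in the denominator: indexing averages over labeled pixels only, while multiplying still averages over all pixels.

```python
import torch
import torch.nn.functional as F

UNLABELED = 0  # assumption: the mask value that marks unlabeled pixels


def masked_mse(p, t, unlabeled=UNLABELED):
    """MSE averaged over labeled pixels only (hypothetical helper)."""
    has_label = t != unlabeled          # boolean tensor, same shape as t
    if not has_label.any():
        return p.sum() * 0.0            # no labeled pixels: zero loss, graph kept
    return F.mse_loss(p[has_label], t[has_label].float())


p = torch.tensor([[0.5, 2.0], [3.0, 9.0]])   # predictions
t = torch.tensor([[1, 2], [3, 0]])           # target; bottom-right pixel is unlabeled

# Indexing variant: mean over the 3 labeled pixels
loss = masked_mse(p, t)

# Multiply variant from the post: unlabeled pixels become 0 in both tensors,
# but the mean is still taken over all 4 pixels
m = (t != UNLABELED).float()
loss_all_pixels = F.mse_loss(p * m, t * m)
```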


Thanks very much! This makes a lot of sense. I’ll try this out and will report back here.

I’m working on implementing this solution (and learning fastai, PyTorch, and Python concurrently :).

I see that the default loss function for segmentation is (flattened) CrossEntropyLoss(). This function relies on torch.nn.functional.cross_entropy() to do the actual math. It takes an input (predictions) and target (segmentation mask). It returns a single value. There are several things I don’t understand:

  1. Given the loss function outputs a single value, how can I apply the suggestion from @xeTaiz (above) to assign zero loss to the pixels in the “unlabeled” class?

  2. The “input” (predictions) tensor passed to cross_entropy() is 345600 rows X 32 columns. Thus there is one row for each pixel in my image batch, and one column for each class in the segmentation mask. The values in this tensor are mostly between -2 and 2. What do the values represent? The fitted log-likelihood of each pixel belonging to each class? The “target” (classification mask) tensor is a 1-dimensional tensor of 345600 values (the mask classes for each pixel).

Thanks again!

On 1.
You would replace the flattened cross entropy with your own custom loss function. Inside it you still call the flattened cross entropy loss, but you modify your tensors before passing them in, as explained above. You have to step in at the point before the tensors are flattened and apply your mask there. (Or you could flatten your mask as well; it doesn’t really matter as long as the same values get multiplied.)
The whole reason for the loss function is to produce a single number saying how good or bad your prediction is. By that point, all pixels that contribute to your training are already incorporated. It is right before that step that you want to apply the above.
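
A rough sketch of such a custom loss in plain PyTorch follows. It is not fastai's own implementation, and the name masked_flat_cross_entropy and the choice of 0 as the unlabeled id are assumptions. It uses boolean indexing rather than multiplication, because zeroing the logits of a pixel does not zero its cross-entropy term. The last line compares against PyTorch's built-in ignore_index option, which should give the same value.

```python
import torch
import torch.nn.functional as F

UNLABELED = 0  # assumption: the mask value that marks unlabeled pixels


def masked_flat_cross_entropy(logits, target, unlabeled=UNLABELED):
    """Cross entropy over labeled pixels only (hypothetical helper).

    logits: (N, C, H, W) raw model outputs
    target: (N, H, W) integer class ids
    """
    n_classes = logits.shape[1]
    # Move the class axis last, then flatten to one row per pixel
    flat_logits = logits.permute(0, 2, 3, 1).reshape(-1, n_classes)
    flat_target = target.reshape(-1)
    keep = flat_target != unlabeled      # drop unlabeled pixels before the loss
    return F.cross_entropy(flat_logits[keep], flat_target[keep])


torch.manual_seed(0)
logits = torch.randn(2, 4, 3, 3)
target = torch.randint(0, 4, (2, 3, 3))
target[0, 0, 0] = 1                      # make sure at least one pixel is labeled

loss = masked_flat_cross_entropy(logits, target)
reference = F.cross_entropy(logits, target, ignore_index=UNLABELED)
```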

On 2.
There is not necessarily a good way to interpret those exact values. It’s whatever your last layer produced. It’s some activations. If the next step is a cross entropy, they make sense in the way that low values will steer towards low class probability and high values steer towards high class probability. The specific range -2 to 2 does not have a specific meaning. It’s more like the relation to the other values, as the highest of those values for a pixel will kind of determine which class is predicted for that pixel. The exact range depends on your exact weights, but whether it’s -20 to 100 or -1 to 1 does not change the prediction.
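
A small illustration of this point, with made-up numbers: softmax turns the raw activations into class probabilities, and the predicted class is the column with the highest value. Shifting and positively rescaling the activations changes the probabilities but not which class wins.

```python
import torch

# Two pixels, three classes; values are invented for the example
logits = torch.tensor([[-2.0, 0.5, 1.5],
                       [1.0, -1.0, 0.0]])

probs = torch.softmax(logits, dim=1)     # each row sums to 1
pred = logits.argmax(dim=1)              # predicted class id per pixel

# A different range (roughly 10 to 45 here) gives the same predicted classes
rescaled = 10 * logits + 30
pred_rescaled = rescaled.argmax(dim=1)
probs_rescaled = torch.softmax(rescaled, dim=1)  # sharper, but same winner
```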

Note that despite this being the way to achieve what you were looking for, throwing away pixels at this stage might not necessarily be a good idea.

Thanks again for the super helpful answers. Now I understand the logic of modifying the tensors prior to passing them to the loss function. As a side note, I discovered that the CrossEntropyLoss function has an optional parameter weight which seems to have the same effect (I can set the weight to 0 for the unlabeled class and 1 for all other classes). I tested this with the CamVid course example and got some encouraging results! I only used the building, car, tree, pedestrian, and cyclist classes (weight=1) and left all other classes unlabeled (weight=0).
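
A minimal sketch of the weight approach in plain PyTorch, assuming four classes with class 0 as the unlabeled one (both numbers are made up here). With the default mean reduction, the loss is normalized by the summed weights of the target pixels, so a weight of 0 should match what ignore_index gives.

```python
import torch
import torch.nn as nn

n_classes = 4    # assumption for the example
UNLABELED = 0    # assumption: class id used for unlabeled pixels

weight = torch.ones(n_classes)
weight[UNLABELED] = 0.0                  # unlabeled pixels contribute nothing
loss_fn = nn.CrossEntropyLoss(weight=weight)

torch.manual_seed(0)
logits = torch.randn(2, n_classes, 3, 3)            # (N, C, H, W)
target = torch.randint(0, n_classes, (2, 3, 3))     # (N, H, W)
target[0, 0, 0] = 1                      # make sure at least one pixel is labeled

loss = loss_fn(logits, target)
reference = nn.CrossEntropyLoss(ignore_index=UNLABELED)(logits, target)
```

On a GPU the weight tensor has to live on the same device as the predictions.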


I agree that throwing away pixels is not ideal (this might be why the model gets confused between cyclists and pedestrians?), but I think this is realistically the only way I can approach this problem right now. Digging through the forums, I gather that an alternative might be to do an unsupervised segmentation first, then give the resulting classes names and possibly train more (against my incomplete mask). That may be a longer-term goal. First I will see how well this incomplete-masking approach works with my vegetation data.

Great to see other folks on this forum tackling ecological datasets. I’m working at a smaller scale than you are, but also trying to classify vegetation, or rather get percent cover estimates from imagery. Image segmentation seems like a great approach, but the cost to label training data pixels seems very high. I wonder if we might be able to exploit the heat map feature in the plot_top_losses function. The pipeline I am envisioning is to take satellite/fixed-wing imagery:

  1. Tile the imagery into many small sub-units
  2. Label each tile by taxonomic group, or, in your case, veg class, as in the dog breeds example (i.e., not pixel painting)
  3. Extract heat map class probability values from the plot_top_losses function
  4. Establish a probability cutoff that matches the bounds of veg type
  5. Create a segmentation mask from those pixels above the cutoff
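
Steps 4 and 5 could be sketched roughly as below. This assumes the heat map has already been extracted as a 2-D array scaled to 0-1 and resized to the tile size; how to get it out of plot_top_losses is not shown. The helper name heatmap_to_mask, the cutoff of 0.5, and the class ids are all made up for illustration.

```python
import numpy as np


def heatmap_to_mask(heatmap, class_id, cutoff, unlabeled=0):
    """Label pixels at or above the cutoff; leave the rest unlabeled (hypothetical helper)."""
    mask = np.full(heatmap.shape, unlabeled, dtype=np.uint8)
    mask[heatmap >= cutoff] = class_id
    return mask


# Toy 2x2 heat map for a tile that was labeled as class 2
heatmap = np.array([[0.1, 0.8],
                    [0.6, 0.3]])
mask = heatmap_to_mask(heatmap, class_id=2, cutoff=0.5)
```

Pixels below the cutoff stay at the unlabeled value, so the resulting mask could be combined with a loss that ignores unlabeled pixels.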

That’s a great idea @mossCoder! I’m interested in trying that and comparing the results against a coarse/incomplete pixel-painting approach. I notice that the heatmap can highlight only some parts of the focal object (e.g., a dog’s head) and essentially ignore others (e.g., the legs and torso), which would not be desirable for segmentation training. But maybe this is less of an issue with veg imagery. Another challenge (for my application at least) will be to get image tiles that don’t contain all (or most) candidate veg classes – the tiles might have to be very small, which could decrease the advantage of this approach over pixel painting. Super excited to see how it works out either way. I’ll be out for a week but am excited to get back to this when I return.


I’m happy to report that the coarse pixel-painting approach seems to be working quite well with my vegetation imagery. Here are some example results (output of show_results()). The dark blue is my “Unlabeled” class, to which I assigned a weight of 0 in the loss function. Green is tree, white is bare ground, yellow is shrub, and purple is seedling (this is the most confused class, also the rarest).

Overall pixel-wise classification accuracy (excluding the Unlabeled class) was 93% after one cycle of 20 epochs. For this initial trial, I used only 10 training images and 10 validation images.
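
For reference, one way to compute pixel-wise accuracy while excluding the unlabeled class is sketched below. This is an illustration, not necessarily the exact metric used for the 93% figure; the function name and the unlabeled id of 0 are assumptions.

```python
import torch
import torch.nn.functional as F


def acc_labeled_only(logits, target, unlabeled=0):
    """Fraction of labeled pixels whose predicted class matches the mask (hypothetical helper)."""
    pred = logits.argmax(dim=1)          # (N, H, W) predicted class ids
    keep = target != unlabeled           # ignore unlabeled pixels
    return (pred[keep] == target[keep]).float().mean()


# Toy example: one 2x2 image, three classes
target = torch.tensor([[[1, 2],
                        [0, 1]]])
pred_ids = torch.tensor([[[1, 2],
                          [2, 2]]])
# Build logits whose argmax equals pred_ids
logits = F.one_hot(pred_ids, 3).permute(0, 3, 1, 2).float()

acc = acc_labeled_only(logits, target)   # 2 of the 3 labeled pixels are correct
```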


Hi @djyoung, thank you for these posts. What tool did you use for annotation?

Hi–this is geospatial data (the vegetation imagery started out as a GeoTIFF raster), so I created the annotations as polygon shapefiles in QGIS. I then used the raster, sf, SpaDES and png packages in R (which I’m more comfortable with than Python) to:

  1. Recode my annotations (vegetation classes) as numbers (1 to 5)
  2. Rasterize the annotation shapefile (polygons) to the same grid as the imagery raster (assigning the value 0 to any pixels that did not fall under any annotation polygons)
  3. Split both rasters (annotations and imagery) into multiple small images (512 x 576 pixels each)
  4. Write the images as .png
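
The workflow above was done in R. For readers working in Python, step 3 could look roughly like this NumPy sketch. It is an illustration under assumptions, not a port of the R code: the function name is made up, 512 is taken as the tile height and 576 as the width, the rasters are assumed to be already loaded as aligned arrays, and partial tiles at the edges are simply dropped.

```python
import numpy as np


def split_into_tiles(arr, tile_h, tile_w):
    """Yield (row, col, tile) for every full tile; partial edge tiles are dropped."""
    h, w = arr.shape[:2]
    for r in range(0, h - tile_h + 1, tile_h):
        for c in range(0, w - tile_w + 1, tile_w):
            yield r, c, arr[r:r + tile_h, c:c + tile_w]


# Stand-ins for the imagery raster and the rasterized annotations (same grid)
image = np.zeros((1024, 1152, 3), dtype=np.uint8)
labels = np.zeros((1024, 1152), dtype=np.uint8)   # 0 = no annotation polygon

image_tiles = list(split_into_tiles(image, 512, 576))
label_tiles = list(split_into_tiles(labels, 512, 576))
```

Tiling both arrays with the same function keeps each image tile aligned with its annotation tile.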

@djyoung, how did you adjust your weights on the loss function? I’m doing a similar segmentation categorization on aerials and I’m losing my most critical class (floating woody debris in my case) because it’s a tiny portion of images.

When I assign loss_func=CrossEntropyFlat() to my unet_learner I start getting errors about input not matching target size; Expected input batch_size (9216) to match target batch_size (393216). That’s 256x6x6 and 256x256x6, and my tiles are 256x256 with 6 classes, but if this is the default loss for the segmentation learner then just assigning the default shouldn’t break anything…

@mr_fnord, did you include axis=1 in your loss function? I was tipped off to that by this post. I was getting the same error as you before I did that.
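
To show where those two numbers come from, here is a plain-PyTorch sketch (not fastai's code) using a batch of 6 tiles of 256x256 with 6 classes, as in the error message. Flattening on the last axis treats the image width as the class dimension and gives 9216 rows; moving the class axis (axis 1) to the end first gives one row per pixel.

```python
import torch
import torch.nn.functional as F

bs, n_classes, h, w = 6, 6, 256, 256
logits = torch.randn(bs, n_classes, h, w)                 # model output (N, C, H, W)
target = torch.randint(0, n_classes, (bs, 1, h, w))       # mask (N, 1, H, W)

# Flattening on the last axis: 6 * 6 * 256 = 9216 rows of length 256
wrong = logits.reshape(-1, logits.shape[-1])

# Class axis moved last, then flattened: 6 * 256 * 256 = 393216 rows of length 6
right = logits.permute(0, 2, 3, 1).reshape(-1, n_classes)

flat_target = target.reshape(-1)                          # 393216 class ids
loss = F.cross_entropy(right, flat_target)
```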


That was it. One more error about CUDA vs CPU backend and I came up with this, and it seems to be working. Now I’ve just got to refine my weights to make the segmenter find my special classes.

#weights = torch.ones(data.c).float().cuda()
learn = unet_learner(data, models.resnet34, metrics=metrics, 

@djyoung, once the prediction is completed and you have the probability mask with various color segments, how do you mark the inference output with classes? E.g., suppose you want to write or print your class names over the various identified segments; how would you do that?

pred_class,pred_idx,outputs = learn.predict

How do I map the class names to the probability map?

I have not yet reached the stage of outputting the segmented image for use outside Python. However, the IDs of the class predictions are the same as the IDs you use in the training and validation masks. Sorry I can’t be more helpful at this stage.
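
Since the predicted IDs match the mask IDs, looking up names is an index into the list of class names. A small sketch with NumPy, where the codes list and its order are hypothetical and the predicted mask is a toy array:

```python
import numpy as np

# Hypothetical class list; index = class id used in the training masks
codes = ["Unlabeled", "tree", "bare ground", "shrub", "seedling"]

# Toy predicted mask of class ids (e.g. the argmax over the class axis)
pred_mask = np.array([[1, 1, 3],
                      [0, 2, 3]])

# Which classes appear, and how many pixels each covers
ids, counts = np.unique(pred_mask, return_counts=True)
summary = {codes[i]: int(n) for i, n in zip(ids, counts)}

# Per-pixel class names, same shape as the mask
name_mask = np.array(codes)[pred_mask]
```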

Hi, I am doing image segmentation to be able to detect floors in a perspective image of a building. I have 512x512 .jpg images and their corresponding masks as .png. I have 17 classes, one for each floor number, including the background. I generated my masks from the Terminal using Labelme and got .json files with their corresponding .png files.

After reducing batch_size, setting CrossEntropyFlat(axis=1) and many other steps, I still get the error

Expected input batch_size (9216) to match target batch_size (393216)

It is a problem with the np.array when reading tensor weights because I am only getting this:

src_size = np.array(mask.shape[1:])


(array([512, 512]), tensor([[[0, 0, 0,  ..., 0, 0, 0],
      [0, 0, 0,  ..., 0, 0, 0],
      [0, 0, 0,  ..., 0, 0, 0],
      [0, 0, 0,  ..., 0, 0, 0],
      [0, 0, 0,  ..., 0, 0, 0],
      [0, 0, 0,  ..., 0, 0, 0]]]))

when in fact I should be getting an array of numbers from 1 to 17.
Has anyone had to deal with going from Labelme to fastai before? I would highly appreciate the help.

I am working on the classification of WV2 imagery, and I split my image into 1000 images for segmentation. After prediction, I have to merge all the tiles and see the final results.
Is there any way to match the predicted images to their file name in the test set? I am using V2 for image segmentation, and I used this for prediction:

data = DataBlock(
    blocks=(ImageBlock, MaskBlock(codes)),
    item_tfms=Resize(64),
    get_items=get_image_files,
    splitter=FileSplitter('/content/drive/MyDrive/classification/CNN_segmentation/big_Image/valid_PNG_big.txt'),
    get_y=get_msk,
    batch_tfms=[*aug_transforms(size=half), Normalize.from_stats(*imagenet_stats)],
)
dls = data.dataloaders(path_im, bs=64)
dl = learn.dls.test_dl(fnames[:], shuffle=False)
preds= learn.get_preds(dl=dl)

I am trying to name the predicted images by their file name in test set.
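
One possible approach, sketched under assumptions: with shuffle=False the predictions should come back in the same order as the file list passed to test_dl, so they can be paired by position. The helper name_predictions, the "_pred.png" suffix, and the toy file names are made up, and the array below stands in for the class-probability tensor returned by get_preds.

```python
from pathlib import Path

import numpy as np


def name_predictions(fnames, pred_masks, suffix="_pred.png"):
    """Pair each predicted mask with an output name derived from its input file (hypothetical helper)."""
    if len(fnames) != len(pred_masks):
        raise ValueError("need exactly one prediction per file name")
    return {Path(f).stem + suffix: m for f, m in zip(fnames, pred_masks)}


# Toy stand-ins: two tiles, three classes, 4x4 pixels
fnames = ["tiles/tile_000.png", "tiles/tile_001.png"]
probs = np.zeros((2, 3, 4, 4))
probs[0, 1] = 1.0                        # tile 0: class 1 everywhere
probs[1, 2] = 1.0                        # tile 1: class 2 everywhere

pred_masks = probs.argmax(axis=1)        # (N, H, W) class ids
named = name_predictions(fnames, pred_masks)
```

Each entry can then be written to disk under its key, and the original tile names give the position of each tile when merging.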

Hi everyone, it’s been a few years, and I wonder if anyone in the aerial imagery realm has any updates to share?

On my end, I’ve developed a set of tools to annotate spatial data, which takes the form of a preprocessing pipeline and R Shiny App called paint2train.

I’ve been using these tools to label problem weed species in fixed-wing imagery of grassland ecosystems.

I’m now exploring what data augmentation methods are most important for model training using Fastai and Optuna and have some interesting findings:

In my case, the probability of applying affine transformations proved quite important: low or zero values gave the best Dice coefficient. My system can have complex terrain, and shadows may play an important role, which may be part of the explanation. Also interesting is the importance of the minimum and maximum zoom.