Image segmentation: leaving some pixels unlabeled?


Short version: Is it possible to leave some pixels unlabeled when training a segmentation model? If so, how would I specify “unlabeled” when creating the mask?

Long version: I am hoping to train a segmentation model to classify vegetation cover types in aerial imagery. I will need to create my own labeled masks to train the model, and I am thinking about ways to do this quickly and efficiently. One option would be to only label some of the pixels in the image, such as the ones that are easiest to draw polygons around, or even just a few “representative examples” as opposed to every single pixel. Here is an example of how I would like to be able to label an image:

Is there a way to assign pixels a “no data” value so they are not considered in the fitting of the model? I don’t think it would work to create a new class called “unclassified”, as this would cause the model to get confused between the labeled features and the unlabeled features that are actually of the same class as some of the labeled features.

Thanks in advance for any help!

(Dominik Engel) #2

When you do segmentation, you will usually use a pixel-wise loss function such as MSE.
If you generate masks with this “unlabeled” class, you can adjust your loss function as follows:
assuming you have your predictions p and segmentation mask t (the target), you can compute
t != UNLABELED (whatever value this unlabeled class has), which is a boolean tensor.
You can then either index both p and t using this boolean tensor as a mask, or just do

hasLabel = (t != UNLABELED).float()
loss = mse(p * hasLabel, t * hasLabel)

which effectively replaces every unlabeled pixel in both the prediction and the label with 0, so those pixels contribute no loss. Indexing with the boolean mask is probably faster, but you will have to look up the appropriate functions in the PyTorch documentation.
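To make the two options concrete, here is a minimal runnable sketch, assuming an MSE-style loss and a sentinel value for the unlabeled class (the value 255 and the toy shapes are just illustrations):

```python
import torch
import torch.nn.functional as F

UNLABELED = 255  # assumed sentinel value marking unlabeled pixels

# Toy prediction p and target t of shape (batch, H, W); values are illustrative.
p = torch.rand(2, 4, 4)
t = torch.randint(0, 3, (2, 4, 4)).float()
t[0, 0, 0] = UNLABELED  # mark one pixel as unlabeled

# Option A: zero out unlabeled pixels in both tensors before computing the loss.
has_label = (t != UNLABELED).float()
loss_a = F.mse_loss(p * has_label, t * has_label)

# Option B: index with the boolean mask so unlabeled pixels are dropped entirely.
mask = t != UNLABELED
loss_b = F.mse_loss(p[mask], t[mask])
```

One subtlety: with option A the zeroed pixels still count in the mean (they dilute the average), while option B averages only over labeled pixels, so the two losses differ slightly in scale.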

Note that you do NOT want to remove the unlabeled parts from your input image itself, because your convolutions still need to take those regions into account.


Thanks very much! This makes a lot of sense. I’ll try this out and will report back here.


I’m working on implementing this solution (and learning fastai, PyTorch, and Python concurrently :).

I see that the default loss function for segmentation is (flattened) CrossEntropyLoss(). This function relies on torch.nn.functional.cross_entropy() to do the actual math. It takes an input (predictions) and target (segmentation mask). It returns a single value. There are several things I don’t understand:

  1. Given the loss function outputs a single value, how can I apply the suggestion from @xeTaiz (above) to assign zero loss to the pixels in the “unlabeled” class?

  2. The “input” (predictions) tensor passed to cross_entropy() is 345600 rows × 32 columns: one row for each pixel in my image batch, and one column for each class in the segmentation mask. The values in this tensor are mostly between -2 and 2. What do the values represent? The fitted log-likelihood of each pixel belonging to each class? The “target” (classification mask) tensor is a 1-dimensional tensor of 345600 values (the mask class for each pixel).

Thanks again!

(Dominik Engel) #5

On 1.
You would replace the flattened cross entropy with a custom loss function in which you call the flattened cross entropy loss, but modify your tensors before passing them in, as explained above. You basically have to step in right before the tensors are flattened and apply your mask there. (Or you could flatten your mask as well; it doesn’t really matter, as long as the same values get multiplied.)
The whole purpose of the loss function is to produce a single number saying how good or bad your prediction is; at that point, all pixels that contribute to your training have already been incorporated. It is right before that step that you want to apply the masking described above.
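As a sketch (not fastai’s actual implementation), a custom loss that applies the mask right before flattening might look like this; the UNLABELED value and the (batch, classes, H, W) logit layout are assumptions:

```python
import torch
import torch.nn.functional as F

UNLABELED = 255  # assumed sentinel class id for unlabeled pixels

def masked_cross_entropy(logits, target):
    """Cross entropy that skips pixels whose target is UNLABELED.

    logits: (batch, n_classes, H, W) raw network outputs
    target: (batch, H, W) integer class ids
    """
    n_classes = logits.shape[1]
    # Flatten to (n_pixels, n_classes) and (n_pixels,).
    logits_flat = logits.permute(0, 2, 3, 1).reshape(-1, n_classes)
    target_flat = target.reshape(-1)
    keep = target_flat != UNLABELED  # boolean mask of labeled pixels
    return F.cross_entropy(logits_flat[keep], target_flat[keep])

# Toy example: 2 images, 3 classes, 4x4 pixels, with the top row left unlabeled.
logits = torch.randn(2, 3, 4, 4)
target = torch.randint(0, 3, (2, 4, 4))
target[:, 0, :] = UNLABELED
loss = masked_cross_entropy(logits, target)
```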

On 2.
There is not necessarily a good way to interpret those exact values; they are simply the activations your last layer produced. If the next step is a cross entropy, they make sense in that low values steer towards low class probability and high values towards high class probability. The specific range -2 to 2 has no particular meaning. What matters is the relation between the values: for each pixel, the highest of the per-class values determines which class is predicted. The exact range depends on your weights, but whether it is -20 to 100 or -1 to 1 does not change the prediction.
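A tiny illustration of this point: shifting all of a pixel’s activations by a constant leaves the softmax probabilities unchanged, and scaling them by a positive factor changes the probabilities’ sharpness but not which class wins.

```python
import torch

# Raw activations ("logits") for one pixel over 3 classes; values are arbitrary.
logits = torch.tensor([-2.0, 0.5, 1.8])

probs = torch.softmax(logits, dim=0)  # class probabilities, sum to 1
pred = logits.argmax()                # predicted class: index of the largest value

# Adding a constant to every activation does not change softmax at all...
shifted_probs = torch.softmax(logits + 100.0, dim=0)

# ...and scaling by a positive factor preserves the predicted class.
scaled_pred = (logits * 3.0).argmax()
```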

Note that while this is the way to achieve what you were looking for, throwing away pixels at this stage might not necessarily be a good idea.


Thanks again for the super helpful answers. Now I understand the logic of modifying the tensors prior to passing them to the loss function. As a side note, I discovered that CrossEntropyLoss has an optional weight parameter that seems to have the same effect: I can set the weight to 0 for the unlabeled class and 1 for all other classes. I tested this with the CamVid course example and got some encouraging results! I only used the building, car, tree, pedestrian, and cyclist classes (weight = 1) and left all other classes unlabeled (weight = 0).
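For anyone trying the same thing, here is a minimal sketch of this weight-based approach (the class ids and counts are made up for illustration; note the parameter is weight, a per-class tensor):

```python
import torch
import torch.nn as nn

N_CLASSES = 6   # e.g. unlabeled + building, car, tree, pedestrian, cyclist
UNLABELED = 0   # assumed: class id 0 is the "unlabeled" class

# Weight 0 for the unlabeled class, 1 for everything else.
weight = torch.ones(N_CLASSES)
weight[UNLABELED] = 0.0
loss_fn = nn.CrossEntropyLoss(weight=weight)

# Toy batch: 2 images, 4x4 pixels, with the top row left unlabeled.
logits = torch.randn(2, N_CLASSES, 4, 4)
target = torch.randint(1, N_CLASSES, (2, 4, 4))
target[:, 0, :] = UNLABELED

loss = loss_fn(logits, target)  # unlabeled pixels contribute zero loss
```

With reduction='mean' (the default), the loss is normalized by the sum of the per-pixel weights, so the zero-weight pixels also don’t dilute the average.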


I agree that throwing away pixels is not ideal (this might be why the model gets confused between cyclists and pedestrians?), but I think this is realistically the only way I can approach this problem right now. Digging through the forums, I gather that an alternative might be to do an unsupervised segmentation first, then give the resulting classes names and possibly train more (against my incomplete mask). That may be a longer-term goal. First I will see how well this incomplete-masking approach works with my vegetation data.