Unet - Segment Label Size

  1. I see a challenge in building the labels in the required output format: `ImageSegment` will just stack the labels. This is how I was doing it, but I was still getting the same single-channel label. What change do I need to make here…

     class MultiClassSegList(SegmentationLabelList):
         def open(self, id_rles):
             image_id, rles = id_rles[0], id_rles[1:]
             shape = open_image(self.path/image_id).shape[-2:]       
             final_mask = torch.zeros((1, *shape))
             for k, rle in enumerate(rles):
                 if isinstance(rle, str):
                     mask = open_mask_rle(rle, shape).px.permute(0, 2, 1)
                     final_mask += (k + 1) * mask
             return ImageSegment(final_mask)
    
     def load_data(path, df):
         train_list = (SegmentationItemList
                       .from_df(df, path=path/"train_images")
                       .split_by_rand_pct(valid_pct=0.1, seed=42)
                       .label_from_df(cols=list(range(5)), label_cls=MultiClassSegList, classes=[0, 1, 2, 3, 4])
                       .add_test(testfolder.ls(), label=None, tfm_y=False)
                       .transform(get_transforms(flip_vert=True, p_affine=0.8, max_rotate=360),
                                  size=128, resize_method=ResizeMethod.SQUISH, tfm_y=True))
         return train_list
    

df is as below (dataframe screenshot omitted)


What purpose is `cols` serving here?
  2. Secondly, below is the version of dice I was using, but I see the following issues:
     a) in the case of a multi-label mask for a given image, will it work?
     b) `input==i & targs==i` may never be true if i > 1; is my understanding correct?

     def dice_x(input:Tensor, targs:Tensor, iou:bool=False, eps:float=1e-8)->Rank0Tensor:
         n, c = targs.shape[0], input.shape[1]
         input = input.argmax(dim=1).view(n, -1)
         targs = targs.view(n, -1)
         intersect, union = [], []
         for i in range(1, c):
             intersect.append(((input == i) & (targs == i)).sum(-1).float())
             union.append(((input == i).sum(-1) + (targs == i).sum(-1)).float())
         intersect = torch.stack(intersect)
         union = torch.stack(union)
         if not iou: return ((2.0*intersect + eps) / (union + eps)).mean()
         else: return ((intersect + eps) / (union - intersect + eps)).mean()

Your `open` is creating a single-channel output through `final_mask += (k + 1) * mask`. You'd want to replace the for loop with something like:

# one 0/1 channel per class; a missing RLE (NaN in the df) gives an all-zero channel
masks = [open_mask_rle(rle, shape).px.permute(0, 2, 1)
         if isinstance(rle, str)
         else torch.zeros(1, *shape, dtype=torch.uint8)
         for rle in rles]
final_mask = torch.cat(masks)  # CxHxW multi-label target, one channel per class

That dice is for single-label prediction: the argmax on the channel dimension will select the single highest channel for each pixel and use that as the prediction. So it will produce Bx1xHxW predictions, not BxCxHxW. You need to use something like the sigmoid and threshold used in the one I gave for multi-label, e.g. `pred = (torch.sigmoid(input) > threshold).float()`. The sigmoid converts the input to the range (0, 1), then any channel above the threshold is predicted as 1 and the others as 0. This can then be compared to the multi-channel targets.
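
For reference, a minimal sketch of what such a multi-label dice could look like (the function name `dice_multilabel` and the default threshold here are my own choices, not fastai API):

    import torch
    from torch import Tensor

    def dice_multilabel(input: Tensor, targs: Tensor,
                        threshold: float = 0.5, eps: float = 1e-8) -> Tensor:
        "Mean dice over classes for BxCxHxW logits against BxCxHxW 0/1 targets."
        n, c = targs.shape[0], input.shape[1]
        # sigmoid + threshold: each channel is an independent binary prediction
        pred = (torch.sigmoid(input) > threshold).float().view(n, c, -1)
        targs = targs.view(n, c, -1).float()
        intersect = (pred * targs).sum(-1)
        union = pred.sum(-1) + targs.sum(-1)
        return ((2.0 * intersect + eps) / (union + eps)).mean()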


Thanks… yes, I just noticed that while discussing with you :slight_smile:
You must be seeing the df now…

  1. How does `cols` work here? With `range(5)` in `label_from_df`, is it taking all the column names from the corresponding indices?
  2. When should I pass `div=True` and when `False`?

The cols parameter selects the columns of the dataframe to pass to the label function. So that is passing a list of [0,1,2,3,4], which selects all five columns in the dataframe. So the id_rles in open will get a list of values from all the columns for a single item (row in df).
The div option divides values by 255; this is used if your masks come from images with values of 0 and 255 rather than 0 and 1 (or, more commonly for images, to convert 8-bit values to float). As you're using RLE-encoded values, open_mask_rle should be producing 0s and 1s.
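
To illustrate, here is a sketch of what open receives per row, assuming a df whose first column is the image id and whose other four columns hold per-class RLE strings (the column names and values here are made up):

    import pandas as pd
    import numpy as np

    # hypothetical layout: image id in column 0, one RLE column per class
    df = pd.DataFrame({
        "ImageId": ["img_001.jpg"],
        "rle_1": ["29102 12 29346 24"],  # class 1 present
        "rle_2": [np.nan],               # class 2 absent: NaN, not a str
        "rle_3": [np.nan],
        "rle_4": ["12 5"],
    })

    # cols=list(range(5)) passes all five columns, so for this row open()
    # receives ['img_001.jpg', '29102 12 29346 24', nan, nan, '12 5'],
    # which is why the isinstance(rle, str) check skips missing masks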

Thanks Tom… this is the first time I'm using fastai v2 for segmentation, hence some basic questions; earlier I used an old version, during the Airbus competition.
Currently I'm working on the steel defect competition, having joined late after finishing APTOS with rank 90 official and 50 unofficial :slight_smile:
Are you also working on the steel competition? If yes, would you like to team up? To date I've worked solo on all past competitions.

For you and anyone else working on the steel comp, I would note it is not actually multi-label. If you look at the masks there is no overlap between classes, so the basic fastai single-label/single-channel setup should work fine. Though you can of course also apply the multi-label approach to it.

What do you mean by overlap…
There are images in it which have more than one mask defined.
Predictions are to be made in this format:
Image1 mask1, Image1 mask2

Yes, but no single pixel belongs to multiple classes. So you can just use a single channel for the model output/target with pixel values of [0,1,2,3,4] (0 being background, i.e. no class). Then split the resulting single-channel output into multiple masks with something like `pred_masks = [(pred == i) for i in range(1,5)]` where pred is model output of shape Bx1xHxW, and the result will be 4 separate 1-channel masks with values [0,1].

okay…

  1. When, in the output label, I assign 1,2,3,4 to the 1/0 pixels in the masks…
  2. For pred==i to be true, should we then do the argmax, as I did previously, before making this comparison?

Yes, sorry, the output of your network will actually be BxCxHxW; you then do an argmax to get Bx1xHxW. This is the approach the fastai segmentation stuff assumes. Be aware, if using the interpretation stuff in fastai, that it applies the argmax for you in some places, so look out for this.
The code you posted was at least generally correct for that approach, and is a workable way to do it. Just be aware that some kernels use the multi-label approach and you can't mix the two.
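
Putting the two posts together, a minimal sketch of that single-label flow (the tensor shapes and variable names are my own, for illustration):

    import torch

    # logits: BxCxHxW network output, here C = 5 (background + 4 defect classes)
    logits = torch.randn(2, 5, 128, 128)

    # argmax over the channel dim -> Bx1xHxW with pixel values in [0, 4]
    pred = logits.argmax(dim=1, keepdim=True)

    # split into 4 separate binary masks, one per defect class (0 = background)
    pred_masks = [(pred == i).float() for i in range(1, 5)]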

Yes…
In a single-label approach:

  1. A single HxW mask will have pixels from all the classes?

  2. If that is true, do I then have to multiply the class codes into the binary pixels to differentiate the codes, since a single image can belong to multiple classes with non-overlapping pixels?
    So earlier this was the case,

     class MultiClassSegList(SegmentationLabelList):
         def open(self, id_rles):
             image_id, rles = id_rles[0], id_rles[1:]
             shape = open_image(self.path/image_id).shape[-2:]       
             final_mask = torch.zeros((1, *shape))
             for k, rle in enumerate(rles):
                 if isinstance(rle, str):
                     mask = open_mask_rle(rle, shape).px.permute(0, 2, 1)
                     final_mask += (k + 1) * mask
             return ImageSegment(final_mask)
    

If the above is the right way for the single-label approach, then suppose there are two 2x2 masks, the first being [[1, 0], [1, 0]] and the second [[0, 1], [0, 1]]; the above would then generate the single mask [[1, 2], [1, 2]]?

Yes, that's correct. You then need to specify a specific background class if your classes aren't exhaustive, as in this case where there isn't a specific class for every pixel. So you'd have `classes=['BG','1','2','3','4']` (noting these are labels; the choice of numbers here doesn't affect anything).
Fastai will then generate a model with 5-channel output and use argmax to create a single-channel prediction with values from [0,1,2,3,4], in line with the targets produced like that.
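
A quick sketch verifying the 2x2 example above:

    import torch

    mask1 = torch.tensor([[1, 0], [1, 0]])
    mask2 = torch.tensor([[0, 1], [0, 1]])

    # final_mask += (k + 1) * mask with k = 0, 1
    final_mask = 1 * mask1 + 2 * mask2
    print(final_mask)  # tensor([[1, 2], [1, 2]])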

Thanks Tom… discussing this with you helped me join the pieces of this jigsaw puzzle :slight_smile:

Another interesting thing in segmentation problems:

  1. Could random rotations at many angles move the object of interest out of the view frame, provided the resize method is squish?
  2. What is the difference between rotations and dihedral/horizontal flips… I presume both are only subsets of rotations, say if rotation is 360?

Thanks! Your code helped me build a working multi-label image segmentation on top of fastai. The code is in this Kaggle kernel if you want to have a look at it. The MultiLabelSegmentationLabelList.open() method is derived from this example.


Is this application of the Dice coefficient intended for multi-label (i.e. classes 0 to n) in a single mask?

It handles multi-label masks as separate channels, each valued 0/1, not as a single channel: the sort of output that works with torch.nn.BCEWithLogitsLoss.
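
For illustration, a minimal sketch of that shape convention (the batch size and image size are arbitrary):

    import torch
    import torch.nn as nn

    logits = torch.randn(2, 4, 128, 128)              # BxCxHxW raw model output
    targets = torch.randint(0, 2, (2, 4, 128, 128))   # one 0/1 channel per class

    loss = nn.BCEWithLogitsLoss()(logits, targets.float())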


Thank you. For a single mask with labels (0…4), the standard dice coefficient gives values greater than 1. Have you come across this? I am assuming it is perhaps only designed for binary classification.

By standard I presume you mean the one in fastai?
Looking at the code (which does say it's for binary targets), it looks like it's expecting a 2-channel input, so something of shape (B,2,...) with B being the batch. It takes an argmax over dim 1, which would work with any number of labels, but then it uses intersect = (input * targs) and union = (input + targs), which only work if the values are 0/1; other values produce intersect and union values greater than 1.
Thinking about it, dice is only really properly defined for the binary case. What is union supposed to mean if values are multi-class? But you can of course treat multiple labels as multiple binary classification problems and then calculate the mean, which is how dice seems to be defined in such cases. So you'd need to convert your inputs/targets to a sequence of binary values for each class (i.e. (B,C,...) tensors).
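
A sketch of that per-class treatment for single-channel predictions/targets with values in 0..C (the function name is my own):

    import torch
    from torch import Tensor

    def dice_per_class(pred: Tensor, targs: Tensor, n_classes: int,
                       eps: float = 1e-8) -> Tensor:
        "Mean of binary dice scores, one per foreground class."
        scores = []
        for i in range(1, n_classes):  # skip class 0 (background)
            p, t = (pred == i).float(), (targs == i).float()
            intersect = (p * t).sum()
            union = p.sum() + t.sum()
            scores.append((2.0 * intersect + eps) / (union + eps))
        return torch.stack(scores).mean()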

Thank you for the reply. The Severstal Kaggle competition explained in this thread uses dice for its evaluation, hence I am trying to understand it. You've shared some important points… I'll do a bit more homework to understand the code a bit more.

Note that in Severstal they actually treat each class separately, i.e. given the four classes for each input image, there are 4 rows in the training CSV and you generate 4 separate predictions. So the dice is only defined for a single image/class pair.
This doesn't mean you have to generate predictions like that. If you use a single network for all classes then you're better off creating all 4 predictions at once, but conceptually it is 4 different binary predictions.
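
A sketch of evaluating in that per image/class style; this assumes the convention (which I believe Severstal uses, but check the competition's evaluation page) that an empty mask matched by an empty prediction scores 1:

    import torch
    from torch import Tensor

    def dice_pair(pred: Tensor, targ: Tensor, eps: float = 1e-8) -> float:
        "Binary dice for one image/class pair; empty vs empty scores 1."
        p, t = pred.float().view(-1), targ.float().view(-1)
        if p.sum() == 0 and t.sum() == 0:
            return 1.0
        return (2.0 * (p * t).sum() / (p.sum() + t.sum() + eps)).item()

    # the final score is the mean of dice_pair over every (image, class) pair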