How to load multiple classes of RLE-strings from CSV? Severstal Steel Competition

I am wondering what the recommended way is to ingest the dataset provided in the recent Kaggle competition.

The provided csv contains [imgname].jpg_[class] in the first column and an RLE string (or NaN) in the second.

So for each image, I have:
[imgname].jpg_1 - RLE String
[imgname].jpg_2 - NaN
[imgname].jpg_3 - NaN
[imgname].jpg_4 - NaN
or similar.

I recognize that fastai provides an open_mask_rle function; however, the discussions I can find are outdated for the current DataBunch API, and it is unclear to me how to construct my databunch.

Being new to python and ML, I agree with Jeremy’s statement in lesson 3 that figuring out how to actually get the data in is the most difficult part for me.

Are any others here looking at this competition? How are you planning to work with this data?

My current attempt looks like this:

isna = df_train.isna()
def get_y_fn(x):
 idxs = df_train.index[df_train['ImageId_ClassId'].str.contains(str( == True]
 masks = []
 for i, v in enumerate(idxs):
     cell = df_train.iloc[v][1]
     if isna.iloc[v][1] == False:
         mask = open_mask_rle(df_train.iloc[v][1], size)[!=(i+1)] = i+1
         mask = ImageSegment(torch.zeros(1, 1600, 256))
 return masks

codes = ['0', '1', '2', '3']

class SegLabelList(SegmentationLabelList):
 def open(self, fn): return open_mask_rle(fn)

class SegItemList(SegmentationItemList):
 _label_cls,_square_show_res = SegLabelList,False

data = (SegmentationItemList.from_folder(path/'train_images')
     .label_from_func(get_y_fn, classes=codes)
     .transform(get_transforms(), size=size, tfm_y=True)

Which returns an error AttributeError: 'list' object has no attribute 'read' (I assume I can’t return this “masks” list).

I am thinking I could combine these separate masks, which contain zeros and [class_num] as values, into one mask. Is this a good way to go? How would I do that? I also do not know if this dataset has overlapping masks.


Here is how I did it:

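import re
import pandas as pd

# change csv so that it has image_id on one column and rles in the 4 others
def change_csv(old, new):
    df = pd.read_csv(old)

    def group_func(df, i):
        # strip the trailing "_<class>" so that the 4 rows of an image share one key
        # (column name 'ImageId_ClassId' as in the code from the question)
        reg = re.compile(r'(.+)_\d$')
        return reg.search(df['ImageId_ClassId'].loc[i]).group(1)

    group = df.groupby(lambda i: group_func(df, i))

    # one row per image, holding the list of its 4 rles (NaN where there is no defect)
    df = group.agg({'EncodedPixels': lambda x: list(x)})

    df['ImageId'] = df.index
    df = df.reset_index(drop=True)

    # spread the list over 4 columns, one per class
    df[[f'EncodedPixels_{k}' for k in range(1, 5)]] = pd.DataFrame(df['EncodedPixels'].values.tolist())
    df = df.drop(columns='EncodedPixels')
    df = df.fillna(value=' ')
    df.to_csv(new, index=False)
    return df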

class MultiClassSegList(SegmentationLabelList):
    def open(self, id_rles):
        image_id, rles = id_rles[0], id_rles[1:]
        shape = open_image(self.path/image_id).shape[-2:]       
        final_mask = torch.zeros((1, *shape))
        for k, rle in enumerate(rles):
            if isinstance(rle, str):
                mask = open_mask_rle(rle, shape).px.permute(0, 2, 1)
                final_mask += (k+1)*mask
        return ImageSegment(final_mask)

def load_data(path, csv, bs=32, size=(128, 800)):
    train_list = (SegmentationItemList.
                  from_csv(path, csv).
                  label_from_df(cols=list(range(5)), label_cls=MultiClassSegList, classes=[0, 1, 2, 3, 4]).
                  transform(size=size, tfm_y=True).
                  databunch(bs=bs, num_workers=0))
    return train_list

Basically what you do should work if you sum or aggregate your masks so that you return a single-channel mask that has values between 0 and 4 (0 for the background, 1 to 4 for the classes).
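
A minimal NumPy sketch of that "sum or aggregate" step, assuming the per-class masks are already decoded into binary arrays of the same shape and do not overlap. The function name `combine_masks` and the toy 2x4 masks are made up for illustration; this is not fastai code.

```python
import numpy as np

def combine_masks(binary_masks):
    """Merge per-class binary masks (class 1 first) into one label mask."""
    combined = np.zeros_like(binary_masks[0], dtype=np.uint8)
    for k, mask in enumerate(binary_masks):
        # Pixels of class k+1 get the value k+1; background stays 0.
        combined += (k + 1) * mask.astype(np.uint8)
    return combined

# Toy 2x4 masks for four classes; only classes 1 and 3 have defects here.
masks = [np.zeros((2, 4), dtype=np.uint8) for _ in range(4)]
masks[0][0, :2] = 1
masks[2][1, 2:] = 1

label_mask = combine_masks(masks)
print(label_mask)
```

Summing only gives meaningful values when no pixel belongs to two classes, since overlapping pixels would add up to a value that belongs to a different class.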


Thank you for your help. I was not able to get my version to work (errors down the line) but yours works well.

I have run into my next common problem of being confused by loss functions.

My model is outputting [b, 1, h, w] but my ground truths are of [b, 5, h, w]

If I do not provide a loss function, the model runs and learns badly. The loss_func yields FlattenedLoss of CrossEntropyLoss().

If I manually set my loss_func to CrossEntropyFlat(), I get an error that my tensors are the wrong shape.

I feel like I am still misunderstanding what is happening under the hood. How can the model run by default despite predictions and targets being different shapes, while I can’t successfully get any other loss function working?

Hmm, it should actually be the other way round: your model needs to output 5 channels, while your ground truth only has one. CrossEntropyFlat then treats the 5 values in dimension 1 as class scores for the corresponding pixel and compares them with the target value (if it is 0 it is background, if it is 1 it is class 1, etc.), computing a flattened version of cross entropy; the argmax in dimension 1 is what gives the predicted class for each pixel. That works well for me, and I didn’t need to specify the loss anywhere (even though it is strange that it doesn’t work if you pass it manually).

You’re right, that’s my mistake; I swapped them around.

I figured out how to set it manually: I have to specify axis=1. It seems fastai determines the correct axis somewhere under the hood if I do not set it, which is at odds with the documentation. That’d be the source of my confusion.

Fastai specifies it here:

class SegmentationLabelList(ImageList):
    "`ItemList` for segmentation masks."
    def __init__(self, items:Iterator, classes:Collection=None, **kwargs):
        super().__init__(items, **kwargs)
        self.classes,self.loss_func = classes,CrossEntropyFlat(axis=1)

    def open(self, fn): return open_mask(fn)
    def analyze_pred(self, pred, thresh:float=0.5): return pred.argmax(dim=0)[None]
    def reconstruct(self, t:Tensor): return ImageSegment(t)

It could actually be a decent improvement to set the axis of CrossEntropyFlat to 1 by default, as PyTorch always puts the channels in dimension 1. The only problem is that -1 is far more general and will almost never raise an error.
So yeah, I didn’t notice it either, but for image data axis must basically always be set, as it will almost never be -1, which is the default.
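
To make the axis point concrete, here is a rough NumPy sketch of what a flattened cross entropy does with the class axis. It only illustrates the idea and is not fastai's actual implementation; the name `flattened_cross_entropy` and the toy shapes are assumptions made up for this example.

```python
import numpy as np

def flattened_cross_entropy(logits, target, axis=1):
    """Mean cross entropy where the class scores live on `axis` of `logits`."""
    # Move the class axis last in both arrays, then flatten to (N, C) and (N,).
    logits = np.swapaxes(logits, axis, -1)
    target = np.swapaxes(target, axis, -1)
    n_classes = logits.shape[-1]
    flat_logits = logits.reshape(-1, n_classes)
    flat_target = target.reshape(-1)
    if flat_logits.shape[0] != flat_target.shape[0]:
        raise ValueError(
            f"{flat_logits.shape[0]} score rows but {flat_target.shape[0]} targets"
        )
    # Numerically stable log-softmax over the class dimension.
    shifted = flat_logits - flat_logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the true class for every pixel.
    picked = log_probs[np.arange(flat_target.size), flat_target]
    return -picked.mean()

rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 5, 4, 6))          # [b, classes, h, w]
target = rng.integers(0, 5, size=(2, 1, 4, 6))  # [b, 1, h, w], values 0..4

loss_axis1 = flattened_cross_entropy(logits, target, axis=1)
print("axis=1 loss:", loss_axis1)

try:
    flattened_cross_entropy(logits, target, axis=-1)
except ValueError as err:
    # With axis=-1 the width is mistaken for the class dimension.
    print("axis=-1 fails:", err)
```

With axis=1 every pixel contributes one row of 5 scores and one target value, so the counts match; with axis=-1 the rows are built from the width dimension and the counts no longer line up.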

I’m just now tackling this same problem and ran across this thread. I’m curious about the choice to combine the masks in the MultiClassSegList. Is it possible to combine the masks first, before creating a databunch for it? Also, I hadn’t even thought of passing something to label_cls to do it. Did Jeremy mention that in a course somewhere?

You indeed have the option to save the combined masks directly and use fastai’s default version of SegmentationLabelList, that would work just as well (it will even be faster in the end). Passing something to label_cls allows me to specify that the dataset should expect the labels to be in a list of type MultiClassSegList. When using the default implementation, you don’t need to specify this as it is already in the source code (for instance, the default label_cls for SegmentationItemList is SegmentationLabelList). So yeah, you basically don’t need any of this if you save the full masks somewhere.

Thanks for the response! Glad to know I can just combine all the masks together from the start and then use the default. I’m mostly curious because there isn’t much info on how to handle multiple masks at once and I didn’t want to completely rip off the code you posted above to do a Kaggle competition, so I have been trying to go through and figure out why everything works the way it does and then see if I can refactor it in my own way.

Your explanation just cleared up the concept behind your MultiClassSegList for me. Much appreciated!

Do you have an example of this approach you can share?

@Florobax are you able to explain this block of code for me please:

for k, rle in enumerate(rles):
    if isinstance(rle, str):
        mask = open_mask_rle(rle, shape).px.permute(0, 2, 1)
        final_mask += (k+1)*mask

Thank you

Sure! I created a csv with 5 columns: one for the image id and 4 for the rle masks corresponding to each class. When trying to access a specific item, the whole line corresponding to the desired item is passed, from which I can then take the list of rles. I loop through them, using enumerate to count the steps, and for each one, if it is a string as expected (I am not sure why I put this check in; maybe I encountered some None or int values at this point):

  • I open the mask using fastai’s open_mask_rle, which gives me an object of type ImageSegment
  • I extract the corresponding tensor using the property px
  • I swap the 2 spatial dimensions as the tensor given by open_mask_rle is transposed by fastai
  • Finally, I add the corresponding mask to the total mask, giving its pixels the value k+1 (which is 1 for the first mask, 2 for the second, etc.)

In the end, you get a mask with values between 0 and 4 (inclusive), 0 being the background and 1-4 each corresponding to a class.

Thank you, this is very helpful.

What I don’t understand with this method is that when masks overlap, a pixel with value 5 can mean class 1+4 or 3+2.
How do you decode your masks after segmentation?

Using squared indexes (1, 4, 9, 16) seems more convenient because you do not get overlap in sums of the combinations:
() sum= 0
(1,) sum= 1
(4,) sum= 4
(9,) sum= 9
(16,) sum= 16
(1, 4) sum= 5
(1, 9) sum= 10
(1, 16) sum= 17
(4, 9) sum= 13
(4, 16) sum= 20
(9, 16) sum= 25
(1, 4, 9) sum= 14
(1, 4, 16) sum= 21
(1, 9, 16) sum= 26
(4, 9, 16) sum= 29
(1, 4, 9, 16) sum= 30

You can’t use this method for overlapping masks. As you correctly note you couldn’t distinguish them.
Though note that masks aren’t output by the network in this format. The network gives one output per class and uses argmax to choose the highest prediction as the output value, again not supporting overlapping masks.
There isn’t inbuilt support for overlapping masks in fastai (that I’ve seen). But it can be fairly easily added, there’s a couple of recent threads on this method (including code I wrote for this) if you search for segmentation. In the competition referred to in the thread the masks are all non-overlapping so either method works.
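
If you want to check the non-overlap assumption on your own data, a small sketch along these lines could help. It assumes the per-class masks of one image are already decoded into binary NumPy arrays; `has_overlap` is a made-up helper name, not a fastai function.

```python
import numpy as np

def has_overlap(binary_masks):
    """True if any pixel is set in more than one class mask."""
    stacked = np.stack([m.astype(bool) for m in binary_masks])
    # Count, per pixel, how many classes claim it.
    return bool((stacked.sum(axis=0) > 1).any())

class_1 = np.array([[1, 1, 0], [0, 0, 0]])
class_2 = np.array([[0, 0, 0], [0, 1, 1]])
class_3 = np.array([[0, 1, 0], [0, 0, 0]])  # shares a pixel with class_1

print(has_overlap([class_1, class_2]))           # disjoint masks
print(has_overlap([class_1, class_2, class_3]))  # one shared pixel
```

Running this over every image once tells you whether the summed single-channel mask is safe for your dataset.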

That would work, though a more common method is the pretty standard binary encoding, using successive powers of 2, so 1, 2, 4, 8, 16, etc. This maps to standard bitwise operations as each class is a different bit in the binary representation.
Also note your method only works for up to 4 classes, as with a fifth class 5**2 = 25, which is also 16 + 9.
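
A short sketch of the binary encoding, assuming binary NumPy masks per class. The helper names `encode_bits` and `decode_bits` are invented for this example.

```python
import numpy as np

def encode_bits(binary_masks):
    """Pack per-class binary masks into one array, class k stored in bit k."""
    encoded = np.zeros(binary_masks[0].shape, dtype=np.uint8)
    for k, mask in enumerate(binary_masks):
        # Class 1 -> value 1, class 2 -> 2, class 3 -> 4, class 4 -> 8.
        encoded |= mask.astype(np.uint8) << np.uint8(k)
    return encoded

def decode_bits(encoded, n_classes):
    """Recover the list of per-class binary masks from the packed array."""
    return [((encoded >> np.uint8(k)) & np.uint8(1)) for k in range(n_classes)]

masks = [np.zeros((2, 2), dtype=np.uint8) for _ in range(4)]
masks[0][0, 0] = 1  # class 1
masks[3][0, 0] = 1  # class 4 overlaps class 1 on the same pixel
masks[1][1, 1] = 1  # class 2 alone

encoded = encode_bits(masks)
decoded = decode_bits(encoded, n_classes=4)
print(encoded)
```

Every combination of classes maps to a distinct value, so overlapping pixels can be decoded again. Note that such a packed mask is a storage format: as mentioned above, the default argmax-based prediction still yields a single class per pixel.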


Thanks guys for the great tips, especially @florobax. I have another question. Do you know how to export the data in the format that they require? It seems that they require a CSV file with a format similar to the train.csv file, but with different encoded pixels. More info can be found here:

It seems to me that it is exactly like in train.csv, though I did not look into the details. You can use fastai’s rle_encode; you might just need to transpose the image first, as some competitions count columns first while some count lines first.
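
For reference, here is a plain NumPy sketch of run-length encoding and decoding, written independently of fastai's helpers. It assumes the "start length start length ..." format with 1-indexed starts and pixels counted down the columns first; check the competition's data description before relying on that. The function names are made up for the example.

```python
import numpy as np

def rle_decode(rle, shape):
    """Decode 'start length start length ...' into a (height, width) mask."""
    flat = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    numbers = np.array(rle.split(), dtype=int)
    for start, length in zip(numbers[0::2], numbers[1::2]):
        flat[start - 1:start - 1 + length] = 1  # starts are 1-indexed
    # order="F" fills the mask column by column.
    return flat.reshape(shape, order="F")

def rle_encode(mask):
    """Encode a binary (height, width) mask, counting pixels column by column."""
    flat = mask.flatten(order="F")
    padded = np.concatenate([[0], flat, [0]])
    # 1-indexed positions where the value changes (run starts and run ends).
    changes = np.where(padded[1:] != padded[:-1])[0] + 1
    changes[1::2] -= changes[0::2]  # turn run ends into run lengths
    return " ".join(str(n) for n in changes)

mask = np.array([[0, 1, 0, 0],
                 [1, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=np.uint8)

rle = rle_encode(mask)
print(rle)  # 2 1 4 2 12 1
restored = rle_decode(rle, mask.shape)
```

If a competition counts lines first instead, passing the transposed mask (`rle_encode(mask.T)`) gives the row-by-row encoding with the same function.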

I got an error saying None doesn’t have an attribute group. I was using it on Kaggle’s Clouds challenge.

There are probably NaN values in your csv, though I am not sure. I made this for Severstal, so it might not work without changes for the Clouds competition.
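
One way to see where it breaks: the error means the regex search returned None for some row, which happens when the key is not a string (for example NaN) or does not end in an underscore followed by a single digit. A defensive sketch, using the same pattern as in change_csv above; the helper name and the sample keys are hypothetical.

```python
import re

ID_PATTERN = re.compile(r"(.+)_\d$")

def image_id_from_key(key):
    """Return the image part of 'name.jpg_3', or None if the key does not match."""
    if not isinstance(key, str):
        return None  # e.g. NaN read from the csv
    match = ID_PATTERN.search(key)
    return match.group(1) if match else None

# Hypothetical keys: a digit suffix, a text suffix, and a missing value.
for key in ["abc.jpg_1", "abc.jpg_Fish", float("nan")]:
    print(repr(key), "->", image_id_from_key(key))
```

Listing the rows for which this returns None should show whether the cause is missing values or a different label format, in which case the pattern would need to be adapted to that dataset.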

Thank you for this great thread.
I am dealing with the same issue, but I have 1 mask for each image which contains labels from 0 to 3. I want to make a mask for each image with 4 channels, but my images have different heights and widths; I don’t have a fixed source image size. Do you have any advice?
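
One possible approach, sketched with NumPy under the assumption that each mask is a 2D integer array with values 0 to 3: the conversion to one channel per label is done image by image, so it does not depend on a fixed size. `to_one_hot` is a made-up helper name.

```python
import numpy as np

def to_one_hot(label_mask, n_classes=4):
    """Turn an (H, W) integer mask into an (n_classes, H, W) binary mask."""
    classes = np.arange(n_classes).reshape(-1, 1, 1)
    # Broadcasting compares every pixel with every class index.
    return (label_mask[None, :, :] == classes).astype(np.uint8)

small = np.array([[0, 1], [2, 3]])
wide = np.array([[0, 0, 3, 3, 1]])

small_channels = to_one_hot(small)
wide_channels = to_one_hot(wide)
print(small_channels.shape, wide_channels.shape)
```

Batching images of different sizes is a separate question; resizing images and masks to a common size, as the `size` argument of `transform` does in the load_data function earlier in this thread, is one option.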