Create DataBunch with multiple segmentation masks as label

I have a problem creating labels with the data block API.

(SegmentationItemList.from_folder(path)
 .split_by_rand_pct())

I can read my images properly, but I do not know how to create the labels accordingly.
I have 3 segmentation masks per image as the label (they can overlap), i.e. my label is a (3, X, Y) segmentation mask. I have them stored in a dataframe:

    Name     rle_encoding  class
    001.jpg  1 1           a
    001.jpg  1 1           b
    001.jpg  1 1           c
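
For reference, each row can be decoded into a binary mask roughly like this (a minimal sketch, assuming the common Kaggle-style RLE format of 1-indexed, column-major (start, length) pairs):

    import numpy as np

    def rle_decode(rle, shape):
        # decode a space-separated "start length start length ..." string
        # into a binary mask of the given (H, W) shape
        s = rle.split()
        starts = np.asarray(s[0::2], dtype=int) - 1   # RLE positions are 1-indexed
        lengths = np.asarray(s[1::2], dtype=int)
        mask = np.zeros(shape[0] * shape[1], dtype=np.uint8)
        for start, length in zip(starts, lengths):
            mask[start:start + length] = 1
        return mask.reshape(shape, order='F')         # column-major, per the Kaggle convention

    # stacking the three class rows of one image gives the (3, X, Y) label:
    # np.stack([rle_decode(r, (X, Y)) for r in df.loc[df.Name == '001.jpg', 'rle_encoding']])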

I have checked the camvid notebook, but it does not cover the multi-mask case.
Any help will be appreciated, thanks!


Really need some help here… I thought I should subclass SegmentationItemList or SegmentationLabelList, but I am not sure how to do this.


Have you seen this?
Sounds interesting, let us know how you get on.

Yes, I have seen it, but I am stuck at using label_from_func. I tried to define a custom open method, but it does not work.

Hello @nok

label_from_func takes a function that maps a data filename to the corresponding label filename. You have to store your data and labels so everything matches up (i.e. for the function below, if your data is in the folder /media/cat_images, your labels should be in /media/cat_labels).

Hope that helps!

def get_y_fn(x):
    # map an image path to its label path by swapping 'images' for 'labels'
    y = str(x.absolute()).replace('images', 'labels')
    return Path(y)
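
You would then plug it into the data block pipeline, something along these lines (a sketch; path_to_images and codes stand for your image folder and your list of class names):

    src = (SegmentationItemList.from_folder(path_to_images)
           .split_by_rand_pct()
           .label_from_func(get_y_fn, classes=codes))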

I thought the open() method would be called by default, so do I have to define a custom function? I have 5000 images, and loading them all into memory causes another problem.

I have a dataframe which stores the labels as RLE encodings (see the top comment).

Therefore I subclassed SegmentationLabelList and simply overrode the open() method so that it reads the dataframe and decodes the label accordingly.
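
Roughly along these lines (a sketch of the idea only; df, rle_decode and img_shape are placeholders, not fastai API):

    import numpy as np
    import torch
    from fastai.vision import SegmentationLabelList, ImageSegment

    class RLESegmentationLabelList(SegmentationLabelList):
        def open(self, fn):
            # look up every RLE row for this file and decode each class into its own channel
            rles = df.loc[df['Name'] == fn.name, 'rle_encoding']
            masks = np.stack([rle_decode(r, img_shape) for r in rles])  # (3, H, W)
            return ImageSegment(torch.from_numpy(masks).float())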

Still stuck here… I am struggling to debug this. When I try to step in and call ds it returns an error, but ds.classes simply returns empty.

I got past this stage with the following code:

def get_y_fn_fish(x):
    # look up the RLE string for this image; `rle != rle` is only True for NaN,
    # so images without a mask return ''
    matches = train['Image_Label'].str.contains(x.as_posix().split('/')[-1] + "*", regex=True)
    rle = train['EncodedPixels'].loc[matches].values[0]
    return rle if rle == rle else ''

so the function returns the RLE string (or '') corresponding to an image. After that, if you want to open the mask you can do the following (img_f is a PosixPath object):

mask_fish = open_mask_rle(get_y_fn_fish(img_f), shape=(1400, 2100)).resize((1,128,128))
mask_fish

I then proceeded to convert the RLE masks into images and save them:

for image in path_img.iterdir():
    mask_fish = open_mask_rle(get_y_fn_fish(image), shape=(2100, 1400))
    mask_fish.save(path_img/'..'/'fish_masks'/f'{image.stem}_fish_mask.png')

After that, we can create a function that maps an actual image to its image mask:

get_y_fn_f = lambda x: path_img/'..'/'fish_masks'/f'{x.stem}_fish_mask.png'

We need another modification to make this work with masks with values 0 and 1:
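
(presumably the usual fastai pattern of opening the mask with div=True so 0/255 pixel values become 0/1; a sketch that also defines the SegItemListCustom used below):

    from fastai.vision import SegmentationItemList, SegmentationLabelList, open_mask

    class SegLabelListCustom(SegmentationLabelList):
        # div=True divides the pixel values by 255, turning 0/255 masks into 0/1
        def open(self, fn): return open_mask(fn, div=True)

    class SegItemListCustom(SegmentationItemList):
        _label_cls = SegLabelListCustom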

and voila:

src = (SegItemListCustom.from_folder(path_img)
       .split_by_rand_pct(0.2)
       .label_from_func(get_y_fn_f, classes=[0, 1]))

data = (src.transform(tfms, size=128)
        .databunch().normalize(imagenet_stats))

nok, could you provide more details as to your implementation, or a link to the notebook itself?

It’s hard to debug what’s going on without access to SegmentationMultiLabelList and open_mask.

I assume you saw florobax's implementation here, which I think is exactly what you're trying to do as well?

Hey, so I can get this to work, but now I'm having RAM problems. My images are 531x531 and I've cut the training dataset down a lot (from over 100 images to 12). I can get all the way to the end and then it says I don't have enough RAM:

RuntimeError: [enforce fail at …\c10\core\CPUAllocator.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 217362432 bytes. Buy new RAM!

I can't really buy more RAM (this is a work computer); does anyone have suggestions?

(Task Manager reports usage by other programs as very low and there really isn’t much on this computer since it’s for work and relatively new. I’m inclined to believe something weird is happening in my code)

The size of your training set should not be a problem in theory, as the dataset should only store paths to your images. What you can do when you lack RAM is use a smaller batch size or resize your images to something smaller (256x256, for instance).
What's strange is that your error message comes from trying to allocate around 200MB of RAM, which is not that much. How much RAM do you have? Are you using image paths in your dataset, or are you loading all the images directly into memory?

Last thing: doing computer vision on a CPU is going to be very, very slow; a GPU is almost necessary for most tasks.
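
Regarding the batch size and image size, both can be set when building the DataBunch (a sketch reusing the src pipeline from earlier in the thread):

    data = (src.transform(tfms, size=256)   # resize images down to 256x256
            .databunch(bs=4)                # smaller batch size to cut memory use
            .normalize(imagenet_stats))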

I knocked my batch size all the way down to 2 and now I get this weird error:
RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes’ failed. at C:\w\1\s\windows\pytorch\aten\src\THNN/generic/ClassNLLCriterion.c:94

The computer has 8 GB of RAM, and the other processes I use don't take up that much. I'm relatively new to this, but I believe I am using image paths, since I have code like this:
path_images = Path("/Users/xxx/Desktop/inout/train")
path_lbl = path_images

I am aware that it's going to take forever; honestly, the plan was to just let it sit overnight while I sleep. Mostly I just want to see if I am even training the correct things, and then talk to my boss about beefier hardware.
I was using Google Colab for a while, but had difficulty uploading all my data and then accessing it with paths.

This error often comes from wrong values in your target mask/category. You should have values between 0 (background) and n_classes - 1 inclusive in either case. In terms of shape, the target is expected to be B × 1 × H × W (B being the batch size) for segmentation, and B × 1 (or just B, I am not sure) for classification. Either way, the overall problem is difficult to assess without any code to look at.
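
A quick sanity check along these lines can confirm the target range and shape (a sketch; data is your DataBunch):

    x, y = next(iter(data.train_dl))
    print(y.shape)                         # expect (B, 1, H, W) for segmentation
    print(y.min().item(), y.max().item())  # expect 0 <= values <= n_classes - 1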

For those looking for an example of multi-label image segmentation, I created an implementation in this kernel.

For each image, the RLE-encoded masks are decoded and stored in different channels of a MultiLabelImageSegment object.

You can start by looking at this line in the notebook and following its parameters to see how it works:

item_list = item_list.label_from_func(func=get_masks_rle, label_cls=MultiLabelSegmentationLabelList, 
                                      classes=classes, src_img_size=train_img_dims)
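
(get_masks_rle essentially returns one RLE string per class for each image; a rough sketch of the idea, not necessarily the kernel's exact code, with train_df and the column names as placeholders:)

    def get_masks_rle(img_path):
        # for each class, return its RLE string for this image, or '' if absent
        rows = train_df[train_df['Name'] == img_path.name]
        return [rows.loc[rows['class'] == c, 'rle_encoding'].values[0]
                if (rows['class'] == c).any() else ''
                for c in classes]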

That's really helpful!
As far as I'm aware, though, my mask is 0 for the background and 1 for the object itself.
If you can, I would really appreciate it if you could look over the picture of my code that I have attached (sorry, there's some code debris).
No one I know has been able to help, so I am really thankful for your responses.

Does anyone have any insight into why this isn't working? I know I don't have exactly the same lambda fn, but otherwise everything is basically the same as the tutorials I've seen, and for some reason it isn't working.
I would really appreciate help, since this is for a job and I'm leaving soon.

Based on the error message, there is a dimension mismatch. The way I usually debug this kind of issue is by running %debug in the notebook to fire up the interactive debugger at the exception location and get a better understanding of the issue. See the Python debugger cheatsheet for commands. Good luck!
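
Concretely: run %debug in a new cell right after the failing one, and a few pdb commands go a long way:

    %debug
    # at the ipdb prompt:
    #   u / d            move up/down the call stack
    #   p <expression>   print an expression, e.g. a tensor's shape
    #   q                quit the debugger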


I really don't know how to use %debug, even with the cheatsheet.
From what I can tell, one set of images has 3 channels and the other has 1. Would that affect the training?

YoungProgrammer, I think I had a similar issue using the default accuracy metric while doing a multi-class segmentation problem.

At a high level, my issue is that the default implementation of accuracy takes the argmax across the wrong dimension. So I implemented a custom accuracy metric that slightly tweaks the default behavior.

def cust_accuracy(input:Tensor, targs:Tensor)->Rank0Tensor:
    "Computes accuracy with `targs` when `input` is bs * n_classes * height * width."
    n = targs.shape[0]
    input = input.argmax(dim=1).view(n,-1)  # argmax over the class dimension
    targs = targs.view(n,-1)
    return (input==targs).float().mean()

The notebook for course 1, lesson 3 implements a custom metric that does a similar thing.

Then I passed this custom metric when initializing my learner:
learn = unet_learner(test_databunch, models.resnet34, metrics=[cust_accuracy], wd=wd)

Should fix your problem. Hope this helps and isn’t completely off base!

Explanation of what happened for me

When evaluating the model, your metric takes your prediction (the “input” to the accuracy metric) and compares it to target.

For multiclass segmentation, your output is going to be of shape [batch size, # classes, height, width].

Importantly, the second dimension contains your class-level probability estimates. Therefore, the metric should take the highest predicted probability across all your classes; this is what input.argmax(dim=1) does.

However, the accuracy() metric implemented in the source code (link) actually takes input.argmax(dim=-1), which just gives the highest probability along a given row of your image.

This is nonsensical for our purposes. I'm not totally sure why it's designed this way; I think it's because for other classification tasks where accuracy is used, input has shape [batch size, # classes].
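
A tiny shape check (with made-up sizes) makes the difference obvious:

    import torch

    preds = torch.randn(2, 3, 4, 4)   # (batch, classes, height, width)
    preds.argmax(dim=1).shape         # torch.Size([2, 4, 4]): per-pixel class, what we want
    preds.argmax(dim=-1).shape        # torch.Size([2, 3, 4]): argmax over width, meaningless here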
