U-Net binary segmentation

I have the same problem. I ran it with CPU-based PyTorch and got this:

RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed. at /opt/conda/conda-bld/pytorch-nightly-cpu_1544170178111/work/aten/src/THNN/generic/ClassNLLCriterion.c:93

Now what?

Sorry if this is a noob question, but how do you implement this change? Do you go into the source code and update the library, or just redefine the class in your own notebook?

You can just add the code block and run the cell in your own notebook. See the notebook link in my post for an example of this.

I am trying to implement the U-Net paper, but when concatenating the features from the contracting path with the upsampled features, I noticed that in the example in the paper, the features from the encoder are 64x64 while the upsampled features are 56x56.
My question is how to concatenate them: do you pad the upsampled features to 64x64, or do you crop the features from the encoder to 56x56?

I think you are using 'valid' padding in the conv layers; use 'same' padding instead.
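
For reference, a minimal PyTorch sketch of the paper's approach, which crops the encoder features to the upsampled size before concatenating (crop_and_concat is a hypothetical helper name):

import torch

def crop_and_concat(enc_feat, up_feat):
    # centre-crop the encoder features (e.g. 64x64) to the upsampled
    # size (e.g. 56x56), then concatenate along the channel dimension
    dh = enc_feat.size(2) - up_feat.size(2)
    dw = enc_feat.size(3) - up_feat.size(3)
    enc_feat = enc_feat[:, :, dh // 2 : dh // 2 + up_feat.size(2),
                        dw // 2 : dw // 2 + up_feat.size(3)]
    return torch.cat([enc_feat, up_feat], dim=1)

With 'same' padding (padding=1 for the 3x3 convs) the two feature maps already have matching sizes, so no cropping or padding is needed.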

This was such a helpful post. Thank you! I had to change the code a little to get it to work, because I think ImageItemList has been removed. Documenting it here for anyone else who finds this thread:

class SegLabelListCustom(SegmentationLabelList):
    # open masks with div=True so pixel values of 255 are mapped to 1
    def open(self, fn): return open_mask(fn, div=True)

class SegItemListCustom(SegmentationItemList):
    # use the custom label class above when labelling
    _label_cls = SegLabelListCustom
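
For anyone plugging this in: the custom item list then replaces SegmentationItemList when building the data source, along these lines (a sketch, reusing the path, get_y_fn and codes names that appear later in this thread):

src = (SegItemListCustom.from_folder(path)
       .split_by_folder(train='train', valid='valid')
       .label_from_func(get_y_fn, classes=codes))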

Hi,

I have RGB label-images with 6 labels.

codes = array(['Impervious_surfaces', 'Building', 'Low_vegetation', 'Tree', 'Car', 'Clutter_background'])

Below is how they are encoded in the RGB channels.

colors_LU = {'Impervious_surfaces': array([255, 255, 255]),
             'Building': array([0, 0, 255]),
             'Low_vegetation': array([0, 255, 255]),
             'Tree': array([0, 255, 0]),
             'Car': array([255, 255, 0]),
             'Clutter_background': array([255, 0, 0])}

Using

mask = open_mask(get_y_fn(img_f))
mask.show(figsize=(5,5), alpha=1)

I get the following from mask.data:

tensor([[[255, 255, 255,  ..., 255, 255, 255],
         [255, 255, 255,  ..., 255, 255, 255],
         [255, 255, 255,  ..., 255, 255, 255],
         ...,
         [ 29,  29,  29,  ..., 255, 255, 255],
         [ 29,  29,  29,  ..., 255, 255, 255],
         [ 29,  29,  29,  ..., 255, 255, 255]]])

So the values are not 0 to 5.
By examining the data I figured out that there are indeed 6 distinct values inside, which seem to be grayscale representations of the RGB colors for each class:

{'Impervious_surfaces': 255,
 'Building': 29,
 'Low_vegetation': 178,
 'Tree': 149,
 'Car': 225,
 'Clutter_background': 76}

I am using the following to create the data source:

src = (SegmentationItemList.from_folder(path)
       .split_by_folder(train='train', valid='valid')
       .label_from_func(get_y_fn, classes=codes))

data = (src.transform(get_transforms(), tfm_y=True, size=size)
        .databunch(bs=bs)
        .normalize(imagenet_stats))

This creates a problem where the numbers in the mask are greater than the number of classes, so I was getting that CUDA error.

I got around that by creating dummy classes (0-255), but then I started running out of memory, even with the image size reduced to 200.

I do not know how to solve this problem. Dividing by 255 does not help, since I have more than 2 classes.
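
For what it's worth, this is a quick way to list the distinct values in a mask (assuming fastai v1's open_mask and the get_y_fn/img_f names from above); they must all lie in [0, number of classes):

mask = open_mask(get_y_fn(img_f))
print(mask.data.unique())  # for 6 classes, expect only values 0-5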

Any help would be much appreciated.


Hi.

Not sure if I should start a new thread or post here. Please let me know.

I'm trying lesson 3 with road camera vehicle photos. The task here is creating a privacy mask that hides the inside of the vehicles.

I'm working on segmenting the windows. Right now it seems promising.

My doubt is: when an image contains no windows, should I present it to the network, with a fully zeroed mask? Or should I only use photos where a window is present?

As an aside, how accurate should I be when annotating data? Is there a need to be pixel perfect?

Thank you.

Transform your RGB masks into grayscale masks with values from 0 to 5 (for 6 classes). Use either Python or ImageJ. With the transformed masks it should work.
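
If you would rather map the RGB values to class indices directly (instead of going through a grayscale conversion), here is a minimal NumPy sketch, assuming the codes and colors_LU definitions from the post above and hypothetical fn/out_fn paths:

import numpy as np
from PIL import Image

rgb = np.array(Image.open(fn).convert('RGB'))         # fn: an RGB label image
out = np.zeros(rgb.shape[:2], dtype=np.uint8)
for idx, name in enumerate(codes):
    out[(rgb == colors_LU[name]).all(axis=-1)] = idx  # match the class colour
Image.fromarray(out).save(out_fn)                     # out_fn: target mask path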


Yes, you should. Otherwise your model will see windows everywhere. But the right percentage of empty images is hard to guess without experimentation.

Are these external road cameras? I think I would have approached it by segmenting the non-window parts of vehicles to discover the windows. If they are internal dash cams, I would detect the inside of the car, like dashboards and passengers, rather than the window.

Good luck!!

I wrote some functions to help people having problems with labels in their masks. See here:


Thank you for the great feedback. The images I am using are from the Potsdam satellite image set.

All the pixels are labeled as something.

In case it may be of some use, this is the code I used to convert the RGB labels.

from PIL import Image
import numpy as np

# get_image_files comes from fastai; path is defined earlier in the notebook
files = get_image_files(str(path) + '/original_RGB_lables')
for file in files:
    temp = Image.open(file).convert('L')
    pixels = np.array(temp, dtype=int)
    # map each grayscale value to its class index (0-5)
    pixels = np.where(pixels == 255, 0, pixels)  # Impervious_surfaces
    pixels = np.where(pixels == 225, 4, pixels)  # Car
    pixels = np.where(pixels == 178, 2, pixels)  # Low_vegetation
    pixels = np.where(pixels == 149, 3, pixels)  # Tree
    pixels = np.where(pixels == 76, 5, pixels)   # Clutter_background
    pixels = np.where(pixels == 29, 1, pixels)   # Building
    new_image = Image.fromarray(pixels.astype(np.uint8))
    new_image.save(str(path) + '/monochrome_lables' + '/' + str(file.name))

After this I still got:
CUDA error: device-side assert triggered. Apparently, this happens if a value in the mask falls outside the [0, num_classes) range.

After manually examining the labelled images, it turned out that one of them (label 4_12) had values that were not in the [0, 5] range. I think something is wrong with that RGB label image. I removed it from the set and now it is training…

I'll investigate later what is going on with the 4_12 image-label file, but in case someone plans to work with the same data set, pay attention to that.
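
In case it helps anyone, here is a small sketch that automates that scan (assuming the monochrome_lables folder from the conversion code above and fastai's get_image_files):

import numpy as np
from PIL import Image

# flag any converted mask containing values outside the expected 0-5 range
for file in get_image_files(str(path) + '/monochrome_lables'):
    vals = np.unique(np.array(Image.open(file)))
    if vals.max() > 5:
        print(file.name, vals)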

One other thing I had to do was label the 'background/clutter' class as void and exclude it from target matching in the accuracy function. Otherwise it was running out of memory.

codes = array(['Impervious_surfaces', 'Building', 'Low_vegetation', 'Tree', 'Car', 'Clutter_background'])
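
For anyone doing the same, the accuracy function from the camvid lesson can be adapted along these lines (a sketch; acc_no_void is a hypothetical name, and void_code assumes Clutter_background is the void class):

name2id = {v: k for k, v in enumerate(codes)}
void_code = name2id['Clutter_background']

def acc_no_void(input, target):
    # ignore void pixels when computing per-pixel accuracy
    target = target.squeeze(1)
    mask = target != void_code
    return (input.argmax(dim=1)[mask] == target[mask]).float().mean()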

Thanks again!


Thank you. After the RGB label conversion, you didn't have to use div=True, right?

I have a unique problem

mask = open_mask(get_y_fn(img_f),div=False)
mask.show()

Gives me this

[image: the mask displayed with the annotation visible]

But this gives me a blank image without the annotation, for the same file:

mask = open_mask(get_y_fn(img_f),div=True)
mask.show()

[image: a blank mask]

However, if I do this

mask = open_mask(get_y_fn(img_f),div=True)
mask

I get the complete mask with the segmentation

Am I missing something? Please advise.

Found it. PixelAnnotationTool creates the mask by setting the pixels of the masked area to the id value of the label being used. In my case, I used 'road marking' with an id of 34, so the mask area pixels were set to 34 and everything else to 0.
When I used div=True, it divided the entire mask by 255, turning the mask area into 0 as well.
I sorted this out by creating a custom open function where 34 was used as the divisor instead of 255.
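
In case it is useful to someone, here is a minimal sketch of that kind of custom open, reusing the subclass pattern from earlier in the thread (assumes fastai v1; since the mask holds only 0 and the label id 34, binarising is equivalent to dividing by 34):

class SegLabelListCustom(SegmentationLabelList):
    def open(self, fn):
        mask = open_mask(fn, div=False)
        mask.px = (mask.px == 34).float()  # label id 34 -> 1, background -> 0
        return mask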

In which method does the div=True argument go if I use the DataBlock API?

I’m using v1.0.51

Nowhere directly now. You should subclass SegmentationLabelList and write the open function to fit your needs, probably:

def open(self, fn): return open_mask(fn, div=True)

Oh, I thought the changes had been pushed.

I had already picked up this version of the solution from your answers in this thread. Thanks! 🙂

I have input images of size 4250 × 5500 and need to apply segmentation to them for a certain use case.
Do I need to bring the resolution down to, say, 960 × 720? Or is it purely dependent on what will fit into GPU memory?
These are document images; what would be the best augmentation techniques to apply to them?


You can use large sizes if they fit in memory. I've worked with batch sizes of 1 and 2.

Apart from bringing down the resolution, another option is to use offset tiles. Practically, especially at inference time, tiling has an advantage since U-Net can struggle with accuracy at the edges. I just keep the centre of each tile's predictions.

It all depends on your use case and, I suspect, the size of the objects in the image. The same goes for augmentations. Often you don't need to be pixel perfect, and downsizing is appropriate.
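
A rough sketch of that centre-keeping tiling idea (predict_tiled is a hypothetical helper; predict_fn would wrap your model and return an HxW class map for an HxWxC tile):

import numpy as np

def predict_tiled(img, predict_fn, tile=512, overlap=64):
    h, w = img.shape[:2]
    # pad so every tile gets `overlap` pixels of context on all sides
    ph, pw = (-h) % tile, (-w) % tile
    padded = np.pad(img, ((overlap, overlap + ph), (overlap, overlap + pw), (0, 0)),
                    mode='reflect')
    out = np.zeros((h + ph, w + pw), dtype=np.int64)
    for y in range(0, h + ph, tile):
        for x in range(0, w + pw, tile):
            window = padded[y:y + tile + 2 * overlap, x:x + tile + 2 * overlap]
            pred = predict_fn(window)
            # keep only the centre of each prediction, discarding the borders
            out[y:y + tile, x:x + tile] = pred[overlap:overlap + tile,
                                               overlap:overlap + tile]
    return out[:h, :w]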
