Issues building minimal image segmentation example using custom data

I’ve been working through the first few lectures, and decided that it might be interesting to try to reproduce the image segmentation problem from Lecture 01 / 01_intro.ipynb using a custom data set. In this case, I wanted to see if I could build a model that would segment bluebirds from an image. I downloaded a few (in this case, six) images, and annotated them using the Pixel Annotation Tool, which produced something like the following outputs, all of which have been resized to dimensions (w,h) = (300, 200):

[Images: Original Image, Masked Image, "Watershed" Masked Image]

Imperfect, but hey. My codes.txt file simply has


although I’ve tried with and without adding a Background line, and the full directory structure looks like

    . minimal_segmentation.ipynb
        ├── bluebird_config.json
        ├── bluebirds
            ├── codes.txt
            ├── resized_images
            │   ├── 0001.png
            │   ├── 0002.png
            │   ├── 0003.png
            │   ├── 0004.png
            │   ├── 0005.png
            │   └── 0006.png
            └── resized_labels
                ├── 0001_watershed_mask.png
                ├── 0002_watershed_mask.png
                ├── 0003_watershed_mask.png
                ├── 0004_watershed_mask.png
                ├── 0005_watershed_mask.png
                └── 0006_watershed_mask.png

Obviously, training a network on only six examples is not going to produce anything useful, but I'm aiming to build the absolute minimal example I can, and then scale it up from there.

This is the code I’m running in my notebook, broken up into cells.

from fastai.vision.all import *
from pathlib import Path

# For debugging: run CUDA kernels synchronously so the error points at the failing call
import os
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"

path = Path("data/bluebirds")
# This dataloader structure is one I pulled from the online documentation, which itself runs correctly, but still causes the same error at the end
#fnames = get_image_files(path/'resized_images')
#def label_func(x): return path/'resized_labels'/f'{x.stem}_watershed_mask{x.suffix}'
#codes = np.loadtxt(path/'codes.txt', dtype=str) 
# dls = SegmentationDataLoaders.from_label_func(path, fnames, label_func, bs=1,codes=codes)

# This dataloader is almost directly pulled from the example code in Lesson #1
dls = SegmentationDataLoaders.from_label_func(
    path, bs=2, fnames = get_image_files(path/"resized_images"),
    label_func = lambda o: path/'resized_labels'/f'{o.stem}_watershed_mask{o.suffix}',
    codes = np.loadtxt(path/'codes.txt', dtype=str)
)

learn = unet_learner(dls, resnet18)
# The following line is where the error occurs
learn.fine_tune(2,3e-3)
# I've also tried fit() instead of fine_tune(), with no luck

with the error:

<ipython-input-6-59add907aa7c> in <module>
----> 1 learn.fine_tune(2,3e-3)
[...] // A stack trace that works its way through a bunch of PyTorch, which I'm happy to post if asked
RuntimeError: cuda runtime error (710) : device-side assert triggered at /pytorch/aten/src/THCUNN/generic/

At this point, I'm a bit confused about how and why this error is occurring. I'm able to successfully complete the segmentation using the camvid_tiny dataset, which seems to be in a very similar form. Some things I've tried playing around with:

  • As mentioned, both dataloader structures, and fit() vs. fine_tune().
  • Changing the number of classes in codes.txt. This seems to produce a similar error, so maybe the cause is a mismatch in the number of classes? Still, I don't see the number of classes being set manually anywhere, so unless it's inferred behind the scenes from codes.txt and the label images, I'm at a loss.
  • Using resnet34 and resnet18, to no avail
  • Cutting the body/building a new head and model based on the “Chapter 15: Application Architectures Deep Dive”, although my inexperience may be the issue with that
  • Googling/DDGing all the things, and finding many examples repeating this using the CamVid data, but none using a “custom” set.
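
On the mismatched-class-size hunch in the list above: here is a minimal sketch, in plain NumPy rather than fastai or PyTorch, of why a mask value outside the range 0 to n_classes - 1 breaks a cross-entropy loss. The function name `pixel_cross_entropy` and the toy shapes are assumptions made for illustration. As far as I understand, fastai takes the number of output classes from the length of `codes`, so every mask value has to be a valid index into that many output channels.

```python
import numpy as np

def pixel_cross_entropy(logits, target):
    """Mean cross-entropy over pixels (illustrative helper, not a fastai function).

    logits: float array of shape (n_pixels, n_classes)
    target: int array of shape (n_pixels,), each value a class index
    """
    n_classes = logits.shape[1]
    bad = np.unique(target[(target < 0) | (target >= n_classes)])
    if bad.size:
        # On a GPU, this kind of out-of-range label tends to surface as an
        # opaque device-side assert instead of a readable message like this one.
        raise IndexError(f"target values {bad.tolist()} are outside 0..{n_classes - 1}")
    # Numerically stable log-softmax
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the labelled class for every pixel
    return float(-log_probs[np.arange(len(target)), target].mean())

logits = np.zeros((4, 3))              # 4 pixels, 3 classes, uniform prediction
ok_target = np.array([0, 1, 2, 1])     # values contiguous from zero: fine
bad_target = np.array([1, 2, 3, 2])    # value 3 has no matching output channel
```

Calling `pixel_cross_entropy(logits, ok_target)` returns a finite loss, while `pixel_cross_entropy(logits, bad_target)` raises the `IndexError`, which is the readable version of what the device-side assert is likely hiding.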

Image segmentation on custom classes using new data, based on transfer learning, seems like it would be a particularly useful application for people new to machine learning (I know it would be for me), so having some sort of canonical example for building a minimal solution to that is something I would find immensely helpful. It certainly feels like this shouldn’t be as hard as it’s become for me, so any advice would be deeply appreciated!


Check out some suggestions from here:

I think my issue might be solved by the "Reassigning pixel values to classes" note, but I'm a little unsure how to implement it. First, I double checked, and my masks are assigning pixel values of (1,1,1), (2,2,2), and (3,3,3). However, these aren't contiguous from 0, so I'm not sure if that's potentially creating an issue.
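
A quick way to double-check what a mask actually contains is to list its unique pixel values and compare them with the range implied by the number of lines in codes.txt. This is a minimal sketch on a synthetic array: `mask_report` is a hypothetical helper, and with real files one would first load each mask into an array, for example with `np.array(PIL.Image.open(fn))`.

```python
import numpy as np

def mask_report(mask, n_classes):
    """Return (values, missing, unexpected) for an integer mask.

    values:     sorted unique pixel values found in the mask
    missing:    class indices in 0..n_classes-1 that never appear
    unexpected: pixel values that fall outside 0..n_classes-1
    """
    values = sorted(int(v) for v in np.unique(mask))
    expected = set(range(n_classes))
    missing = sorted(expected - set(values))
    unexpected = sorted(set(values) - expected)
    return values, missing, unexpected

# Synthetic 200x300 RGB mask using the values described above: 1, 2 and 3
mask = np.ones((200, 300, 3), dtype=np.uint8)
mask[50:100, 50:150] = 2
mask[120:150, 200:250] = 3

values, missing, unexpected = mask_report(mask, n_classes=3)
```

A non-empty `unexpected` list means some pixels point at a class index that the model has no output channel for.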

Also, I’d like to double check with you about where to use the n_codes() and get_msk() functions you built, since at the moment, they’re being directly built from the dataloader. You wrote

p2c = get_pixel_to_class(fnames) (where fnames is your list of masks)
get_y = lambda o: get_msk(o, p2c)

but I’m a bit confused about where the get_pixel_to_class() function is coming from. Is that just another name for your n_codes() function? And then get_y gets substituted into the lambda function for building the label names?

If so, would that end up making it something like the following?

path = Path("data/bluebirds")

def n_codes(fnames):
    ...  # body as given in the linked post

def get_msk(fn, p2c):
    fn = path/'resized_labels'/f'{fn.stem}_mask.png'
    ...  # rest of the body as given in the linked post

filenames = get_image_files(path/'resized_labels')
p2c = n_codes(filenames)
get_y = lambda o: get_msk(o, p2c)

dls = SegmentationDataLoaders.from_label_func(
    path, bs=2, fnames = filenames,
    label_func = get_y,
    codes = np.loadtxt(path/'codes.txt', dtype=str)
)

learn = unet_learner(dls, resnet18)

Alternatively, if this is all being caused by using the wrong data annotation tool, is there one that would work better and not create these issues with the FastAI library? I’m not hung up on using the PixelAnnotationTool I linked, it just seemed like the easiest one at first glance.

Okay, I think I figured it out! It turns out that the issue was having my classes labelled as (1,1,1), (2,2,2), and (3,3,3).

When I converted the mask files to have values contiguous from zero, i.e. (0,0,0), (1,1,1), and (2,2,2), for the three classes in my codes.txt file


I was able to run things with no problem. I'm working on polishing things a little bit, and will then post everything in a GitHub repository in case other people are interested. The information about my image annotation tool and the scripts that I used for the image resizing and color shift steps (image processing was done in Rust) will also be available.
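
For anyone who would rather do the color shift in Python than Rust, a rough sketch of the same idea follows. It is not the script mentioned above: `build_pixel_to_class` and `remap_mask` are hypothetical helpers, and the sketch assumes 8-bit masks and that the order of classes in codes.txt matches ascending pixel value.

```python
import numpy as np

def build_pixel_to_class(masks):
    """Map each distinct pixel value found in `masks` to a class index 0..n-1."""
    values = sorted({int(v) for m in masks for v in np.unique(m)})
    return {value: idx for idx, value in enumerate(values)}

def remap_mask(mask, pixel_to_class):
    """Return a copy of `mask` with every pixel value replaced by its class index."""
    lut = np.zeros(256, dtype=np.uint8)    # lookup table covering all 8-bit values
    for value, idx in pixel_to_class.items():
        lut[value] = idx
    return lut[mask]                       # index the table with the mask itself

# Toy mask with the original labelling (1, 2, 3)
mask = np.array([[1, 1, 2],
                 [3, 3, 2]], dtype=np.uint8)

p2c = build_pixel_to_class([mask])         # maps 1 to 0, 2 to 1, 3 to 2
remapped = remap_mask(mask, p2c)           # values now run 0, 1, 2
```

With real files, each mask could be loaded with `np.array(PIL.Image.open(fn))`, passed through `remap_mask`, and written back with `PIL.Image.fromarray(...).save(...)`.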
