Segmentation Dataset for run length encoding

PR:

I was quite confused by the SplitData class and its kwargs; I was only able to put together the semi-finished notebook thanks to the lesson 3 notebook. I followed SegmentationDataset closely and ran the debugger step by step. I noticed that even though x and y are annotated as file lists, the arguments actually passed when creating the dataset were arrays. I have not figured out this part yet.

class SegmentationDataset(ImageDataset):
    "A dataset for segmentation task."
    def __init__(self, x:FilePathList, y:FilePathList, classes:Collection[Any], div=False, convert_mode='L'):

class SplitData():
    "Regroups `train` and `valid` data, inside a `path`."
    path:PathOrStr
    train:LabelList
    valid:LabelList

    # ... omitted ...
    def datasets(self, dataset_cls:type, **kwargs)->'SplitDatasets':
        "Create datasets from the underlying data using `dataset_cls` and passing the `kwargs`."
        dss = [dataset_cls(*self.train.items.T, **kwargs)]
        kwg_cls = kwargs.pop('classes') if 'classes' in kwargs else None
        if hasattr(dss[0], 'classes'): kwg_cls = dss[0].classes
        if kwg_cls is not None: kwargs['classes'] = kwg_cls
        dss.append(dataset_cls(*self.valid.items.T, **kwargs))
        cls = getattr(dataset_cls, '__splits_class__', SplitDatasets)
        return cls(self.path, *dss)
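
One possible explanation for the arrays, sketched under an assumption that the thread does not confirm: if `self.train.items` is a 2-column NumPy array of (input, label) pairs, then `*self.train.items.T` transposes it and unpacks the two rows, so the dataset constructor receives two arrays rather than Python lists. The file names and the `show_args` helper below are hypothetical stand-ins, not part of fastai.

```python
import numpy as np

# Assumption: `items` is a 2-column array of (input path, label path) pairs.
items = np.array([
    ["images/0001.png", "masks/0001.png"],
    ["images/0002.png", "masks/0002.png"],
    ["images/0003.png", "masks/0003.png"],
])

def show_args(x, y, **kwargs):
    """Hypothetical stand-in for dataset_cls; returns what it received."""
    return x, y, kwargs

# Transposing gives shape (2, 3); unpacking yields one array per column.
x, y, kwargs = show_args(*items.T, classes=["background", "object"])
print(type(x).__name__, x.shape)
print(x[0], y[0])
```

Under that assumption, the `FilePathList` annotation describes the intended contents, while the runtime type is whatever the transpose produces.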

Notebook:

I’m thinking we can just have an argument passed to SegmentationDataset to use open_mask_rle if it’s rle encoded. That’s nice work, thanks for your help!

Thank you! Some sort of flag like rle=True? I guess the only difference between SegmentationDataset and SegmentationRLEDataset is _get_y(), which takes an extra shape argument to tell open_mask_rle the size of the mask.
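
To illustrate why the shape is needed, here is a minimal sketch of run-length decoding. It is not fastai's implementation: the function name is hypothetical, and it assumes the common "start length start length ..." string format with 1-indexed starts and row-major pixel order. The encoded string only lists runs over a flattened image, so the decoder cannot know the height and width without being told.

```python
import numpy as np

def rle_decode_sketch(mask_rle, shape):
    """Decode 'start length start length ...' (1-indexed starts) into a 0/1 mask."""
    tokens = mask_rle.split()
    starts = [int(t) - 1 for t in tokens[0::2]]   # convert to 0-indexed
    lengths = [int(t) for t in tokens[1::2]]
    flat = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    for start, length in zip(starts, lengths):
        flat[start:start + length] = 1
    # Assumption: row-major layout; some datasets use column-major instead.
    return flat.reshape(shape)

mask = rle_decode_sketch("1 3 7 2", shape=(3, 3))
print(mask)
```

The same string decoded with `shape=(1, 9)` would give a differently shaped mask, which is why the shape has to be supplied alongside the encoding.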

Even easier now that we can specify a mask_opener function: it’s just one more line in the data block API:

data = (ImageFileList.from_folder(path_img)
        .label_from_func(get_y_fn)
        .split_by_fname_file('../valid.txt')
        .datasets(SegmentationDataset, classes=codes)
        .set_attr(mask_opener=open_mask_rle)
        .transform(get_transforms(), size=size, tfm_y=True)
        .databunch(bs=bs)
        .normalize(imagenet_stats))

Just merged your PR. Can you add documentation for the three functions you introduced now?

Thanks a lot @sgugger. Am I supposed to make changes in fastai/fastai/doc_scr?

By the way, I have raised an issue on GitHub: the links in CONTRIBUTING.md seem to be broken. @stas, maybe you are maintaining this doc? https://github.com/fastai/fastai/blob/master/CONTRIBUTING.md

I made a PR for the docs. I am not sure how to strip out the metadata for input cells (cell number, etc.). I also added a small sample CSV for mask_rle. Currently this file stays in the same folder, docs/img/, despite the fact that it is a CSV.

Thanks for adding this!
You need to run tools/run-after-git-clone to automatically get your notebooks stripped. We can’t merge unless you do that step.

I figured I should run tools/fastai-nbstripout -d file too, but it seems that it still isn’t doing the job right… I saw a lot of noise with nbdiff.

Should I only re-run the cells that I added?

I saw something like this with nbdiff

Frankly, we don’t tend to do that, although it would make things a bit cleaner. :slight_smile:

OK, that’s great to hear. I was afraid I was not doing it the right way. :slight_smile:

Hi @sgugger, I just looked at mask_opener. How can I pass a parameter to the SegmentationDataset? open_mask_rle() requires an extra parameter for the image shape.

def _get_y(self,i): return self.mask_opener(self.y[i])

I found that I had not pushed the latest version of the demo notebook previously; I was attempting to pass an extra shape argument to the dataset.
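
One way to supply the extra argument without changing `_get_y`, offered as a sketch rather than the approach the library settled on: bind the shape ahead of time with `functools.partial`, so the opener still takes a single argument when the dataset calls it. The stub opener and the tiny dataset class below are hypothetical stand-ins that only mimic the `_get_y` line quoted above.

```python
from functools import partial

def open_mask_rle_stub(rle, shape):
    """Hypothetical stand-in for open_mask_rle: just records its arguments."""
    return {"rle": rle, "shape": shape}

class TinySegmentationDataset:
    """Hypothetical minimal dataset mimicking the `_get_y` line quoted above."""
    def __init__(self, y, mask_opener):
        self.y = y
        self.mask_opener = mask_opener

    def _get_y(self, i):
        # The dataset only ever passes one argument to the opener.
        return self.mask_opener(self.y[i])

# Bind the shape once; the resulting callable needs only the RLE string.
opener = partial(open_mask_rle_stub, shape=(3, 3))
ds = TinySegmentationDataset(y=["1 3 7 2", "2 2"], mask_opener=opener)
result = ds._get_y(0)
print(result)
```

This only works when every mask shares one shape; per-image shapes would need to travel with the labels instead.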

It seems like the docs need an update; is an ImageFileList an ItemList subclass? The docs mention ImageItemList, but not ImageFileList, so it’s tricky to figure out what methods the ImageFileList is expected to have.

You can just use tab completion to see what methods a class has. I think it will be easier than looking up the docs for each method.

The docs don’t mention ImageFileList because this class doesn’t exist anymore; it was only there temporarily during development.

What should we be using? I only arrived at ImageFileList because ImageItemList (from the docs) isn’t recognized—NameError: name 'ImageItemList' is not defined. I just did a pull, but no change…

UPDATE: Okay, it must be something with my environment. After pulling, I can now see that ImageFileList in data.py is replaced with ImageItemList, but I still get the above error from my notebook. Strange.

Okay… I see what’s going on now. I installed fastai using anaconda, but the version in site-packages is out of date (and updating doesn’t seem to give me ImageItemList—the source file in site-packages still has ImageFileList). Can anyone advise on how to get anaconda to use the current github version (i.e., my local fastai repo)?

Okay, got it. I followed the advice here: https://stackoverflow.com/questions/19042389/conda-installing-upgrading-directly-from-github
Not the accepted answer, but the one that advises on installing git and pip from anaconda.
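
The stale-install confusion described above can be checked from inside the notebook. This is a small sketch using only the standard library; the `module_origin` helper is a hypothetical name. It reports the file a module would be imported from, which shows whether Python is picking up the copy in site-packages or a local checkout.

```python
import importlib.util

def module_origin(name):
    """Return the file path a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin

# A standard-library module, to show the shape of the result.
print(module_origin("json"))
# Prints None if fastai is not installed in the active environment.
print(module_origin("fastai"))
```

If the path printed for fastai points into site-packages rather than the cloned repo, the notebook is still importing the older packaged version.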

I think you posted the wrong link? I’m having similar issues as well.

Wow, yeah, obviously I did… bizarre (that looks like my jupyter notebook link!). I’ll edit that post with the correct link!