Segmentation Dataset for run length encoding

nok · November 1, 2018, 6:56pm

PR:

I was quite confused by the SplitData class and the kwargs; I was only able to put together the semi-finished notebook because of the lesson 3 notebook. I followed SegmentationDataset closely and ran the debugger step by step. I noticed that even though x and y are annotated as file lists, the arguments actually passed when creating the dataset were arrays. I have not figured this part out yet.

class SegmentationDataset(ImageDataset):
    "A dataset for segmentation task."
    def __init__(self, x:FilePathList, y:FilePathList, classes:Collection[Any], div=False, convert_mode='L'):

class SplitData():
    "Regroups `train` and `valid` data, inside a `path`."
    path:PathOrStr
    train:LabelList
    valid:LabelList

.....omitted
    def datasets(self, dataset_cls:type, **kwargs)->'SplitDatasets':
        "Create datasets from the underlying data using `dataset_cls` and passing the `kwargs`."
        dss = [dataset_cls(*self.train.items.T, **kwargs)]
        kwg_cls = kwargs.pop('classes') if 'classes' in kwargs else None
        if hasattr(dss[0], 'classes'): kwg_cls = dss[0].classes
        if kwg_cls is not None: kwargs['classes'] = kwg_cls
        dss.append(dataset_cls(*self.valid.items.T, **kwargs))
        cls = getattr(dataset_cls, '__splits_class__', SplitDatasets)
        return cls(self.path, *dss)
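
A possible explanation for the arrays, as a minimal sketch: if `items` is a two-column NumPy array of (x, y) pairs, which the `.T` in `*self.train.items.T` suggests, then transposing and unpacking it hands the constructor one array of x values and one array of y values. The file names and the `fake_dataset` function below are made up for illustration and are not part of fastai.

```python
import numpy as np

# Assumption: items is a 2-column array, one (image, mask) pair per row.
items = np.array([
    ["img/a.png", "mask/a.png"],
    ["img/b.png", "mask/b.png"],
    ["img/c.png", "mask/c.png"],
])

def fake_dataset(x, y, **kwargs):
    # Hypothetical stand-in for dataset_cls(...); it just returns what it received.
    return x, y, kwargs

# items.T has shape (2, 3); unpacking with * yields two positional arguments:
# the column of x values and the column of y values, each a NumPy array.
x, y, kwargs = fake_dataset(*items.T, classes=["background", "object"])

print(type(x).__name__, list(x))
print(type(y).__name__, list(y))
```

Under that assumption, the `FilePathList` annotations describe what the elements are, while the container that arrives at runtime is an `ndarray`.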

Notebook:

github.com

noklam/log/blob/master/nbs/Open_mask_rle.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Notes\n",
    "\n",
    "Added:\n",
    "\n",
    "`SegmentationRLEDataset` can takes run-length encoding mask as input.  \n",
    "3 auxiliary functions are added to support this class. These function could be use separately.  \n",
    "\n",
    "`rle_encode`  \n",
    "`rle_decode`  \n",
    "`open_mask_rle`  \n",
    "\n",
    "Unit Test:\n",
    "\n",
    "`test_rle_encode_with_array`\n",


sgugger · November 2, 2018, 1:51pm

I’m thinking we can just have an argument passed to SegmentationDataset to use open_mask_rle if it’s rle encoded. That’s nice work, thanks for your help!

nok · November 2, 2018, 3:19pm

Thank you! Some sort of flag like rle=True? I guess the only difference between SegmentationDataset and SegmentationRLEDataset is `_get_y()`, plus an extra `shape` argument that tells open_mask_rle the size of the mask.
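
To show why the shape is needed, here is a rough sketch of run-length encoding and decoding for a binary mask. It is not the code from the PR. It assumes a space-separated string of "start length" pairs with 1-indexed starts over the row-major flattened mask, and the `_sketch` function names are made up for this example.

```python
import numpy as np

def rle_encode_sketch(mask):
    "Encode a binary (0/1) 2D mask as 'start length start length ...'."
    # Pad with zeros so runs touching either end are still detected.
    pixels = np.concatenate([[0], mask.flatten(), [0]])
    # 1-indexed positions where the value changes (run starts and run ends).
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    # Turn each end position into a run length.
    runs[1::2] -= runs[::2]
    return " ".join(str(r) for r in runs)

def rle_decode_sketch(mask_rle, shape):
    "Decode the string back to a mask; the string alone does not carry the shape."
    tokens = mask_rle.split()
    starts = np.asarray(tokens[0::2], dtype=int) - 1  # back to 0-indexed
    lengths = np.asarray(tokens[1::2], dtype=int)
    flat = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    for start, length in zip(starts, lengths):
        flat[start:start + length] = 1
    return flat.reshape(shape)

mask = np.array([[0, 1, 1],
                 [0, 0, 1]], dtype=np.uint8)
encoded = rle_encode_sketch(mask)
decoded = rle_decode_sketch(encoded, mask.shape)
print(encoded)
print(decoded)
```

The same encoded string could be reshaped to (2, 3) or (3, 2), which is why the decoder has to be told the mask size.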

sgugger · November 2, 2018, 7:03pm

Even easier now that we can specify a mask_opener function: it’s just one more line in the data block API:

data = (ImageFileList.from_folder(path_img)
        .label_from_func(get_y_fn)
        .split_by_fname_file('../valid.txt')
        .datasets(SegmentationDataset, classes=codes)
        .set_attr(mask_opener=open_mask_rle)
        .transform(get_transforms(), size=size, tfm_y=True)
        .databunch(bs=bs)
        .normalize(imagenet_stats))

sgugger · November 2, 2018, 7:27pm

Just merged your PR. Can you add documentation for the three functions you introduced now?

nok · November 3, 2018, 7:49am

Thanks a lot @sgugger. Am I supposed to make the changes in fastai/fastai/docs_src?

Btw, I have raised an issue on GitHub: it seems the links in CONTRIBUTING.md are broken. @stas, maybe you are maintaining this doc? https://github.com/fastai/fastai/blob/master/CONTRIBUTING.md

nok · November 3, 2018, 10:05am

Made a PR for the docs. I am not sure how to strip out the metadata for the input cells (cell numbers, etc.). I also added a small sample CSV for mask_rle. Currently this file stays in the same folder, docs/img/, despite the fact that it is a CSV.

sgugger · November 3, 2018, 12:44pm

Thanks for adding this!
You need to run tools/run-after-git-clone to automatically get your notebooks stripped. We can’t merge unless you do that step.

nok · November 3, 2018, 4:26pm

I figured I should run tools/fastai-nbstripout -d file too, but it seems it still isn't doing the job right… I saw a lot of noise with nbdiff.

Should I only re-run the cells that I added?

I saw something like this with nbdiff

jeremy · November 3, 2018, 6:28pm

Frankly, we don’t tend to do that, although it would make things a bit cleaner.

nok · November 4, 2018, 4:19am

Ok, that’s great to hear. I was afraid I was not doing it the right way.

nok · November 5, 2018, 6:29am

Hi @sgugger, I just looked at mask_opener. How can I pass a parameter through the segmentation dataset? open_mask_rle() requires an extra parameter for the image shape, but the opener is only called with the mask itself:

def _get_y(self,i): return self.mask_opener(self.y[i])
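
One way this could work, sketched under assumptions: since the opener is called with a single argument, the shape can be bound ahead of time with `functools.partial`. `open_mask_rle_stub` and `TinyMaskDataset` below are hypothetical stand-ins written for this example, not the fastai implementations, and the keyword name `shape` is assumed.

```python
from functools import partial
import numpy as np

def open_mask_rle_stub(mask_rle, shape):
    # Hypothetical opener: decodes 1-indexed "start length" pairs, row-major.
    flat = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    tokens = [int(t) for t in mask_rle.split()]
    for start, length in zip(tokens[0::2], tokens[1::2]):
        flat[start - 1:start - 1 + length] = 1
    return flat.reshape(shape)

class TinyMaskDataset:
    # Minimal stand-in that mirrors the one-argument call in _get_y.
    def __init__(self, y, mask_opener):
        self.y, self.mask_opener = y, mask_opener
    def _get_y(self, i): return self.mask_opener(self.y[i])

# Bind the shape once; the dataset still calls the opener with one argument.
opener = partial(open_mask_rle_stub, shape=(2, 3))
ds = TinyMaskDataset(["2 2 6 1"], mask_opener=opener)
mask = ds._get_y(0)
print(mask)
```

This only covers the case where every mask has the same shape; masks of varying size would need the shape stored alongside each label.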

I found that I had not pushed the latest version of the demo notebook previously; in it I was attempting to pass an extra shape argument to the dataset.

github.com

noklam/log/blob/master/nbs/Open_mask_rle.ipynb


jbmaxwell · December 1, 2018, 1:36am

It seems like the docs need an update; is an ImageFileList an ItemList subclass? The docs mention ImageItemList, but not ImageFileList, so it’s tricky to figure out what methods the ImageFileList is expected to have.

nok · December 1, 2018, 7:10am

You can just hit tab completion to see what methods a class has. I think that is easier than looking up the methods in the docs.

sgugger · December 1, 2018, 12:34pm

The docs don’t mention ImageFileList because this class doesn’t exist anymore; it was only there temporarily during development.

jbmaxwell · December 1, 2018, 4:43pm

What should we be using? I only arrived at ImageFileList because ImageItemList (from the docs) isn’t recognized—NameError: name 'ImageItemList' is not defined. I just did a pull, but no change…

UPDATE: Okay, it must be something with my environment. After pulling, I can now see that ImageFileList in data.py is replaced with ImageItemList, but I still get the above error from my notebook. Strange.

jbmaxwell · December 2, 2018, 5:24pm

Okay… I see what’s going on now. I installed fastai using anaconda, but the version in site-packages is out of date (and updating doesn’t seem to give me ImageItemList—the source file in site-packages still has ImageFileList). Can anyone advise on how to get anaconda to use the current github version (i.e., my local fastai repo)?
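
A quick way to confirm which copy of a package a notebook is picking up is to ask Python where it would import the module from. This is a generic sketch using only the standard library; `module_location` is a helper name made up for this example.

```python
import importlib.util

def module_location(name):
    "Return the file a top-level module would be imported from, or None if it is not found."
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin

# A path under site-packages means the installed copy is used;
# a path inside a local clone means the repo version is used.
print(module_location("fastai"))
```

Running it in the same kernel as the notebook matters, since a different environment may resolve the module differently.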

jbmaxwell · December 2, 2018, 5:31pm

Okay, got it. I followed the advice here: https://stackoverflow.com/questions/19042389/conda-installing-upgrading-directly-from-github
Not the accepted answer, but the one that advises on installing git and pip from anaconda.

waydegg · December 6, 2018, 2:18am

I think you posted the wrong link? I’m having similar issues as well.

jbmaxwell · December 6, 2018, 3:10am

Wow, yeah, obviously I did… bizarre (that looks like my jupyter notebook link!). I’ll edit that post with the correct link!