Segmentation Dataset for run length encoding

(nok) #1


I was quite confused by the SplitData class and the kwargs; I was able to pull together the semi-finished notebook only because of the lesson 3 notebook. I followed SegmentationDataset closely and ran the debugger step by step. I noticed that even though x and y should be file lists, the argument passed when creating the dataset was actually an array. I cannot figure out this part yet.

class SegmentationDataset(ImageDataset):
    "A dataset for segmentation task."
    def __init__(self, x:FilePathList, y:FilePathList, classes:Collection[Any], div=False, convert_mode='L'):

class SplitData():
    "Regroups `train` and `valid` data, inside a `path`."

    def datasets(self, dataset_cls:type, **kwargs)->'SplitDatasets':
        "Create datasets from the underlying data using `dataset_cls` and passing the `kwargs`."
        dss = [dataset_cls(*self.train.items.T, **kwargs)]
        kwg_cls = kwargs.pop('classes') if 'classes' in kwargs else None
        if hasattr(dss[0], 'classes'): kwg_cls = dss[0].classes
        if kwg_cls is not None: kwargs['classes'] = kwg_cls
        dss.append(dataset_cls(*self.valid.items.T, **kwargs))
        cls = getattr(dataset_cls, '__splits_class__', SplitDatasets)
        return cls(self.path, *dss)
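
A possible explanation for the array, sketched under an assumption: if `items` is an (N, 2) NumPy array of (image, mask) path pairs, then `*self.train.items.T` transposes it into two rows and unpacks them as `x` and `y`, so each one arrives as an array rather than a plain list. The file names and the `show_args` helper below are hypothetical, for illustration only.

```python
import numpy as np

# Assumed layout: one row per sample, columns are (image path, mask path).
items = np.array([
    ["img_0.png", "mask_0.png"],
    ["img_1.png", "mask_1.png"],
    ["img_2.png", "mask_2.png"],
])

def show_args(x, y, **kwargs):
    "Hypothetical stand-in for a dataset constructor; returns what it received."
    return x, y, kwargs

# items.T has shape (2, N); star-unpacking yields its two rows.
x, y, kwargs = show_args(*items.T, classes=["background", "object"])

print(type(x).__name__, x)  # the image paths, as a NumPy array
print(type(y).__name__, y)  # the mask paths, as a NumPy array
```

So the type annotation says FilePathList, but nothing converts the unpacked rows, which would match what the debugger shows.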



I’m thinking we can just have an argument passed to SegmentationDataset to use open_mask_rle if it’s rle encoded. That’s nice work, thanks for your help!

(nok) #3

Thank you! Some sort of flag like rle=True? I guess the only difference between SegmentationDataset and SegmentationRLEDataset is _get_y(), plus an extra shape argument that tells open_mask_rle the size of the mask.
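
To illustrate why the shape is needed, here is a minimal sketch of run-length decoding. It is not the library's implementation: `rle_decode_sketch` is a hypothetical name, and it assumes 1-indexed starts and row-major pixel order. The RLE string only stores positions in a flat pixel sequence, so the same string gives different masks for different shapes.

```python
import numpy as np

def rle_decode_sketch(mask_rle, shape):
    """Decode 'start length start length ...' into a 2-D binary mask.

    Assumptions: starts are 1-indexed and pixels are laid out row by row.
    """
    tokens = np.array(mask_rle.split(), dtype=int)
    starts, lengths = tokens[0::2] - 1, tokens[1::2]
    flat = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    for start, length in zip(starts, lengths):
        flat[start:start + length] = 1  # mark one run of foreground pixels
    return flat.reshape(shape)

rle = "1 3 10 2"
wide = rle_decode_sketch(rle, (3, 4))  # 3 rows, 4 columns
tall = rle_decode_sketch(rle, (4, 3))  # same string, different layout
print(wide)
print(tall)
```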


It’s even easier now that we can specify a mask_opener function; it’s just one more line in the data block API:

data = (ImageFileList.from_folder(path_img)
        .datasets(SegmentationDataset, classes=codes)
        .transform(get_transforms(), size=size, tfm_y=True)
        .normalize(imagenet_stats))

Just merged your PR. Can you add documentation for the three functions you introduced now?

(nok) #6

Thanks a lot @sgugger. Am I supposed to make the changes in fastai/fastai/docs_src?

Btw, I have raised an issue on GitHub; it seems that the links are broken. @stas, maybe you are maintaining this doc?

(nok) #7

I made a PR for the doc. I am not sure how to strip out the metadata for the input cells (cell number, etc.). I also added a small sample csv for mask_rle. Currently this file stays in the folder docs/img/ despite the fact that it is a csv.


Thanks for adding this!
You need to run tools/run-after-git-clone to automatically get your notebooks stripped. We can’t merge unless you do that step.
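
For anyone wondering what "stripping" means here, a rough sketch follows. This is not the code of tools/fastai-nbstripout and may differ from what that tool actually does; `strip_notebook` and its `keep_outputs` flag are hypothetical. It only relies on the standard notebook JSON layout, where code cells carry an `execution_count` (the cell number) and an `outputs` list.

```python
import copy

def strip_notebook(nb, keep_outputs=True):
    "Return a copy of a notebook dict with cell numbers reset (and outputs optionally cleared)."
    nb = copy.deepcopy(nb)  # leave the caller's notebook untouched
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        cell["execution_count"] = None
        if keep_outputs:
            # Outputs can carry their own execution count; reset it too.
            for out in cell.get("outputs", []):
                if "execution_count" in out:
                    out["execution_count"] = None
        else:
            cell["outputs"] = []
    return nb

# A tiny hand-written notebook, for illustration only.
example = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 7,
         "source": ["1 + 1"],
         "outputs": [{"output_type": "execute_result", "execution_count": 7,
                      "data": {"text/plain": ["2"]}, "metadata": {}}]},
    ],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 2,
}

stripped = strip_notebook(example)
cleared = strip_notebook(example, keep_outputs=False)
```

Resetting the cell numbers is what keeps them from showing up as noise in diffs.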

(nok) #9

I figured I should run tools/fastai-nbstripout -d file too, but it seems that it still does not do the job right… I saw a lot of noise with nbdiff.

Should I only re-run the cells that I added?

I saw something like this with nbdiff

(Jeremy Howard (Admin)) #10

Frankly, we don’t tend to do that, although it would make things a bit cleaner. :slight_smile:

(nok) #11

Ok, that’s great to hear. I was afraid I was not doing it the right way. :slight_smile:

(nok) #12

Hi @sgugger, I just looked at mask_opener. How can I pass a parameter to SegmentationDataset? open_mask_rle() requires an extra parameter for the image shape.

def _get_y(self,i): return self.mask_opener(self.y[i])

I found that I have not pushed the latest version of the demo notebook previously, I was attempting to pass an extra shape argument to the dataset.
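
One option I can think of, not confirmed by anyone in this thread, is to bind the shape up front with functools.partial so that the opener still takes a single argument. The sketch below uses hypothetical stand-ins (`open_mask_rle_stub`, `DatasetSketch`) rather than the real fastai classes, and only mirrors the `_get_y` line above.

```python
from functools import partial

def open_mask_rle_stub(mask_rle, shape):
    "Hypothetical stand-in for an opener that needs the RLE string and the mask shape."
    return {"rle": mask_rle, "shape": shape}

class DatasetSketch:
    "Hypothetical dataset that calls its opener with one argument, like _get_y above."
    def __init__(self, y, mask_opener):
        self.y, self.mask_opener = y, mask_opener

    def _get_y(self, i):
        return self.mask_opener(self.y[i])

# partial fixes shape=(3, 4), leaving a one-argument callable.
opener = partial(open_mask_rle_stub, shape=(3, 4))
ds = DatasetSketch(["1 3 10 2"], mask_opener=opener)
first = ds._get_y(0)
print(first)
```

This only works if every mask has the same shape; otherwise the shape would have to travel with each item.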

(James Maxwell) #13

It seems like the docs need an update; is an ImageFileList an ItemList subclass? The docs mention ImageItemList, but not ImageFileList, so it’s tricky to figure out what methods the ImageFileList is expected to have.

(nok) #14

You can just use tab completion to see what methods a class has. I think that will be easier than looking up the docs for each method.


The docs don’t mention ImageFileList because it doesn’t exist anymore; it was only there temporarily during development.

(James Maxwell) #16

What should we be using? I only arrived at ImageFileList because ImageItemList (from the docs) isn’t recognized—NameError: name 'ImageItemList' is not defined. I just did a pull, but no change…

UPDATE: Okay, it must be something with my environment. After pulling, I can now see that ImageFileList has been replaced with ImageItemList, but I still get the above error from my notebook. Strange.

(James Maxwell) #17

Okay… I see what’s going on now. I installed fastai using anaconda, but the version in site-packages is out of date (and updating doesn’t seem to give me ImageItemList—the source file in site-packages still has ImageFileList). Can anyone advise on how to get anaconda to use the current github version (i.e., my local fastai repo)?
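
For anyone hitting the same thing, a quick way to check which copy of a package the notebook would import is to ask Python where it finds the module. This is a small sketch using only the standard library; `module_origin` is a hypothetical helper name, and "json" is used in the demo so that it runs anywhere (swap in "fastai" in your own environment).

```python
import importlib.util

def module_origin(name):
    "Return the file a top-level module would be loaded from, or None if it is not found."
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin

# Demo with a standard-library module; use "fastai" in your own notebook.
print(module_origin("json"))
print(module_origin("surely_not_an_installed_package_xyz"))
```

If the path points into site-packages rather than your local clone, the notebook is using the installed copy.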

(James Maxwell) #18

Okay, got it. I followed the advice here:
Not the accepted answer, but the one that advises on installing git and pip from anaconda.

(Wayde Gilliam) #19

I think you posted the wrong link? I’m having similar issues as well.

(James Maxwell) #20

Wow, yeah, obviously I did… bizarre (that looks like my jupyter notebook link!). I’ll edit that post with the correct link!