Question on ItemList __getitem__ method

Interneuron · April 29, 2019, 4:04pm

Apologies if this is the wrong place to ask. Also, the method I’m trying to use may be completely wrong, please let me know if it is.

I made a custom ItemList for a segmentation task where the masks are run length encoded. All the data are in a pandas dataframe with columns: image id, rle encoded pixels, image height, image width, class.
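For readers unfamiliar with run-length encoding, here is a minimal sketch of how such a string can be turned into a binary mask. It is not fastai's `open_mask_rle`. The helper name `rle_decode_sketch`, its `order` parameter, the 1-indexed "start length" pair format, and the pixel ordering are all assumptions for illustration, and the convention should be checked against the dataset in use.

```python
import numpy as np

def rle_decode_sketch(rle: str, shape: tuple, order: str = "C") -> np.ndarray:
    """Decode 'start length start length ...' into a (height, width) 0/1 mask.

    Assumes starts are 1-indexed positions into the flattened mask.
    order='C' reads the flat mask row by row, order='F' column by column.
    """
    height, width = shape
    nums = np.array(rle.split(), dtype=int)
    starts, lengths = nums[0::2] - 1, nums[1::2]  # convert to 0-indexed starts
    flat = np.zeros(height * width, dtype=np.uint8)
    for start, length in zip(starts, lengths):
        flat[start:start + length] = 1  # mark one run of foreground pixels
    return flat.reshape((height, width), order=order)

# Two runs: pixels 1-2 and pixel 5 of a 3x2 mask
mask_rows = rle_decode_sketch("1 2 5 1", (3, 2))
mask_cols = rle_decode_sketch("1 2 5 1", (3, 2), order="F")
```

The two results cover the same number of pixels but place them differently, which is why the ordering convention matters.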

The ItemList correctly returns a list of tuples of types (Image, ImageSegment), but it seems that because itemlist.items holds tuples, I get an error when I try to split into training and validation sets saying that my index is not a scalar.

sl = (SegRleList.from_df(train_df, path, cols=[0,1,2,3], folder='train'))

SegRleList (333415 items)
(Image (3, 5214, 3676), ImageSegment (1, 5214, 3676)),(Image (3, 5214, 3676), ImageSegment (1, 5214, 3676)),(Image (3, 5214, 3676), ImageSegment (1, 5214, 3676)),(Image (3, 5214, 3676), ImageSegment (1, 5214, 3676)),(Image (3, 5214, 3676), ImageSegment (1, 5214, 3676))
Path: .

code below

@classmethod
def from_df(cls, df:DataFrame, path:PathOrStr, cols:IntsOrStrs=0,
            folder:PathOrStr=None, suffix:str='', **kwargs)->'ItemList':
    "Get the filenames in `cols` of `df` with `folder` in front of them, `suffix` at the end."
    suffix = suffix or ''
    res = super().from_df(df, path=path, cols=cols[0], **kwargs)
    # use the `df` argument here rather than the global `train_df`
    msk = ItemList.from_df(df, path, cols=[1,2,3])
    pref = f'{res.path}{os.path.sep}'
    if folder is not None: pref += f'{folder}{os.path.sep}'
    res.items = np.char.add(np.char.add(pref, res.items.astype(str)), suffix)
    res.items = [(res.items[i], msk[i]) for i in range(len(res.items))]
    return res

def get(self, i):
    fn = super().get(i)[0]
    mn = super().get(i)[1:]
    mask = open_mask_rle(mn[0][0], (mn[0][1], mn[0][2]))
    res = self.open(fn)
    self.sizes[i] = res.size
    return res, mask

and the error:

TypeError: only integer scalar arrays can be converted to a scalar index

Any advice is welcome.

sgugger · April 29, 2019, 4:42pm

You get the error because your items are in a list (which doesn’t support fancy indexing) and not a numpy array. I think just having

res.items = np.array([(res.items[i], msk[i]) for i in range(len(res.items))])

should solve your problem.
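The difference can be reproduced outside fastai with a small sketch. The file names and RLE strings are made up for illustration. Note that in this toy case `np.array` turns the tuples into rows of a 2-D string array, which may differ from what the real items look like.

```python
import numpy as np

# Toy stand-ins for (filename, rle) items
items = [("img_0.jpg", "1 3"), ("img_1.jpg", "4 2"), ("img_2.jpg", "7 5")]
idx = np.array([0, 2])  # an index array, like the ones used when splitting

# A plain Python list cannot be indexed with an array of positions
list_failed = False
try:
    items[idx]
except TypeError:
    list_failed = True

# A numpy array supports this kind of "fancy" indexing
arr = np.array(items)
selected = arr[idx]  # rows 0 and 2
```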

FYI, the dataframe you use to create an ItemList is always stored in inner_df, so I think you could access it during the get call and only override that method in your subclass.

Interneuron · April 29, 2019, 5:26pm

That worked! Thanks so much!

Regarding the inner_df, I was unsure how to access it in the get call, but it is just self.inner_df, right? I was unsure how to access the index of the inner_df, but I will give it a try.

This is most helpful as the dataframe also contains the class id of the mask, which I am a bit unsure how to include properly or if I should include it for the segmentation. Thanks again!

sgugger · April 29, 2019, 5:38pm

Normally yes. Double check with TabularList if you want to see examples of when it’s used.
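As a rough illustration of the idea, here is a toy sketch in plain pandas. `ToyRleList` is a hypothetical stand-in and not fastai's ItemList, the column names are assumptions, and it assumes the dataframe rows stay in the same order as the items so that a positional lookup with `iloc` is valid.

```python
import pandas as pd

class ToyRleList:
    """Hypothetical stand-in for an ItemList that keeps its dataframe around."""

    def __init__(self, df: pd.DataFrame):
        self.inner_df = df.reset_index(drop=True)    # rows aligned with items
        self.items = self.inner_df["image_id"].values

    def get(self, i: int):
        row = self.inner_df.iloc[i]                  # positional lookup of row i
        size = (int(row["height"]), int(row["width"]))
        return self.items[i], row["rle"], size

df = pd.DataFrame({
    "image_id": ["a.jpg", "b.jpg"],
    "rle": ["1 2", "3 4"],
    "height": [4, 6],
    "width": [5, 7],
    "class_id": [10, 23],
})
toy = ToyRleList(df)
second = toy.get(1)
```

With this layout only `get` needs the extra columns, so `items` can stay a simple array of file names.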

Interneuron · April 29, 2019, 11:52pm

Thank you again!

Actually I just remembered that applying class ids to the labels was as simple as passing a list or array to .label_from_func’s classes argument, which I could get from my SegList like so:
y_func = lambda x: x[1]

This gets the labels nicely, and splitting into train and valid works!

I’m a bit stumped on how to apply transforms, and databunch is not able to collate the list into batches. For the transforms, I think even though I was able to get the labels by telling label_from_func to grab the second element in each item tuple, transforms can only see the tuple and cannot apply.

So y is a clean labellist, but I am unsure of the best way to tell .transform and .databunch to only look at the first element of the tuple.

sgugger · April 29, 2019, 11:56pm

You need a custom ItemBase then, and implement apply_tfms to do what you want.

Interneuron · April 30, 2019, 12:46am

I’ve been following the custom itemlist tutorial and I’ve tried making the base in a few ways, so far most are giving me a recursion error, like from this:

class SegRleItem(ItemBase):
    def __init__(self, image, mask):
        self.image = image
        self.mask = mask
        self.obj,self.data = (image, mask),(image.data,mask.data)
    
    def apply_tfms(self, tfms, **kwargs):
        self.mask = self.mask.apply_tfms(tfms, **kwargs)
        self.image = self.image.apply_tfms(tfms, **kwargs)
        self.data = self.image.data,self.mask.data
        return self

    def to_one(self): return self.image.data, self.mask.data

having it return an Image and ImageSegment object results in something like this:

__main__.SegRleItem at 0x2af88be0a90

I will keep studying the source and tutorial, I think I may be putting the custom itemBase in the wrong place in the segrlelist class.

Interneuron · May 2, 2019, 4:54pm

Ok, so after more time and effort than I care to admit I was able to create a custom databunch that takes in file names and run length encoded strings from columns in a dataframe and returns an ImageList for x and a SegmentationLabelList for y.

It can properly show a batch and applies transforms correctly to the images and masks. Though the code is horrific, here is a gist showing what I’ve done. https://gist.github.com/raijinspecial/e405b9fdc889e3a1649c48b53dcd6f9a

It’s not operational yet, as it seems I am not specifying the classes properly: it treats each mask as its own class, so giving it to unet_learner results in a final layer with number of outputs = number of masks (~300k), which of course ends in a memory error. I think I’m getting closer though.

extra update

I was indeed doing it wrong. Instead of passing the distinct class ids, I was just passing a list of the whole class id column from the df. Now the final layer has only 46 outputs and the model starts to fit. Whether it’s any good remains to be seen, but I’m happy that I made it this far!
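The mistake can be illustrated with a small pandas sketch. The `class_id` column name and the values are made up. The point is only that the whole column has one entry per mask, while the list of classes should have one entry per distinct id.

```python
import pandas as pd

# Toy dataframe: six masks, three distinct classes
df = pd.DataFrame({"class_id": [3, 1, 3, 2, 1, 3]})

# What went wrong: one "class" per row of the dataframe
whole_column = df["class_id"].tolist()

# What was intended: each class id listed once
classes = sorted(df["class_id"].unique().tolist())
```

Here `whole_column` has six entries while `classes` has three, which mirrors the jump from roughly 300k outputs down to 46.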

new gist - now shows fitting.
