ImageData.get_ds in fastai v1?

edwincv0 · October 25, 2018, 6:56am

What is the equivalent of ImageData.get_ds https://github.com/fastai/fastai/blob/e6b56de53f80d2b2d39037c82d3a23ce72507cd7/old/fastai/dataset.py#L450 in the new fastai v1? I'm working with a dataset where each x input is made up of the RGBY channels of an image, and it would be helpful to define a get_x function like the one used in the previous version of fastai. Thanks in advance!

maw501 · October 29, 2018, 8:56pm

I’ve just started trying to figure this out too.

I assume this is for the protein kaggle competition…

ste · October 29, 2018, 9:09pm

From lesson 1 pets example:

data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(), size=224, bs=bs)
data.normalize(imagenet_stats)

# Get Torch Datasets
train_ds = data.train_ds
valid_ds = data.valid_ds

def get_x(dataset: Dataset):
    return dataset.ds.x

print(get_x(train_ds)[:10])

Hope this helps

edwincv0 · October 29, 2018, 9:49pm

@maw501 Yep! Ended up just combining the channels and saving them into a separate file. Still haven’t been able to get a competitive score tho…

edwincv0 · October 29, 2018, 10:19pm

@ste I was thinking more of how to pass in my own custom get_x function, because the x inputs consist of more than one file per sample.

Sort of like:
https://www.kaggle.com/zhugds/resnet34-with-rgby-fast-ai-fork

class pdFilesDataset(FilesDataset):
    def __init__(self, fnames, path, transform):
        self.labels = pd.read_csv(LABELS).set_index('Id')
        self.labels['Target'] = [[int(i) for i in s.split()] for s in self.labels['Target']]
        super().__init__(fnames, transform, path)
    
    def get_x(self, i):
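        img = open_rgby(self.path,self.fnames[i])
        if self.sz == 512: return img
        # interpolation must be passed by keyword; the third positional argument of cv2.resize is dst
        else: return cv2.resize(img, (self.sz, self.sz), interpolation=cv2.INTER_AREA)

    def get_y(self, i):
        # np.int / np.float were removed in newer NumPy releases; the builtin types are equivalent
        if(self.path == TEST): return np.zeros(len(name_label_dict),dtype=int)
        else:
            labels = self.labels.loc[self.fnames[i]]['Target']
            return np.eye(len(name_label_dict),dtype=float)[labels].sum(axis=0)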
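
The get_y above turns a space-separated Target string into a multi-hot vector. A minimal standalone sketch of that encoding follows. The function name multi_hot and the class count of 28 are assumptions for illustration (the kernel uses len(name_label_dict)), and np.float32 is used because it is the dtype most loss functions expect.

```python
import numpy as np

def multi_hot(target: str, n_classes: int) -> np.ndarray:
    """Encode a space-separated label string such as "16 0" as a 0/1 vector."""
    labels = np.array([int(tok) for tok in target.split()], dtype=np.intp)
    vec = np.zeros(n_classes, dtype=np.float32)
    # Index assignment keeps entries at 1 even if a label is repeated,
    # whereas summing rows of np.eye would count a repeated label twice.
    vec[labels] = 1.0
    return vec

# n_classes=28 is an assumed value for illustration
y = multi_hot("16 0", n_classes=28)
print(y.nonzero()[0])
```

The only behavioural difference from the np.eye version is how repeated labels are handled, which should not matter if each Target lists a class at most once.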

And:

https://github.com/fastai/fastai/blob/master/courses/dl2/carvana-unet.ipynb?short_path=b9b45cf#L115

class MatchedFilesDataset(FilesDataset):
    def __init__(self, fnames, y, transform, path):
        self.y=y
        assert(len(fnames)==len(y))
        super().__init__(fnames, transform, path)
    def get_y(self, i): return open_image(os.path.join(self.path, self.y[i]))
    def get_c(self): return 0

x_names = np.array([Path(TRAIN_DN)/o for o in masks_csv['img']])

But how do I do this in fastai v1?

ste · October 29, 2018, 10:27pm

If this is the competition:

... Within each of these is a folder containing **four** files per sample. Each file represents a different filter on the subcellular protein patterns represented by the sample. ...

I think the best approach is to create “multi-channel images” (4 channels).
I say “images” because you have to treat them as real images and transform all the channels together during data augmentation, in order to keep the correct relationship between the different color filters (as in an RGB image).
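
To illustrate what “transform all the channels together” means, here is a minimal NumPy sketch, not fastai code. The helper names stack_channels and random_flip are hypothetical, and the four tiny arrays stand in for the red, green, blue and yellow filter images. The point is that one random decision is applied to the whole (C, H, W) stack, so the channels stay spatially aligned.

```python
import numpy as np

def stack_channels(channels):
    """Stack equally sized 2-D arrays into one (C, H, W) array."""
    return np.stack(channels, axis=0)

def random_flip(img, rng):
    """Flip every channel with the same random decision so they stay aligned."""
    if rng.random() < 0.5:
        img = img[:, :, ::-1]  # horizontal flip along the width axis
    if rng.random() < 0.5:
        img = img[:, ::-1, :]  # vertical flip along the height axis
    return img

rng = np.random.default_rng(0)
# Four hypothetical 4x4 filter images; channel c is channel 0 plus 100*c
channels = [np.arange(16, dtype=np.float32).reshape(4, 4) + 100 * c for c in range(4)]
img = stack_channels(channels)
aug = random_flip(img, rng)
print(aug.shape)
```

Augmenting each file separately would draw different random flips per channel and break the pixel-wise correspondence between filters.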

Given that, I’m not sure that fastai lib supports “image stacks/multichannel” out of the box…

BTW: If they’re not supported, this could be a cool feature to add to fastai!

PS: another thing to take into account is whether some channels are “missing” because they were simply not provided, rather than because the protein is “black” under a specific filter…

In case you need it, Fiji/ImageJ is a free tool that supports multi-channel (e.g. TIFF) images:
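
One possible way to keep that distinction, sketched under assumptions: fill an absent channel with zeros but also return a boolean mask recording which channels were really provided. The helper stack_with_mask is hypothetical and nothing here is confirmed to be needed for this dataset.

```python
import numpy as np

def stack_with_mask(channels, shape):
    """Build a (C, H, W) stack from a list of 2-D arrays or None.

    Returns the stack plus a boolean vector marking which channels were present,
    so a zero-filled (missing) channel can be told apart from a genuinely dark one.
    """
    present = np.array([c is not None for c in channels])
    filled = [c if c is not None else np.zeros(shape, dtype=np.float32) for c in channels]
    return np.stack(filled, axis=0), present

# Hypothetical sample where the second and fourth filters were not provided
a = np.ones((2, 2), dtype=np.float32)
b = np.full((2, 2), 0.5, dtype=np.float32)
stack, present = stack_with_mask([a, None, b, None], shape=(2, 2))
print(stack.shape, present)
```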

https://imagej.nih.gov/ij/docs/guide/146-8.html

ste · October 29, 2018, 10:37pm

You’re lucky: torch (used by fastai) supports RGBA images too, so probably you can save simple “PNG” files and treat the alpha channel as your fourth one.

https://pytorch.org/docs/stable/torchvision/transforms.html
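
A minimal sketch of the RGBA idea, assuming the four filters are available as uint8 arrays of equal size. The function names are hypothetical, and the example round-trips through an in-memory buffer instead of a file. The yellow filter is stored in the alpha slot.

```python
import io
import numpy as np
from PIL import Image

def to_rgba_png(red, green, blue, yellow):
    """Pack four uint8 (H, W) arrays into PNG bytes; yellow goes in the alpha slot."""
    stacked = np.stack([red, green, blue, yellow], axis=-1)  # (H, W, 4)
    buf = io.BytesIO()
    Image.fromarray(stacked).save(buf, format="PNG")  # 4-channel uint8 is read as RGBA
    return buf.getvalue()

def from_rgba_png(data):
    """Read the PNG bytes back as an (H, W, 4) uint8 array without changing the mode."""
    return np.array(Image.open(io.BytesIO(data)))

rng = np.random.default_rng(0)
red, green, blue, yellow = (rng.integers(0, 256, size=(8, 8), dtype=np.uint8) for _ in range(4))
restored = from_rgba_png(to_rgba_png(red, green, blue, yellow))
print(restored.shape)
```

One caveat: any loader that converts the image to RGB when opening it will silently drop the fourth channel, so it is worth checking how the file is opened downstream.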

Another interesting option could be integrating this into fastai:

edwincv0 · October 29, 2018, 11:06pm

@ste Yep, that’s what I ended up doing. I just ended up saving an RGB file. It seems the file is really an RGBY file, with the last channel being yellow rather than alpha. Not too sure if it works the same, but it might… Agreed though, it would definitely be a good feature to add. I know they added support for DICOM images a few months back.

maw501 · October 30, 2018, 8:08am

This is promising in terms of the protein challenge - thank you, I will try this.

Though it doesn’t answer the open question about passing a custom get_x function like in previous versions of fastai.

ste · October 30, 2018, 9:32am

I think you have to subclass ImageDataset:

class SegmentationDataset(ImageDataset):
    "A dataset for segmentation task."
    def __init__(self, x:FilePathList, y:FilePathList, classes:Collection[Any], div=False, convert_mode='L'):
        assert len(x)==len(y)
        super().__init__(classes)
        self.x,self.y,self.div,self.convert_mode = np.array(x),np.array(y),div,convert_mode
        self.loss_func = CrossEntropyFlat()

    def _get_x(self,i): return open_image(self.x[i])
    def _get_y(self,i): return open_mask(self.y[i], self.div, self.convert_mode)

https://github.com/fastai/fastai/blob/825c17da9dc736957b79037a84fdb39f72f48e03/fastai/vision/data.py#L156

@classmethod
def from_folder(cls, path:PathOrStr, folder:PathOrStr, fns:pd.Series, labels:ImgLabels, valid_pct:float=0.2,
    classes:Optional[Collection[Any]]=None):
    path = Path(path)
    folder_path = (path/folder).absolute()
    train,valid = random_split(valid_pct, f'{folder_path}/' + fns, labels)
    train_ds = cls(*train, classes=classes)
    return [train_ds,cls(*valid, classes=train_ds.classes)]


Using this technique you can probably feed multi-channel images to the network.
The architecture can certainly handle multi-channel images; the remaining problem is data augmentation…
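
The same pattern can be sketched without relying on any fastai internals. The class below is a hypothetical, library-agnostic stand-in, not fastai's API: it only implements __len__ and __getitem__, which is what a map-style PyTorch dataset needs, and it takes the loader as an argument so that load_x plays the role of a custom get_x. Whether such an object plugs directly into a fastai v1 DataBunch is not confirmed here.

```python
class MultiFileDataset:
    """Map-style dataset whose x is built from several files per sample.

    Hypothetical helper, not part of fastai: `load_x` acts as a custom get_x.
    """
    def __init__(self, file_groups, labels, load_x):
        assert len(file_groups) == len(labels)
        self.file_groups = list(file_groups)
        self.labels = list(labels)
        self.load_x = load_x

    def __len__(self):
        return len(self.file_groups)

    def __getitem__(self, i):
        # Delegate loading to the injected function, passing all files of sample i
        return self.load_x(self.file_groups[i]), self.labels[i]

# Stand-in loader for illustration: a real one would open and stack the images
def fake_loader(paths):
    return tuple(paths)

groups = [[f"{s}_{c}.png" for c in ("red", "green", "blue", "yellow")] for s in ("a", "b")]
ds = MultiFileDataset(groups, labels=[[0, 16], [2]], load_x=fake_loader)
print(len(ds), ds[1])
```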

edwincv0 · October 30, 2018, 11:09am

Hadn’t thought about that. Will give this a try! Thank you.

ste · November 4, 2018, 10:21pm

Hi @edwincv0,

I was trying to tackle the same competition, following the suggestion I gave you, but I got stuck integrating the loader into a DataBunch…

This is the multi-channel image loader, which loads n images and builds a single multi-channel image:

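import PIL.Image  # import the submodule explicitly; a bare `import PIL` may not expose PIL.Image

# Assumes the fastai v1 vision module is already imported (for pil2tensor, Image and torch)
def openMultiChannelImage(fpArr):
    '''
    Open multiple single-channel images and return a single multi-channel Image
    '''
    mat = None
    nChannels = len(fpArr)
    for i,fp in enumerate(fpArr):
        #print('Loading: ', fp)
        img = PIL.Image.open(fp)
        chan = pil2tensor(img).float().div_(255)  # scale pixel values to [0, 1]
        if(mat is None):
            # Allocate the C x H x W output once the first channel's size is known
            mat = torch.zeros((nChannels,chan.shape[1],chan.shape[2]))
        mat[i,:,:]=chan
    return Image(mat)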

# Usage sample
# v = (train_data_and_labels_df[train_df.columns]).values[0,:]
v = array(['00070df0-bbc3-11e8-b2bc-ac1f6b6435d0', # Object reference - not used here
       Path('/home/ste/.fastai/data/human-protein-atlas/train/00070df0-bbc3-11e8-b2bc-ac1f6b6435d0_blue.png'),
       Path('/home/ste/.fastai/data/human-protein-atlas/train/00070df0-bbc3-11e8-b2bc-ac1f6b6435d0_green.png'),
       Path('/home/ste/.fastai/data/human-protein-atlas/train/00070df0-bbc3-11e8-b2bc-ac1f6b6435d0_red.png'),
       Path('/home/ste/.fastai/data/human-protein-atlas/train/00070df0-bbc3-11e8-b2bc-ac1f6b6435d0_yellow.png')])

ret = openMultiChannelImage(v[1:])
display(ret)
print(ret.data.shape)
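
The list of per-channel paths passed to the loader can be built from the sample id alone. A small sketch, assuming the `<id>_<colour>.png` naming seen in the paths above: the helper channel_paths and the red, green, blue, yellow order are assumptions for illustration (the usage sample above happens to list blue, green, red, yellow). Whatever order is chosen has to be the same for every sample.

```python
from pathlib import Path

COLOURS = ("red", "green", "blue", "yellow")  # assumed channel order

def channel_paths(root, sample_id, colours=COLOURS):
    """Return one path per colour filter for a sample, in a fixed order."""
    root = Path(root)
    return [root / f"{sample_id}_{c}.png" for c in colours]

# Hypothetical root directory and sample id
paths = channel_paths("/data/train", "abc")
print([p.name for p in paths])
```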

Hope this helps

edwincv0 · November 5, 2018, 3:16am

Awesome @ste. I’ll give it a try.