ImageDataBunch from 500 megapixel images as tiles

@digitalspecialists Here is one way to do it:

import contextlib
import numpy as np

@contextlib.contextmanager
def temp_seed(seed):
    # Temporarily seed numpy's RNG, restoring the previous state afterwards
    state = np.random.get_state()
    np.random.seed(seed)
    try:
        yield
    finally:
        np.random.set_state(state)

Then in the custom item list:

    def get(self, i):
        # Seeding with the item index makes item and label draw the same segment
        with temp_seed(i):
            segment_idx = np.random.randint(0, self.segments_per_image)
        ...

and in the matching label list:

    def get(self, i):
        with temp_seed(i):
            segment_idx = np.random.randint(0, self.segments_per_label)
        ...

This would still mean that every epoch we get the same segments, because the seeds repeat each epoch. To fix this we can use callbacks, I think.
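One way around the repeat would be to mix an epoch counter into the seed, something like this sketch (self.epoch is hypothetical; a callback would have to bump it at the start of each epoch):

    def get(self, i):
        # self.epoch is hypothetical; a callback would increment it each epoch
        with temp_seed(i + self.epoch * len(self.items)):
            segment_idx = np.random.randint(0, self.segments_per_image)
        ...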


@digitalspecialists
To make item and label use the same random segment index, use callbacks.

The code below doesn’t work! I am not sure how to write this callback… it should be something like:

class SegmentIdxGenCallback(Callback):
    def on_batch_begin(self, **kwargs):
        # draw one segment index per batch and share it between items and labels
        random_idx = np.random.randint(0, self.data.items.segments_per_image)
        self.data.labels.segment_idx = random_idx
        self.data.items.segment_idx = random_idx

fit(1, learner, cb=CallbackHandler([SegmentIdxGenCallback()]))

then in SegmentationTileItemList change the beginning of get:

    def get(self, i):
        fn = super().get(i)  # fetch item i; the tile comes from self.segment_idx
        ...
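For what it's worth, here is a sketch of how the callback might look against fastai v1's callback API (on_batch_begin receives keyword arguments, and the item and label lists are reachable as train_ds.x and train_ds.y; segments_per_image and segment_idx are assumed to live on those lists):

class SegmentIdxGenCallback(Callback):
    def __init__(self, data):
        self.data = data

    def on_batch_begin(self, **kwargs):
        # One shared tile index per batch keeps image and mask aligned
        idx = np.random.randint(0, self.data.train_ds.x.segments_per_image)
        self.data.train_ds.x.segment_idx = idx
        self.data.train_ds.y.segment_idx = idx

learner.fit(1, callbacks=[SegmentIdxGenCallback(learner.data)])

Note that with num_workers > 0 the dataloader workers get copies of the dataset, so an attribute set here may not reach them; this is only a sketch.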

Hi, have you solved this yet?

@sgugger I have been taking a stab at fitting this into the fastai framework, building off of @Hadus’s solution above. Do you think the best way is to use a random seed to match up the SegmentationTileLabelList and SegmentationTileItemList? Or should we return all segments by adding an axis and handle them further downstream? And what about the test dataset, where we just want to scan across each image and then recombine and show the full stitched-together result?

I’m fairly close to having my pipeline working by pre-slicing the images and saving the tiles to disk, so they load in the normal way. For the test set I do the slicing, and then the stitching back together, outside the fastai framework, but I think it would be really slick and help a lot of people if we got this integrated within fastai.
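For reference, the pre-slicing step is roughly this (a minimal sketch with PIL; the function name and folder layout are my own):

from pathlib import Path
from PIL import Image as PILImage

def preslice(src: Path, dst: Path, rows: int, cols: int):
    # Cut every image in src into a rows x cols grid and save the tiles in dst
    for fn in src.iterdir():
        img = PILImage.open(fn)
        w, h = img.size  # PIL size is (width, height)
        tw, th = w // cols, h // rows
        for r in range(rows):
            for c in range(cols):
                tile = img.crop((c * tw, r * th, (c + 1) * tw, (r + 1) * th))
                tile.save(dst / f'{fn.stem}_r{r}_c{c}{fn.suffix}')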

I used this solution: I created a custom open() method that takes a tuple of file path and tile index and returns a fastai Image of the selected tile. Then I also customized the from_folder() method.

from collections import namedtuple
from math import ceil
from fastai.vision import *

ImageTile = namedtuple('ImageTile', 'path idx rows cols')


def calc_n_tiles(size, tile_max_size):
    # number of rows/cols needed so that no tile exceeds tile_max_size,
    # plus the resulting tile size
    x, y = size
    n_cols = ceil(x / tile_max_size)
    n_rows = ceil(y / tile_max_size)
    return n_rows, n_cols, (x // n_cols, y // n_rows)


def get_labels_tiles(fn):
    # path_lbl (the folder with the masks) is assumed to be defined elsewhere
    path, *tile = fn
    path = path_lbl / path.name
    return ImageTile(path, *tile)


def get_tiles(images: Collection[Path], rows: int, cols: int) -> Collection[ImageTile]:
    images_tiles = []
    for img in images:
        for i in range(rows * cols):
            images_tiles.append(ImageTile(img, i, rows, cols))
    return images_tiles


def open_image_tile(img_t: ImageTile, mask=False, **kwargs) -> Image:
    """given an ImageTile it returns an Image cropped to that tile,
    set mask to True for masks"""
    path, idx, rows, cols = img_t
    img = open_mask(path, **kwargs) if mask else open_image(path, **kwargs)
    row, col = idx // cols, idx % cols
    tile_h = img.size[0] // rows  # fastai Image.size is (height, width)
    tile_w = img.size[1] // cols
    data = img.data[:, row * tile_h:(row + 1) * tile_h,
                       col * tile_w:(col + 1) * tile_w]
    return ImageSegment(data) if mask else Image(data)


class SegmentationTileLabelList(SegmentationLabelList):

    def open(self, fn: ImageTile):
        return open_image_tile(fn, div=True, mask=True)


class SegmentationTileItemList(ImageList):
    _label_cls, _square_show_res = SegmentationTileLabelList, False

    def open(self, fn: ImageTile) -> Image:
        return open_image_tile(fn, convert_mode=self.convert_mode, after_open=self.after_open)

    @classmethod
    def from_folder(cls, path: PathOrStr = '.', rows=1, cols=1, extensions: Collection[str] = None, **kwargs) -> ItemList:
        """patches the from_folder method, generating a list of ImageTile with
        all the possible tiles for all the images in the folder"""
        files = get_files(path, extensions, recurse=True)
        files_tiled = get_tiles(files, rows, cols)
        return cls(files_tiled, path=path, **kwargs)
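Putting it together, usage would look roughly like this (path_img, path_lbl and codes are assumed to be your image folder, mask folder and class names):

data = (SegmentationTileItemList.from_folder(path_img, rows=4, cols=4)
        .split_by_rand_pct(0.2)
        .label_from_func(get_labels_tiles, classes=codes)
        .databunch(bs=8)
        .normalize(imagenet_stats))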

Hope this is not a silly approach and that it can be useful.


Hi! I am working on a similar project.

I would like to use very big images (histological Whole Slide Images - WSI) and build a DataBunch from them. I am currently using an approach similar to the one @massaros is using, but with a dictionary instead of a tuple, and with openslide-python, a library for handling this kind of image.

The code I would like to implement comes from this link, where they implement it in plain PyTorch, but I am kind of stuck on getting the labels right. Any ideas how I could implement this in fastai?
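Roughly, my tile opener looks like this so far (just a sketch; the dictionary keys and the fixed level-0 read are my current choices):

import numpy as np
import openslide
import torch

def open_wsi_tile(tile: dict) -> Image:
    # tile is assumed to look like {'path': ..., 'x': ..., 'y': ..., 'size': ...}
    slide = openslide.OpenSlide(str(tile['path']))
    region = slide.read_region((tile['x'], tile['y']), 0, (tile['size'], tile['size']))
    px = np.asarray(region.convert('RGB'), dtype=np.float32) / 255.
    return Image(torch.from_numpy(px).permute(2, 0, 1))  # HxWxC -> CxHxW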

Thanks!


@Joan in my code I am overloading the open function of SegmentationLabelList to return the correct mask (and setting the custom LabelList class).

I’m considering trying this with fastai2, but for bounding box labels instead of segmentation, and I would appreciate any updates as to what worked or didn’t with the solutions that were suggested here.

If I understand it right, it looks to me like @massaros has a workable solution, except it is perhaps missing a way to randomize tile selection. Did anyone get a solution going for picking tiles and labels at random from an image?

Thank you so much for laying this out. This is the best approach I have seen. I have a quick question though: I was able to get the original images loaded in, but in the labeling step of the data loader I am having trouble seeing how to pass in the appropriate indices. For example, I have the code below:

get_y_fn = lambda x: mask_path + f'/{x[0].stem}.png'

data = SegmentationTileItemList.from_folder(img_path, rows=8, cols=20) \
        .split_by_rand_pct(valid_pct=0.2, seed=5) \
        .label_from_func(get_y_fn, classes={'background':0, 'other':1})

and it looks like it is loading the tiles perfectly, but it errors in open_image_tile on the line path, idx, rows, cols = img_t: it is trying to unpack <path> instead of (<path>, 0, 8, 20).

How were you able to label appropriately while injecting this tiling class?

Hi, I have done some refactoring to improve the code. You can find the latest version here https://github.com/mone27/fruit-detection/blob/master/semantic_segmentation_tile.py (some of it still needs work to be more general). I also added code to change the background of a tile, which probably adds complexity you don't need.

@mark-hoffmann for me labelling works; I would suggest you try the latest version of my code (hoping it is clear enough) and tell me if you still have issues, I would be more than happy to help.
(Or you can send the full code you are using, because I went through several revisions and don't remember what each one did.)

Let me know if you have any other questions, doubts or suggestions.


Hi, the tiles are randomized by the dataloader, so you won't get all the tiles from an image one after another. If you want only some random tiles from each image, don't take all the possible tiles in the from_folder method; instead take a subset by customizing the get_tiles function:

from itertools import product

def get_tiles(images, rows: int, cols: int, tile_info: Collection) -> Collection[ImageTile]:
    # enumerates every (row, col) position; subsample here for random tiles
    images_tiles = []
    for img in images:
        for row, col in product(range(rows), range(cols)):
            images_tiles.append(ImageTile(img, (row, col), *tile_info))
    return images_tiles
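A minimal sketch of such a subsampling variant, assuming you want n_per_image random tiles per image (the function name and parameter are mine):

import random
from itertools import product

def get_random_tiles(images, rows: int, cols: int, tile_info: Collection,
                     n_per_image: int) -> Collection[ImageTile]:
    images_tiles = []
    for img in images:
        # sample a subset of the grid positions instead of enumerating them all
        positions = random.sample(list(product(range(rows), range(cols))),
                                  min(n_per_image, rows * cols))
        for row, col in positions:
            images_tiles.append(ImageTile(img, (row, col), *tile_info))
    return images_tiles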

Ahh, it was a very silly mistake on my end: I just had to reformat the tuple. I slept on it and knew how to fix it instantly. Thank you so much for the quick response, though!
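In case anyone hits the same error: the labelling function has to return an ImageTile rather than a bare path, along the lines of get_labels_tiles above (mask_path is assumed to be a Path to the mask folder):

get_y_fn = lambda x: ImageTile(mask_path / f'{x.path.stem}.png', x.idx, x.rows, x.cols)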

I’ll check out the new version of your code as well!