DataBunches that grab data "on the fly"?

I’ve been working on a multi-class image segmentation problem recently.

I have made my own functions, “get_input_image” and “get_mask_image”, that generate my input image and my mask on the fly from a set of .csv files. Each takes an index, i, as input, so a dataset’s get_x and get_y functions can return the input image and mask for sample i. This lets me generate data of any size I want without storing it as .png files. Furthermore, I will need to alter my problem slightly in the future (both the input images and the mask images), so this approach will be very useful for my next steps.
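To make the idea concrete, here is a minimal sketch of what index-based generator functions like these might look like. The CSV layout (one row per labelled pixel with sample, row, col, intensity and class columns) is purely an assumption for illustration, and this is not the poster's actual code; only the function names come from the post.

```python
import csv
import io

import numpy as np

# Hypothetical CSV layout (an assumption, not the poster's format):
# one row per labelled pixel: sample, row, col, intensity, class.
CSV_TEXT = """sample,row,col,intensity,class
0,1,1,0.5,1
0,2,3,0.9,2
1,0,0,0.7,1
"""

def load_rows(text):
    """Parse the CSV once and group its rows by sample index."""
    rows = {}
    for rec in csv.DictReader(io.StringIO(text)):
        rows.setdefault(int(rec["sample"]), []).append(rec)
    return rows

ROWS = load_rows(CSV_TEXT)
H, W = 4, 4  # output size is just a parameter, so any size can be generated

def get_input_image(i, h=H, w=W):
    """Build the input image for sample i on the fly (float32, shape h x w)."""
    img = np.zeros((h, w), dtype=np.float32)
    for rec in ROWS.get(i, []):
        img[int(rec["row"]), int(rec["col"])] = float(rec["intensity"])
    return img

def get_mask_image(i, h=H, w=W):
    """Build the integer class mask for sample i (0 = background)."""
    mask = np.zeros((h, w), dtype=np.int64)
    for rec in ROWS.get(i, []):
        mask[int(rec["row"]), int(rec["col"])] = int(rec["class"])
    return mask
```

Because nothing is written to disk, changing the output size or the generation rules only means changing these functions.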

Therefore, are there any thoughts on creating an ImageDataBunch.from_mask_func or similar, which could take functions that return my input (x) and mask (y)? I hope that makes sense!


The data block API will take your custom opening functions.
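As far as I know, in fastai v1 this usually means subclassing an item list (e.g. ImageList or SegmentationItemList) and overriding its open method so that it receives a key, such as an integer index, instead of a file name. The sketch below does not use fastai; FuncItemList and LabelledPairs are hypothetical stand-ins that only illustrate the idea of items being keys that a custom opener turns into data.

```python
from typing import Any, Callable, Iterable, Tuple

class FuncItemList:
    """Hypothetical sketch (not fastai's class): items are keys, and a
    user-supplied opener turns each key into data when it is accessed."""

    def __init__(self, items: Iterable[Any], open_fn: Callable[[Any], Any]):
        self.items = list(items)
        self.open_fn = open_fn

    def __len__(self) -> int:
        return len(self.items)

    def __getitem__(self, idx: int) -> Any:
        # Nothing is read from disk: the key goes straight to the opener.
        return self.open_fn(self.items[idx])


class LabelledPairs:
    """Zips an input list and a label list that share the same keys."""

    def __init__(self, x: FuncItemList, y: FuncItemList):
        assert len(x) == len(y), "inputs and labels must line up"
        self.x, self.y = x, y

    def __len__(self) -> int:
        return len(self.x)

    def __getitem__(self, idx: int) -> Tuple[Any, Any]:
        return self.x[idx], self.y[idx]


# Toy openers standing in for get_input_image / get_mask_image.
def open_input(i: int) -> list:
    return [i, i + 1, i + 2]

def open_mask(i: int) -> list:
    return [v % 2 for v in open_input(i)]

keys = range(5)
pairs = LabelledPairs(FuncItemList(keys, open_input), FuncItemList(keys, open_mask))
print(len(pairs), pairs[3])  # 5 ([3, 4, 5], [1, 0, 1])
```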


@StephenMak I’d be very interested in your method if it can create a dataset with the data_block API from a function, without creating .png files.
Could you please show what your function and data_block code look like? I’d like to apply it to a time series problem I have. Thanks in advance!

Ignacio, you can, e.g., create your own Dataset:

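import torch
from torch.utils.data import Dataset

class MyDS(Dataset):
    def __len__(self):
        # number of samples the DataLoader may request
        return 10000

    def __getitem__(self, index):
        # external database is queried here, using `index`
        the_x = ...  # e.g. a float tensor of shape (H, W)
        the_y = ...  # e.g. an integer mask tensor of shape (H, W)
        # watch for dimensions and types here: unsqueeze(t, 0) adds
        # a leading channel axis, giving shape (1, H, W)
        return torch.unsqueeze(the_x, 0), torch.unsqueeze(the_y, 0)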

Instances of MyDS are then passed to DataBunch.create, and the resulting DataBunch is passed to Learner.
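As far as I know, DataBunch.create wraps the datasets in PyTorch DataLoaders, which only need __len__ and __getitem__. The sketch below mimics that flow in plain NumPy so it runs without torch: FuncDataset, iterate_batches and the toy generators are hypothetical names, and iterate_batches is only a rough stand-in for a DataLoader's default shuffling and stacking, not its real implementation.

```python
import numpy as np

class FuncDataset:
    """Map-style dataset built from two generator functions.
    get_x / get_y are hypothetical, e.g. get_input_image / get_mask_image."""

    def __init__(self, n, get_x, get_y):
        self.n, self.get_x, self.get_y = n, get_x, get_y

    def __len__(self):
        return self.n

    def __getitem__(self, index):
        # Add a leading channel axis, as the example above does with unsqueeze.
        x = self.get_x(index)[None].astype(np.float32)
        y = self.get_y(index)[None].astype(np.int64)
        return x, y


def iterate_batches(ds, bs, shuffle=True, seed=0):
    """Rough stand-in for DataLoader's default behaviour: sample indices,
    fetch items one by one, and stack them into (bs, C, H, W) arrays."""
    idx = np.arange(len(ds))
    if shuffle:
        np.random.default_rng(seed).shuffle(idx)
    for start in range(0, len(idx), bs):
        items = [ds[int(i)] for i in idx[start:start + bs]]
        xs, ys = zip(*items)
        yield np.stack(xs), np.stack(ys)


# Toy generators: an 8x8 noise image and a thresholded mask, seeded by index.
def toy_x(i):
    return np.random.default_rng(i).random((8, 8))

def toy_y(i):
    return (toy_x(i) > 0.5).astype(np.int64)

ds = FuncDataset(10, toy_x, toy_y)
batches = list(iterate_batches(ds, bs=4))
print([b[0].shape for b in batches])
```

In real code you would subclass torch.utils.data.Dataset as in MyDS and let the DataLoader do the batching; the point is that each sample is generated only when its index is requested.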

See also this reply about fetching multiple inputs at once.

Ignacio (@oguiza), in case you want to return a whole batch from Dataset.__getitem__(), see here.
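A minimal sketch of that idea, assuming a hypothetical bulk loader (toy_fetch_many) that can fetch many samples in one call: each index is a batch number, and __getitem__ returns the already stacked batch. With PyTorch's DataLoader, a dataset like this is typically paired with batch_size=None, which disables automatic batching (available since PyTorch 1.2). The sketch itself uses only NumPy.

```python
import numpy as np

class BatchDataset:
    """Each index is a batch number, and __getitem__ returns a whole
    stacked batch, e.g. from one bulk query instead of bs small ones."""

    def __init__(self, n_items, bs, fetch_many):
        self.n_items, self.bs, self.fetch_many = n_items, bs, fetch_many

    def __len__(self):
        # Number of batches, counting a final partial batch.
        return (self.n_items + self.bs - 1) // self.bs

    def __getitem__(self, b):
        # Raising IndexError also lets plain `for batch in ds` loops stop.
        if not 0 <= b < len(self):
            raise IndexError(b)
        start = b * self.bs
        ids = list(range(start, min(start + self.bs, self.n_items)))
        return self.fetch_many(ids)


def toy_fetch_many(ids):
    """Hypothetical bulk loader: one call returns all requested samples."""
    xs = np.stack([np.full((1, 4, 4), i, dtype=np.float32) for i in ids])
    ys = np.array(ids, dtype=np.int64)
    return xs, ys

ds = BatchDataset(n_items=10, bs=4, fetch_many=toy_fetch_many)
```

The trade-off is that shuffling now happens at the batch level, so any per-sample shuffling has to be done inside the bulk loader.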
