I’m working on an image segmentation application that consumes very high resolution images (~500 Mpx, or about 23,000 × 23,000 px). It works fine consuming the images as smaller tiles; that’s acceptable given the nature of the images (microscopy slide scans), since a “whole slide” view would be useless anyway.
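For context, the tiling itself is straightforward. A minimal sketch of what I mean (pure numpy; `tile_image` is my own helper name, and it assumes dimensions are exact multiples of the tile size, whereas the real code would pad or crop the edges):

```python
import numpy as np

def tile_image(img, tile_size):
    """Chop an H x W x C array into square tiles of side tile_size.

    Assumes H and W are exact multiples of tile_size; a real version
    would handle the remainder at the right/bottom edges.
    """
    h, w = img.shape[:2]
    tiles = []
    for y in range(0, h, tile_size):
        for x in range(0, w, tile_size):
            tiles.append(img[y:y + tile_size, x:x + tile_size])
    return tiles

# A toy 4x4 single-channel "image" yields four 2x2 tiles.
img = np.arange(16).reshape(4, 4, 1)
tiles = tile_image(img, 2)
```

So the tiles exist as plain arrays in RAM; the question is only about feeding them to fastai from there.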
But I was thinking of avoiding writing the image tiles to disk. Generating the tiles in memory (system RAM, of course, not the GPU’s) for the duration of a short training session is fine. So, my question is:
How can I create an ImageDataBunch from a set of in-memory (system RAM) images (the tiles I’d chop the original large image into), preferably without writing them to disk?
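Concretely, what I have in mind is keeping the tiles in a plain in-memory mapping keyed by synthetic “file names”, so that downstream code expecting names can stay unchanged and only the loading step changes (all names here are hypothetical, just to illustrate the idea):

```python
import numpy as np

# Hypothetical in-memory "file system": synthetic names -> numpy arrays.
tile_store = {}

def store_tile(name, arr):
    tile_store[name] = arr

def open_tile(name):
    # Stand-in for whatever fastai would otherwise do with a path:
    # look the array up in RAM instead of reading from disk.
    return tile_store[name]

store_tile('slide1_tile_0_0.png', np.zeros((256, 256, 3), dtype=np.uint8))
tile = open_tile('slide1_tile_0_0.png')
```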
My loading code for now is very simple. (I’m also kind of a total noob to fastai’s data block API; note that “dataset” in my code refers to nothing remotely similar to a fastai Dataset, it’s specific to my app’s models.)
```python
import numpy as np
from fastai.vision import SegmentationItemList, get_transforms

def make_data_bunch(dataset_images, cls_codes):
    # dataset_images is a list of dicts (my app's models, not fastai's)
    dir_path = dataset_images[0]['image'].parent.parent
    return (
        SegmentationItemList(
            items=[di['image'] for di in dataset_images],
            path=dir_path
        )  # -> SegmentationItemList
        .split_by_files([
            di['image'].name
            for di in dataset_images if di['purpose'] == 'validation'
        ])  # -> ItemLists(train: SegmentationItemList, valid: SegmentationItemList)
        .label_from_lists(
            train_labels=[di['label_image'] for di in dataset_images
                          if di['purpose'] == 'train'],
            valid_labels=[di['label_image'] for di in dataset_images
                          if di['purpose'] == 'validation'],
            classes=np.asarray(cls_codes)
        )  # -> LabelLists(train: LabelList(x, y: SegmentationItemList), valid: ...)
        .transform(get_transforms(flip_vert=True), tfm_y=True)
        .databunch(bs=1)  # -> ImageDataBunch
    )
```
(Right now, by handling this entirely outside the fastai lib, I’d end up writing the tiles to disk, but I was looking for a more “fastai idiomatic” way of doing it.)
Also, note that in my application training happens in production: users create training sessions through a web UI, set their parameters, etc. It’s not a “train, then deploy the trained model” scenario. But the number of concurrent users would be small and the machine can have a ton of RAM, so I’m fine with creating GB-sized images in memory.
Thanks in advance,