Mapping between dataloader source objects and dataset items

Hello guys,

When creating a fastai2.data.core.DataLoaders object with the fastai2.data.block.DataBlock.dataloaders method, the first parameter is a list of source objects. In my case this is a list of custom wrapper objects that hold the path of the image plus some additional information.
Once the dataloaders have been created, the items in the datasets no longer contain a reference to these source objects, as far as I can tell.
During prediction I want to save the predictions in these custom wrapper objects. So far the only way I see to map between the dataset items from the dataloaders and the original source objects is that they are in the same order. I feel a bit uncomfortable relying only on that.
Is there some reference I haven’t found or has someone come up with a better way to tackle this?

Here is my code calling the data block API, for better understanding:

data = fastai2.data.block.DataBlock(
    blocks=(ImageBlock, fastai2.data.block.MultiCategoryBlock),
    get_x=lambda x: x.path,
    get_y=lambda x: x.classification_labels,
    splitter=fastai2.data.transforms.FuncSplitter(lambda x: x.is_valid),
    item_tfms=fastai2.vision.augment.Resize(final_size),
    batch_tfms=fastai2.vision.augment.aug_transforms(flip_vert=True))

dls = data.dataloaders(object_manager.objects, bs=bs)

The objects in “object_manager.objects” simply wrap pre-extracted tiles from whole-slide images, together with some additional information: https://github.com/FAU-DLM/wsi_processing_pipeline/blob/master/preprocessing/objects.py#L19
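The order-based mapping described above can be sketched in plain Python, without fastai2. This is only an illustration under stated assumptions: `Tile` is a hypothetical stand-in for the wrapper objects, `validation_subset` mimics a splitter that keeps source order, and the predictions are assumed to come from a loader that does not shuffle. The length check is the one cheap safeguard that the positional pairing allows.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Tile:
    """Hypothetical stand-in for the custom wrapper objects."""
    path: str
    is_valid: bool
    prediction: Optional[List[float]] = None


def validation_subset(objects):
    # Assumption: the splitter selects items with is_valid=True
    # and keeps them in their original source order.
    return [obj for obj in objects if obj.is_valid]


def attach_predictions(objects, predictions):
    """Pair predictions with validation objects purely by position."""
    valid = validation_subset(objects)
    if len(valid) != len(predictions):
        raise ValueError(
            f"expected {len(valid)} predictions, got {len(predictions)}"
        )
    for obj, pred in zip(valid, predictions):
        obj.prediction = pred
    return valid


# Made-up data, only to exercise the sketch.
tiles = [Tile("a.png", False), Tile("b.png", True), Tile("c.png", True)]
fake_predictions = [[0.9, 0.1], [0.2, 0.8]]
attach_predictions(tiles, fake_predictions)
```

The pairing is only as reliable as the two assumptions above; if either the split or the loader reorders items, the length check still passes while the predictions land on the wrong objects.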

Thanks a lot in advance!

Christoph

Order is what you have to go with, unless you write your own transforms to store this information (for example, by inheriting from a PIL image class and giving it whatever attribute you wish to store). The API wasn’t designed to expect an object-based system like the one you are describing. However, since you are using images, at the item level they are Pillow Image.Image objects at that point, so you could try a process like so to get the associated file names:

So you can either work around it with the path method, or build a custom Transform that inherits from PILBase to carry the source information you want.

Note, however, that this information is then lost at the DataLoader output level (by then everything is a tensor).
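One way the path-based workaround could look is a lookup table from path to source object, so that the pairing no longer depends on order. This is a minimal sketch in plain Python, not fastai2 API: `SourceTile`, `index_by_path` and `store_predictions` are hypothetical names, and how the paths for each prediction are obtained (for example from the dataset items) is left open and assumed to be available.

```python
from pathlib import Path


class SourceTile:
    """Hypothetical stand-in for a custom wrapper object."""

    def __init__(self, path):
        self.path = Path(path)
        self.prediction = None


def index_by_path(objects):
    """Build a path -> object lookup, refusing duplicate paths."""
    lookup = {}
    for obj in objects:
        key = Path(obj.path)
        if key in lookup:
            raise ValueError(f"duplicate path: {key}")
        lookup[key] = obj
    return lookup


def store_predictions(lookup, paths, predictions):
    """Write each prediction into the object that owns the matching path."""
    paths, predictions = list(paths), list(predictions)
    if len(paths) != len(predictions):
        raise ValueError("paths and predictions differ in length")
    for path, pred in zip(paths, predictions):
        # A KeyError here means the path does not belong to any source object.
        lookup[Path(path)].prediction = pred


# Made-up data; the paths deliberately arrive in a different order.
source_tiles = [SourceTile("tiles/a.png"), SourceTile("tiles/b.png")]
lookup = index_by_path(source_tiles)
store_predictions(lookup, ["tiles/b.png", "tiles/a.png"], ["pred_b", "pred_a"])
```

This only works if every source object has a unique path, which is why the sketch raises on duplicates instead of silently overwriting an entry.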