How to add batch transforms without using a DataBlock

Hello everyone! I would appreciate your help or any tips in my battle with the following dataset.

I am working with the MedMNIST dataset, for which the authors created a small PyTorch library. The images are compressed in .npz format. I would like to apply batch transforms to my data, but I am not using a DataBlock. I managed to create a DataLoaders object by following the MNIST example from the book. So far, the data loading part of my code looks like this:

# Imports are assumed here; the original post did not show them
import medmnist
from medmnist import INFO
import torchvision.transforms as TF
from fastai.data.load import DataLoader
from fastai.data.core import DataLoaders

data_flag = 'breastmnist'
download = True
BATCH_SIZE = 128
info = INFO[data_flag]
task = info['task']
n_channels = info['n_channels']
n_classes = len(info['label'])

DataClass = getattr(medmnist, info['python_class'])

# Apply minimal preprocessing
data_transform = TF.Compose([
    TF.ToTensor(),
    TF.Normalize(mean=[.5], std=[.5])
])

# Download the partitions
train_dataset = DataClass(split='train', transform=data_transform, download=download)
test_dataset = DataClass(split='test', transform=data_transform, download=download)

# Load them into a FastAI dataloader
training_dl = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
valid_dl = DataLoader(test_dataset, batch_size=BATCH_SIZE)

# Create the DataLoaders
dls = DataLoaders(training_dl, valid_dl)

Does anyone who has worked with a similar workflow have tips for applying batch transforms? I would like to use the fastai transforms, not the PyTorch ones. Thank you, I appreciate any tips.

Not sure, but perhaps Zach Mueller’s blog post helps?


I don’t have experience doing this, but the fastai documentation shows an example of building a TfmdDL from scratch and passing *aug_transforms() to the after_batch parameter:

cam_tdl = TfmdDL(cam_dsrc.train, after_item=ToTensor(),
                 after_batch=[IntToFloatTensor(), *aug_transforms()], bs=9)

I wonder if this could apply to your situation?
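In case it helps to see the idea behind that snippet, here is a minimal, library-free sketch of how I understand the two hooks: `after_item` functions run on each sample before collation, and `after_batch` functions run once on the collated batch. This is not fastai's implementation. `MiniLoader`, `to_list`, `int_to_float` and `flip` are hypothetical names made up for illustration, and plain Python lists stand in for tensors.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Batch = Tuple[List[List[float]], List[int]]


class MiniLoader:
    """Hypothetical toy loader with after_item / after_batch hooks (not fastai code)."""

    def __init__(self, dataset: Sequence, bs: int,
                 after_item: Sequence[Callable] = (),
                 after_batch: Sequence[Callable] = ()):
        self.dataset, self.bs = dataset, bs
        self.after_item, self.after_batch = list(after_item), list(after_batch)

    def __iter__(self) -> Iterator[Batch]:
        for start in range(0, len(self.dataset), self.bs):
            stop = min(start + self.bs, len(self.dataset))
            items = [self.dataset[i] for i in range(start, stop)]
            # Item transforms: applied to every sample individually
            for tfm in self.after_item:
                items = [tfm(item) for item in items]
            # Collate: gather inputs and labels into one batch
            batch = ([x for x, _ in items], [y for _, y in items])
            # Batch transforms: applied once to the whole collated batch
            for tfm in self.after_batch:
                batch = tfm(batch)
            yield batch


def to_list(item):
    """Item-level step: convert the raw pixel tuple into a list."""
    pixels, label = item
    return list(pixels), label


def int_to_float(batch):
    """Batch-level step: scale 0..255 integer pixels to 0..1 floats."""
    xs, ys = batch
    return [[p / 255 for p in x] for x in xs], ys


def flip(batch):
    """Batch-level step: reverse each row of pixels (a toy horizontal flip)."""
    xs, ys = batch
    return [x[::-1] for x in xs], ys


# Three tiny two-pixel "images" with integer labels
dataset = [((0, 255), 0), ((255, 0), 1), ((51, 102), 0)]
loader = MiniLoader(dataset, bs=2, after_item=[to_list],
                    after_batch=[int_to_float, flip])
batches = list(loader)
```

The ordering mirrors the documentation example: conversion happens per item, while scaling and augmentation happen on the full batch, which in fastai is what lets those steps run on the GPU.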
