Hello everyone! I would appreciate some help with a simple problem I am stuck on. I am working with the MedMNISTv2 dataset, which is downloaded as a .npz file. The people who created the dataset also created a PyTorch library to interact with it.
I have downloaded the data with the following lines:
DataClass = getattr(mydataset, info['python_class'])
train_dataset = DataClass(split='train', transform=data_transform, download=download)
test_dataset = DataClass(split='test', transform=data_transform, download=download)
So now, I would like to load my data into a dataloader to train a model. However, in contrast with previous fastai versions, I cannot find a way to specify that I already have a training partition and a validation partition. I don’t need a splitter.
For datasets that are not partitioned, where the images are located in their respective folders, I normally use:
ultrasound = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    batch_tfms=aug_transforms(min_scale=0.75, do_flip=True, flip_vert=False, max_warp=0.1, max_rotate=5),
)
TL;DR: I cannot find a way to specify my existing partitions directly in the DataBlock API. Any suggestions? Thank you in advance.
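One idea I am considering, in case it helps frame the question: as far as I understand, a DataBlock splitter is just a callable that takes the list of items and returns two lists of indices (training, validation). So if the training items and the validation items are concatenated in that order, a small custom splitter could simply return the two index ranges. The sketch below is plain Python and only illustrates that idea; `predefined_splitter`, `train_items` and `valid_items` are hypothetical names of mine, not part of fastai or MedMNIST, and I have not checked this against the actual .npz data.

```python
from typing import Callable, List, Sequence, Tuple


def predefined_splitter(
    n_train: int, n_valid: int
) -> Callable[[Sequence], Tuple[List[int], List[int]]]:
    """Build a splitter for items laid out as [train items..., valid items...].

    Hypothetical helper: it assumes the caller concatenated the two existing
    partitions in that order before handing them to the splitter.
    """

    def _split(items: Sequence) -> Tuple[List[int], List[int]]:
        # Guard against items that do not match the expected partition sizes.
        if len(items) != n_train + n_valid:
            raise ValueError(
                f"expected {n_train + n_valid} items, got {len(items)}"
            )
        train_idxs = list(range(n_train))
        valid_idxs = list(range(n_train, n_train + n_valid))
        return train_idxs, valid_idxs

    return _split


# Hypothetical stand-ins for the two partitions that already exist.
train_items = [f"train_{i}" for i in range(5)]
valid_items = [f"valid_{i}" for i in range(2)]
all_items = train_items + valid_items

splitter = predefined_splitter(len(train_items), len(valid_items))
train_idxs, valid_idxs = splitter(all_items)
print(train_idxs, valid_idxs)
```

If this is a reasonable direction, I assume the callable would be passed as `splitter=splitter` in the DataBlock, with `get_items` returning the concatenated list. I have also seen `IndexSplitter` and `DataLoaders.from_dsets` mentioned in the fastai docs, which may cover the same need, but I am not sure which one is the intended way for datasets that are already partitioned.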