Is there a way to efficiently extract all PILImages from a fastai.data.core.Dataset object?

I was trying to rewrite my code from fastai1 to fastai2 because of a GPU incompatibility, and I ran into a problem with extracting images:

fastai1 (extract all images from train_ds):
data.train_ds.x

I have 100,000 images as input, and my list comprehension doesn't work (the thread was killed every time I tried to run it):
[x[0] for x in data.train_ds]
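The comprehension above is most likely killed because it decodes all 100,000 images and keeps every one of them in a single Python list. Below is a minimal sketch of a lazy alternative. It assumes only that the dataset supports `len()` and integer indexing that returns an `(image, label)` tuple, which `data.train_ds[i]` does in fastai2. `iter_images` and `iter_image_chunks` are hypothetical helper names, not fastai API:

```python
from typing import Any, Iterator, List, Sequence


def iter_images(ds: Sequence) -> Iterator[Any]:
    """Yield the image (element 0) of each (image, label) item, one at a time."""
    for i in range(len(ds)):
        # Only the current item is held here; it can be freed once the caller is done with it.
        yield ds[i][0]


def iter_image_chunks(ds: Sequence, chunk_size: int = 1000) -> Iterator[List[Any]]:
    """Yield lists of at most `chunk_size` images (chunk_size should be >= 1)."""
    chunk: List[Any] = []
    for img in iter_images(ds):
        chunk.append(img)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []  # start a fresh list so the previous chunk can be released
    if chunk:
        # Emit the final, possibly shorter, chunk.
        yield chunk
```

Usage would look like `for img in iter_images(data.train_ds): ...`, where each image is processed or saved before the next one is loaded. The trade-off is that a generator gives up random access to the results in exchange for roughly constant memory use.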

Is there a better way to extract all the images in fastai2 from train_ds?

Thanks in advance!!!

Hi there,

I believe what you are looking for is dls.train.xs.

But why do you want to get the images like this?

You must already have the raw files somewhere, since you used them to build data in the first place.


I wanted to create a custom dataset based on the fastai Dataset class that performs some changes to the images and labels. The data object I created is similar to this one:

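from fastai.vision.all import *  # also brings in pd and os
path = untar_data(URLs.PASCAL_2007)  # dataset used in the vision tutorial
df = pd.read_csv(path/'train.csv')

# calling .dataloaders(df) on the DataBlock returns a DataLoaders object
data = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
                   splitter=ColSplitter('is_valid'),
                   get_x=ColReader('fname', pref=str(path/'train') + os.path.sep),
                   get_y=ColReader('labels', label_delim=' '),
                   item_tfms=Resize(460),
                   batch_tfms=aug_transforms(size=224)).dataloaders(df)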

from the fastai "Computer vision" tutorial.

I think dls.train.xs only works for tabular data built with TabularPandas (please correct me if I am wrong), but I used a DataBlock, which doesn't have .train.xs as an attribute (sorry for not mentioning this in the question).

Haven’t been on the forum for a while… Did you find a solution? My bad about the .train.xs being tied to tabular data.

Yes! The reason I wanted to extract all the images is to make custom changes to the dataset. Since the way I structured my code (creating a custom dataset) lets me use __getitem__(self, i), I just extract each image with img[i][0] there.


Glad you found a way 💪

Did you find a fix for this issue? I am facing the same problem, but nobody has responded and I couldn't find any troubleshooting for it on Google.


You could try something like this:

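# loop through every index of the training dataset
for i in range(len(data.train_ds)):
    # indexing the dataset returns an (image, label) tuple for that item;
    # item_tfms such as Resize run in the DataLoader, so this is the original size
    img, label = data.train_ds[i]
    # PILImage subclasses PIL.Image.Image, so it can be saved directly,
    # with the index in the filename
    img.save(f'train{i}.png')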
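If the labels need to stay paired with the exported files, the same one-item-at-a-time loop can also write a small CSV manifest. This is a minimal sketch, assuming each image has a PIL-style `save(path)` method (true for fastai's `PILImage`) and that `label_fn` turns a label into a string. `export_dataset` and its parameters are hypothetical names chosen for illustration:

```python
import csv
import os
from typing import Any, Callable, Sequence


def export_dataset(
    ds: Sequence,
    out_dir: str,
    prefix: str = "train",
    ext: str = "png",
    label_fn: Callable[[Any], str] = str,
) -> str:
    """Save each (image, label) item to out_dir and return the path of the labels CSV.

    Assumes every image has a PIL-style save(path) method.
    """
    os.makedirs(out_dir, exist_ok=True)
    csv_path = os.path.join(out_dir, f"{prefix}_labels.csv")
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["fname", "label"])
        for i in range(len(ds)):
            # Only the current item is loaded; earlier ones can be garbage-collected.
            img, label = ds[i]
            fname = f"{prefix}{i}.{ext}"
            img.save(os.path.join(out_dir, fname))
            writer.writerow([fname, label_fn(label)])
    return csv_path
```

With the default `label_fn=str`, a fastai multi-label target is written as its tensor repr. You would probably want to pass your own function that turns it back into readable class names.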

My solution was to create my own Dataset class. You could start with something like:

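from fastai.vision.all import *

class CustomDataset(Datasets):
  # subclasses fastai's Datasets as in the original approach; super().__init__ is
  # not called, so only len() and indexing are relied on here
  def __init__(self, images, labels, **kwargs):
    # store references only; **kwargs can carry your own extra arguments
    self.images = images
    self.labels = labels
  def __len__(self):
    return len(self.images)
  def __getitem__(self, i):
    # extract the image and label for item i here:
    img = self.images[i][0]
    label = self.labels[i][1]
    # do something else with the extracted image and label, then return them
    return img, label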

and then instantiate the custom dataset, passing in the entire data.train_ds or data.valid_ds (where data is the DataLoaders object returned by .dataloaders(df)):

train = CustomDataset(images=data.train_ds, labels=data.train_ds)