Create DataLoader from DataFrame

mw00 · April 1, 2023, 9:09am

Hello everyone,

I’m creating a notebook for a Kaggle competition (Facial Keypoints Detection | Kaggle). The dataset is a .csv file with a column that stores the pixel values of each image:

The input image is given in the last field of the data files, and consists of a list of pixels (ordered by row), as integers in (0,255). The images are 96x96 pixels.

I loaded the csv as a DataFrame and now I want to create a DataLoader for my learner. Is there a way to use the pixel values from the DataFrame to construct a DataLoader directly, or do I need to save the data as image files first (e.g. png) just to load it one step later with get_image_files into a DataLoader?

mw00 · April 1, 2023, 3:38pm

I found a solution. For anyone wondering: you can pass your own get_x function to DataBlock to convert each row’s pixel string into an image on the fly (and get_y to extract the keypoints), so no image has to be saved to the file system.

Here’s an example on Kaggle from another notebook: Facial Keypoint Detection with fastai v2 | Kaggle

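from fastai.vision.all import *

# str2img is defined below; row2points comes from the linked notebook
db = DataBlock(
    blocks = (ImageBlock, PointBlock),
    get_x = str2img,      # DataFrame row -> image
    get_y = row2points,   # DataFrame row -> keypoints
    splitter = RandomSplitter(valid_pct=0.15, seed=42),
    batch_tfms = aug_transforms(do_flip=False, max_zoom=1.0), # these params probably need tuning
)
dls = db.dataloaders(train_df)
dls.show_batch()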

And str2img is defined as:

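import numpy as np
from PIL import Image
from fastai.vision.all import PILImage

def str2img(row):
    imarr = np.fromstring(row.Image, dtype=np.int32, sep=' ')  # parse the pixel string
    i = Image.fromarray(imarr.reshape(-1, 96)).convert('P')    # rows of 96 pixels
    return PILImage(i)                                         # fastai image type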
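
The snippet above uses row2points without showing it. The version below is not the linked notebook's code, only a minimal sketch of what such a function might look like. It assumes that every field except Image is a keypoint coordinate, that the fields alternate x, y, and that PointBlock accepts an array of (x, y) pairs. The toy row and its values are made up for illustration.

```python
import numpy as np

def row2points(row):
    """Hypothetical sketch: collect (x, y) keypoints from one CSV row.

    Assumes every field except 'Image' is a keypoint coordinate and that
    the fields alternate x, y.
    """
    # Works for a pandas Series or a plain dict, since both provide .items()
    coords = [value for name, value in row.items() if name != "Image"]
    # One row per keypoint, columns are (x, y)
    return np.array(coords, dtype=np.float32).reshape(-1, 2)

# Toy row with two keypoints (illustrative values only)
example = {
    "left_eye_center_x": 66.0, "left_eye_center_y": 39.0,
    "nose_tip_x": 44.4, "nose_tip_y": 57.1,
    "Image": "238 236 237",
}
points = row2points(example)
print(points.shape)
```

Rows with missing keypoints would produce NaN values here, so they would probably need to be dropped or filled before calling db.dataloaders.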