How to get the image labels from a custom dataset?

I have built a custom dataset of facial images. The dataset is laid out as dataset → train folder → category folder → images. Below is some of my code (I am new to DL):

def createDataBlock(custom_splitter=None, item_tfms=None, batch_tfms=None):
  if custom_splitter is None: custom_splitter = RandomSplitter(valid_pct=0.2, seed=9)
  if item_tfms is None: item_tfms = [Resize(384), FlipItem(p=0.3), RandomCrop(300)]
  if batch_tfms is None: batch_tfms = [*aug_transforms(size=224, max_warp=0), Normalize.from_stats(*imagenet_stats)]

  dataset = DataBlock(
      blocks=(ImageBlock, CategoryBlock),
      get_items=get_image_files,
      splitter=custom_splitter,   # I wanna insert K fold here
      get_y=parent_label,
      item_tfms=item_tfms,
      batch_tfms=batch_tfms
  )

  return dataset

train_path = "/content/train"
test_path = "/content/test"


# StratifiedKFold
folds = 10
skf = StratifiedKFold(n_splits=folds, shuffle=True, random_state=9)
val_pct = []
test_pc = []
batch_size = 32

for train_index, val_index in skf.split(X, y):
   faceEmoDataset = createDataBlock()
   dls = faceEmoDataset.dataloaders(train_path, bs=batch_size)
   test_dl = dls.test_dl(test_path, bs=batch_size)
   ......

I want to implement k-fold cross-validation for the train and validation sets using the images in the train path (I’m referring to this solution). How do I get the X and y for the skf.split() function? I know X is the index of the training images and y is their labels, but I don’t know how to get the labels.
Can anyone help me with this?

Can anyone help me with this or give me a hint?

Hello Gabriel, I’ve got you covered.

You need to add a val_index argument to the createDataBlock function, then pass it to IndexSplitter(val_index) inside the function.

Here is your code after modification:

from fastai.vision.all import *
from sklearn.model_selection import StratifiedKFold

def createDataBlock(val_index, custom_splitter=None, item_tfms=None, batch_tfms=None):
  # By default, split on the validation indices of the current fold
  if custom_splitter is None: custom_splitter = IndexSplitter(val_index)
  if item_tfms is None: item_tfms = [Resize(384), FlipItem(p=0.3), RandomCrop(300)]
  if batch_tfms is None: batch_tfms = [*aug_transforms(size=224, max_warp=0), Normalize.from_stats(*imagenet_stats)]

  dataset = DataBlock(
      blocks=(ImageBlock, CategoryBlock),
      get_items=get_image_files,
      splitter=custom_splitter,   # IndexSplitter(val_index) unless overridden
      get_y=parent_label,
      item_tfms=item_tfms,
      batch_tfms=batch_tfms
  )

  return dataset

train_path = "/content/train"
test_path = "/content/test"


# StratifiedKFold
folds = 10
skf = StratifiedKFold(n_splits=folds, shuffle=True, random_state=9)
val_pct = []
batch_size = 32  # same value as in the question's code

fnames = get_image_files(train_path)
lbls = [parent_label(fn) for fn in fnames]

for train_index, val_index in skf.split(fnames, lbls):
   faceEmoDataset = createDataBlock(val_index)  # Here you pass the val_index to your datablock create function
   dls = faceEmoDataset.dataloaders(train_path, bs=batch_size)
   test_dl = dls.test_dl(test_path, bs=batch_size)
   ......
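
In case it helps to see what the lbls line is doing: as far as I know, parent_label simply returns the name of the folder that directly contains each file, so with a train → category → image layout the category folder becomes the label. Here is a minimal plain-Python sketch of the same idea. The file names and the folder_label helper are made up for illustration and are not part of fastai.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical file names following the train/<category>/<image> layout.
fnames = [
    PurePosixPath("/content/train/happy/img_001.jpg"),
    PurePosixPath("/content/train/happy/img_002.jpg"),
    PurePosixPath("/content/train/sad/img_003.jpg"),
    PurePosixPath("/content/train/angry/img_004.jpg"),
]

def folder_label(fn):
    """Return the name of the folder that directly contains the file."""
    return PurePosixPath(fn).parent.name

# One label per file, in the same order as fnames.
lbls = [folder_label(fn) for fn in fnames]

# Counting the labels is a quick way to check the class balance
# before handing them to a stratified splitter.
label_counts = Counter(lbls)
print(lbls)
print(label_counts)
```

Because lbls is built in the same order as fnames, position i in both lists refers to the same image, which is what skf.split(fnames, lbls) relies on.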

Does this mean I don’t have to use StratifiedKFold to create the folds?

Oh, I’m sorry, I didn’t make my comment clear enough. You should keep the rest of your code; I only wrote the part I modified. I’ll edit it to include the rest so you get the idea.
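
To spell out who does what: StratifiedKFold is still the thing that creates the folds, and IndexSplitter only consumes the val_index it hands you for each fold. The sketch below is a simplified, unshuffled stand-in for StratifiedKFold written in plain Python (it is not sklearn's implementation, and stratified_folds is a hypothetical helper), just to show the shape of the (train_index, val_index) pairs: every image lands in exactly one validation fold, and each fold keeps roughly the same class mix.

```python
from collections import defaultdict

def stratified_folds(labels, n_splits):
    """Yield (train_index, val_index) pairs, dealing each class round-robin.

    Simplified, unshuffled stand-in for sklearn's StratifiedKFold.
    """
    fold_of = [0] * len(labels)
    seen = defaultdict(int)  # how many items of each class were dealt so far
    for i, lbl in enumerate(labels):
        fold_of[i] = seen[lbl] % n_splits
        seen[lbl] += 1
    for k in range(n_splits):
        val_index = [i for i, f in enumerate(fold_of) if f == k]
        train_index = [i for i, f in enumerate(fold_of) if f != k]
        yield train_index, val_index

# Hypothetical labels: 6 "happy" and 4 "sad" images.
lbls = ["happy"] * 6 + ["sad"] * 4
folds = list(stratified_folds(lbls, n_splits=2))

for train_index, val_index in folds:
    # Each val_index is what would be passed to IndexSplitter for that fold.
    print(val_index, [lbls[i] for i in val_index])
```

One thing to keep in mind: the indices refer to positions in the list returned by get_image_files(train_path), and the DataBlock calls get_image_files again internally. I am assuming it returns the files in the same order both times, which should hold as long as the folder is not changed in between.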

Oh yeah! Thank you very much for the solution! I got it after seeing how you get the parent label for fnames!

Glad to help :grinning: