GabrielLCH
(Gabriel Lim Chiaw Heng)
June 26, 2022, 9:34am
1
I have built a custom dataset of facial images. It is organized as dataset → train folder → category folder → images. Below is some of my code (I am new to DL):
from fastai.vision.all import *
from sklearn.model_selection import StratifiedKFold

def createDataBlock(custom_splitter=None, item_tfms=None, batch_tfms=None):
    # Fall back to defaults when no splitter or transforms are supplied
    if custom_splitter is None: custom_splitter = RandomSplitter(valid_pct=0.2, seed=9)
    if item_tfms is None: item_tfms = [Resize(384), FlipItem(p=0.3), RandomCrop(300)]
    if batch_tfms is None: batch_tfms = [*aug_transforms(size=224, max_warp=0), Normalize.from_stats(*imagenet_stats)]
    dataset = DataBlock(
        blocks=(ImageBlock, CategoryBlock),
        get_items=get_image_files,
        splitter=custom_splitter,  # I wanna insert K fold here
        get_y=parent_label,        # label = name of the category folder
        item_tfms=item_tfms,
        batch_tfms=batch_tfms
    )
    return dataset
train_path = "/content/train"
test_path = "/content/test"
# StratifiedKFold
folds = 10
skf = StratifiedKFold(n_splits=folds, shuffle=True, random_state=9)
val_pct = []
test_pc = []
batch_size = 32
for train_index, val_index in skf.split(X, y):
    faceEmoDataset = createDataBlock()
    dls = faceEmoDataset.dataloaders(train_path, bs=batch_size)
    test_dl = dls.test_dl(test_path, bs=batch_size)
    ......
I am going to implement k-fold cross-validation for the train and validation sets using images from the train path (I’m referring to this solution). How do I get the X and y for the skf.split() function? I know X is the index of the training images and y is their labels, but I don’t know how to get the labels.
Can anyone help me with this?
GabrielLCH
(Gabriel Lim Chiaw Heng)
June 27, 2022, 3:36am
2
Can anyone help me with this or give me a hint?
Hello Gabriel, I’ve got you covered.
You need to add a val_index argument to the createDataBlock function, then pass it to IndexSplitter(val_index) within the function.
Here is your code after modification:
from fastai.vision.all import *
from sklearn.model_selection import StratifiedKFold

def createDataBlock(val_index, custom_splitter=None, item_tfms=None, batch_tfms=None):
    # custom_splitter is no longer used here: the split is fixed by val_index below
    if item_tfms is None: item_tfms = [Resize(384), FlipItem(p=0.3), RandomCrop(300)]
    if batch_tfms is None: batch_tfms = [*aug_transforms(size=224, max_warp=0), Normalize.from_stats(*imagenet_stats)]
    dataset = DataBlock(
        blocks=(ImageBlock, CategoryBlock),
        get_items=get_image_files,
        splitter=IndexSplitter(val_index),  # items at these indices form the validation set
        get_y=parent_label,
        item_tfms=item_tfms,
        batch_tfms=batch_tfms
    )
    return dataset
train_path = "/content/train"
test_path = "/content/test"
# StratifiedKFold
folds = 10
skf = StratifiedKFold(n_splits=folds, shuffle=True, random_state=9)
val_pct = []
batch_size = 32  # carried over from the original code; the loop below needs it
# X: the image files, y: their labels (the name of each file's parent folder)
fnames = get_image_files(train_path)
lbls = [parent_label(fn) for fn in fnames]
for train_index, val_index in skf.split(fnames, lbls):
    # The DataBlock calls get_image_files on the same path, so val_index should line up with fnames
    faceEmoDataset = createDataBlock(val_index)  # Here you pass the val_index to your datablock create function
    dls = faceEmoDataset.dataloaders(train_path, bs=batch_size)
    test_dl = dls.test_dl(test_path, bs=batch_size)
    ......
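The key step above is deriving y from the folder structure. As a rough illustration, the sketch below does the same thing with plain pathlib so it runs without fastai: each label is simply the name of the file's parent folder, which is the idea behind parent_label. The helper name folder_labels, the category names, and the temporary directory are made up for this example, and the files are sorted here only to get a stable order.

```python
import tempfile
from collections import Counter
from pathlib import Path

def folder_labels(root):
    """Return (files, labels) for a root/<category>/<image> tree, in sorted order."""
    files = sorted(p for p in Path(root).rglob("*") if p.is_file())
    labels = [p.parent.name for p in files]  # label = parent folder name
    return files, labels

# Build a tiny hypothetical dataset: 2 categories with 3 empty placeholder files each.
with tempfile.TemporaryDirectory() as tmp:
    for category in ("happy", "sad"):
        (Path(tmp) / category).mkdir()
        for i in range(3):
            (Path(tmp) / category / f"img_{i}.jpg").touch()
    files, labels = folder_labels(tmp)

# One label per file, in the same order, which is what skf.split(X, y) expects.
label_counts = Counter(labels)
print(label_counts)
```

Because files and labels share the same order, an index returned by the fold splitter refers to the same image in both lists.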
GabrielLCH
(Gabriel Lim Chiaw Heng)
June 27, 2022, 12:23pm
4
Does this mean I don’t have to use StratifiedKFold to create the folds?
Oh, I’m sorry, I didn’t make my comment clear enough. You should keep the rest of your code; I only wrote the part I modified. I’ll edit it to include the rest so you get the idea.
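For intuition about what the fold generator contributes, here is a simplified, dependency-free stand-in for a stratified split. It is not scikit-learn's implementation (there is no shuffling and no random_state), and the name stratified_folds is made up for this sketch. It deals the indices of each class round-robin across the folds, so every validation set keeps roughly the same class proportions; each val_idx is the kind of list one would hand to IndexSplitter.

```python
from collections import Counter, defaultdict

def stratified_folds(labels, n_splits):
    """Yield (train_idx, val_idx) pairs; each class is dealt round-robin across folds."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Distribute every class evenly over the folds.
    fold_members = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            fold_members[position % n_splits].append(idx)

    for k in range(n_splits):
        val_idx = sorted(fold_members[k])
        val_set = set(val_idx)
        train_idx = [i for i in range(len(labels)) if i not in val_set]
        yield train_idx, val_idx

# Hypothetical imbalanced labels: 6 "happy" images and 3 "sad" images.
labels = ["happy"] * 6 + ["sad"] * 3
folds = list(stratified_folds(labels, n_splits=3))
for train_idx, val_idx in folds:
    print(val_idx, Counter(labels[i] for i in val_idx))
```

In this toy case every validation fold holds two "happy" and one "sad" index, and each index appears in exactly one validation fold.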
GabrielLCH
(Gabriel Lim Chiaw Heng)
June 28, 2022, 9:25am
6
Oh yeah! Thank you very much for the solution! I got it after seeing how you get the parent label for fnames!