Am I doing k-fold cross validation right?

I implemented k-fold cross validation based on this notebook, but using the new fastai version and for a vision problem (the notebook is for tabular data). Am I doing it correctly, or are there things that need to be changed?

Here is my code:

from fastai.vision.all import *
from sklearn.model_selection import train_test_split, StratifiedKFold

# get_x, get_y, full_df and the train() helper are defined elsewhere in my notebook
train_df, test_df = train_test_split(full_df, test_size=0.1)

folds = 4
skf = StratifiedKFold(n_splits=folds, shuffle=True)

val_pct = []
test_pct = []
batch_size = 32


for train_index, val_index in skf.split(train_df.index, train_df['label']):
    
    train_block = DataBlock(
            blocks=(ImageBlock, CategoryBlock),
            get_x=get_x,
            get_y=get_y,
            splitter=IndexSplitter(val_index), # added val_index
            item_tfms=[
                Resize(384),
                FlipItem(p=0.4),
                RandomCrop(300)
            ],
            batch_tfms=[Normalize.from_stats(*imagenet_stats)]
        )
    
    test_block = DataBlock(
            blocks=(ImageBlock, CategoryBlock),
            get_x=get_x,
            get_y=get_y,
            item_tfms=[
                Resize(384)
            ],
            batch_tfms=[Normalize.from_stats(*imagenet_stats)]
        )
    
    
    train_dl = train_block.dataloaders(train_df, bs=batch_size)
    test_dl = test_block.dataloaders(test_df, bs=batch_size)
    
    
    test_dl.valid = test_dl.train  # reuse the train split as the validation DataLoader so the test set can be scored with learn.validate() below
    
    # train model
    learn = train(train_dl, resnet101, epochs=10, freeze_epochs=7)  # train() is a helper defined elsewhere that builds and fits a Learner
    _, val = learn.validate()
    
    learn.dls.valid = test_dl.valid
    _, test = learn.validate()
  

    print('done, appending results.. \n')
    val_pct.append(val)
    test_pct.append(test)

Also, should the augmentations (item_tfms, batch_tfms) from the train_block also be used for the test_block?

Thanks a lot! :grinning:

For the most part, this looks good. However, the test DataBlock should have an additional CenterCrop(300) so that you get 300x300 images, just like in the train block.


So I need to have the same item_tfms as in the train_block?

Not exactly the same. You don’t want random transformations like FlipItem or RandomCrop (unless you are doing TTA), but you would ideally want the dimensions and scale to be the same as during training. So you can use a CenterCrop(300) instead of the RandomCrop(300).
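
To make that concrete, here is a rough sketch of what your test_block could look like. I’m using CropPad(300) as a stand-in for the center crop (it crops or pads to the target size around the center); get_x and get_y are your existing helpers:

test_block = DataBlock(
        blocks=(ImageBlock, CategoryBlock),
        get_x=get_x,
        get_y=get_y,
        item_tfms=[
            Resize(384),
            CropPad(300)  # deterministic 300x300 center crop instead of RandomCrop(300)
        ],
        batch_tfms=[Normalize.from_stats(*imagenet_stats)]
    )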


That makes a lot more sense, thanks a lot! :grinning:

Have you tried looking at the updated version on walk with fastai? It has a vision example :slight_smile: https://walkwithfastai.com/Cross_Validation

Let’s rewrite this a bit to make it more fastai-like:

train_df, test_df = train_test_split(full_df, test_size=0.1)

folds = 4
skf = StratifiedKFold(n_splits=folds, shuffle=True)

val_pct = []
test_pct = []
batch_size = 32


for train_index, val_index in skf.split(train_df.index, train_df['label']):
    
    train_block = DataBlock(
            blocks=(ImageBlock, CategoryBlock),
            get_x=get_x,
            get_y=get_y,
            splitter=IndexSplitter(val_index), # added val_index
            item_tfms=[
                Resize(384),
                FlipItem(p=0.4),
                RandomCrop(300)
            ],
            batch_tfms=[Normalize.from_stats(*imagenet_stats)]
        )
    
    
    dls = train_block.dataloaders(train_df, bs=batch_size)
    test_dl = dls.test_dl(test_df, with_labels=True, bs=batch_size)  # with_labels so validate() can compute metrics on it
    test_dl.after_item = Pipeline([Resize(384), ToTensor()]) # to get rid of the need for another DataBlock
    
    
    # train model
    learn = train(dls, resnet101, epochs=10, freeze_epochs=7)  # pass dls here; train_dl no longer exists in this version
    _, val = learn.validate()
    
    _, test = learn.validate(dl=test_dl)
  

    print('done, appending results.. \n')
    val_pct.append(val)
    test_pct.append(test)

So what did we do differently? We don’t need to write out a second DataBlock. Since item_tfms are applied lazily, we can replace our test_dl's item transforms with whatever we want (here just Resize(384) and ToTensor()). Note that by only resizing to 384 you do introduce a test-time disparity, since you’re no longer reproducing the 300x300 images you trained on. If you want that final crop to give you (300, 300) images, you can safely leave the RandomCrop in: by default fastai turns a RandomCrop into a center crop on the validation and test sets, similar to what Resize does.
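
As a quick sanity check (the exact sizes depend on which transforms you end up keeping), you can grab a batch from each split inside the loop and look at its shape:

xb, yb = dls.train.one_batch()
print(xb.shape)   # 300x300 crops during training (RandomCrop is only randomized on the training split)

xb, yb = test_dl.one_batch()
print(xb.shape)   # 384x384 with the after_item override above; 300x300 (center crop) if you keep RandomCrop and skip the override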

Along with this, learn.validate can accept a dl param, so we can pass that test_dl in.
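
And once the loop has run over all folds, you can summarize the per-fold results. This assumes your train() tracks a single metric, so validate() returns [loss, metric] and val_pct/test_pct hold one metric value per fold:

import numpy as np

print(f'validation: {np.mean(val_pct):.4f} +/- {np.std(val_pct):.4f}')
print(f'test:       {np.mean(test_pct):.4f} +/- {np.std(test_pct):.4f}')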

Hope this helps! You did a great job IMO, this is just my own preference for writing functionality like this :slight_smile:


Wow, this is really great. I didn’t know about https://walkwithfastai.com/ and will check it out :grinning:. Thanks for the help and the explanations!
