When test_dl and dls.valid are equal, the results are different

Hello,

Using the script below, I get different results when I use the same data for my test set and my validation set. Am I missing something? Thanks in advance!

X_test = [Path("./img1.jpg"), Path("./img2.jpg")]
datablock = DataBlock(blocks=(ImageBlock, MaskBlock(codes)),
                       get_items=get_image_files,
                       splitter=FuncSplitter(lambda o: o in X_test),
                       get_y=get_y_fn,
                       item_tfms=item_tfms, 
                       batch_tfms=[*aug_tfms, Normalize.from_stats(*imagenet_stats)])

dataloaders = datablock.dataloaders(path_img, path=path, bs=bs)

#### test dataloader ####
test_dl = dataloaders.test_dl(X_test, with_labels=True)
test_dl.vocab = codes

#### check if model overfits ####
print(learner.validate(dl=dataloaders.train))

#### validate validset ####
print(learner.validate(dl=dataloaders.valid))

#### validate testset ####
print(learner.validate(dl=test_dl))

When using this code, I get the following output:

dataloaders.train: [1.5848780870437622, 0.22719594836235046, 0.0376924449941295, 0.017072551355944307]

dataloaders.valid: [1.5183881521224976, 0.20538781583309174, 0.036572121452386336, 0.01625333443666005]

test_dl: [1.6501643657684326, 0.07489459961652756, 0.015047324393568559, 0.002782088336188925]

test_dl and dataloaders.valid should give the same result, since they contain the same items.

Hey @muellerzr !!! Any ideas here?

I checked that my valid_ds has the same elements as my test_dl.dataset. I also tried to implement another datablock just for testing as suggested in this post, but nothing works.
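For reference, the item comparison mentioned above can be sketched roughly as follows. This is a minimal illustration, not the exact check that was run: `compare_items` is a hypothetical helper, and the usage line assumes that both `dataloaders.valid_ds` and `test_dl.dataset` expose their file paths through an `.items` attribute.

```python
from pathlib import Path


def compare_items(items_a, items_b):
    """Return the paths that appear in only one of the two collections.

    Paths are resolved first, so "./img1.jpg" and "img1.jpg" count as equal.
    """
    a = {Path(p).resolve() for p in items_a}
    b = {Path(p).resolve() for p in items_b}
    return {"only_in_a": sorted(a - b), "only_in_b": sorted(b - a)}


# Assumed usage with the objects from this thread (not run here):
# print(compare_items(dataloaders.valid_ds.items, test_dl.dataset.items))
```

Two empty lists mean both datasets point at the same files, regardless of order. It says nothing about the transforms applied to them.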

I would appreciate your help, otherwise I cannot do cross validation. Thanks!

By using test set == validation set, you are going to cause data leakage, as your model has already seen the test data, so I am not particularly surprised that the results would be different between validation and test.
If you would like to do cross validation on your model, follow the instructions in the very link you posted, and always, always test on held-out data.
Quick add: CV is a way of checking that your model generalizes well and does not overfit.
If the results of CV are satisfactory, train your model on the hyperparameters that worked well in CV, and only then test it.
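To make the idea concrete, here is a rough sketch of how k-fold indices could be generated in plain Python. It is not taken from the linked post: `k_fold_indices` is a hypothetical helper, it does not shuffle, and feeding each `valid` list to a splitter such as fastai's `IndexSplitter` is an assumption about how you would wire it up.

```python
def k_fold_indices(n_items, k):
    """Yield (train_idx, valid_idx) pairs for k contiguous, unshuffled folds."""
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and n_items")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        yield train, valid
        start += size


# Each item lands in exactly one validation fold.
for train_idx, valid_idx in k_fold_indices(5, 2):
    print(train_idx, valid_idx)
```

The held-out test items should be removed from the pool before the folds are built, so that no fold ever sees them.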

@ubiest, I totally agree with you regarding Cross Validation best practices. What I find problematic is not having the same results when Validation Set and Test set are equal. They should give the same results in my case (I am still not doing CV).

My issue may have one of the following causes:

1- I am not defining the test set correctly, even though I compared the two and the dataset has the correct items
2- test_dl is not applying the correct transforms during validate()
3- Some bug in fastai
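One way to probe the second hypothesis is to list the transforms each dataloader carries and compare them. The sketch below assumes that fastai dataloaders expose `after_item`, `before_batch` and `after_batch` pipelines, each holding its transforms in an `fs` list. The helper names are hypothetical.

```python
STAGES = ("after_item", "before_batch", "after_batch")


def tfm_names(dl):
    """Map each pipeline stage of `dl` to the class names of its transforms."""
    names = {}
    for stage in STAGES:
        pipeline = getattr(dl, stage, None)
        # Assumption: a pipeline keeps its transforms in an `fs` sequence.
        fs = getattr(pipeline, "fs", []) if pipeline is not None else []
        names[stage] = [type(t).__name__ for t in fs]
    return names


def diff_tfms(dl_a, dl_b):
    """Return only the stages where the two dataloaders differ."""
    a, b = tfm_names(dl_a), tfm_names(dl_b)
    return {s: (a[s], b[s]) for s in STAGES if a[s] != b[s]}


# Assumed usage with the objects from this thread (not run here):
# print(diff_tfms(dataloaders.valid, test_dl))
```

An empty result only shows that the same transform classes are attached in the same order. It would not reveal differences in their parameters or in how a transform behaves on each split.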

I created a datablock just for the test_dl, but it is not giving the same results. Here is the code I used. Note that testBlock is the same as trainBlock except that it drops aug_tfms from batch_tfms, yet when validating the model performance they give different results (validsetResult != testsetResult).

trainBlock = DataBlock(blocks=(ImageBlock, MaskBlock(codes)),
                   get_items=get_image_files,
                   splitter=FuncSplitter(lambda o: o in X_test),
                   get_y=get_y_fn,
                   item_tfms=item_tfms, 
                   batch_tfms=[*aug_tfms, Normalize.from_stats(*imagenet_stats)])

dataloaders = trainBlock.dataloaders(path_img, path=path, bs=bs)

#### test dataloader #### 

#### it is not working!!!!!!! test_dl always gives different result compared to the trainBlock

testBlock = DataBlock(blocks=(ImageBlock, MaskBlock(codes)),
                   get_items=get_image_files,
                   splitter=FuncSplitter(lambda o: o in X_test),
                   get_y=get_y_fn,
                   item_tfms=item_tfms, 
                   batch_tfms=[Normalize.from_stats(*imagenet_stats)])  
testDataloaders = testBlock.dataloaders(path_img, path=path, bs=bs)

# train the model using dataloaders
learner = unet_learner(dataloaders, resnet34, loss_func=loss_func, opt_func=opt_func, metrics=metrics, cbs=modelCallbacks, self_attention=False, act_cls=Mish).to_fp32()

# validate validset
validsetResult = learner.validate()

# validate testset (validate() uses the validation split of the new dataloaders)
learner.dls = testDataloaders
testsetResult = learner.validate()

What is aug_tfms in this case?

@muellerzr Sorry, I omitted some code to simplify. Here is the missing part:

get_y_fn = lambda x : path_anno + '/' + f'{x.stem}_GT.png'
size = (256, 256)
item_tfms = [Resize(size, method=ResizeMethod.Squish, resamples=(Image.NEAREST,Image.NEAREST))]
aug_tfms = aug_transforms(mult=1, flip_vert=True, size=size)

Upgrading to fastai version 2.4.1 solved this issue. Now the validation set and the test set give the same result. Thanks :)
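For anyone hitting the same discrepancy, it may be worth confirming the installed version before debugging further. The following is a minimal sketch using only the standard library. The helper names are hypothetical, and 2.4.1 is simply the version that resolved the issue in my case.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def version_tuple(v):
    """Turn '2.4.1' into (2, 4, 1), stopping at the first non-numeric piece."""
    parts = []
    for piece in v.split("."):
        m = re.match(r"\d+", piece)
        if m is None:
            break
        parts.append(int(m.group()))
    return tuple(parts)


def fastai_is_at_least(minimum="2.4.1"):
    """Return True/False, or None when fastai is not installed."""
    try:
        installed = version("fastai")
    except PackageNotFoundError:
        return None
    return version_tuple(installed) >= version_tuple(minimum)


print(fastai_is_at_least())
```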