Hey all,
Just sharing an issue here with the random seed for the validation set split.
https://forums.fast.ai/t/lesson-1-pets-benchmarks/27681/55?u=jamesrequa
Please feel free to verify this on your end as well. Steps to reproduce:
- Set a random seed in the Jupyter notebook:
np.random.seed(2)
- Create an ImageDataBunch
data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(), size=299, bs=bs, valid_pct=0.2)
- Save train/val x for later
trn_x, val_x = data.train_ds.x, data.valid_ds.x
- Create a new ImageDataBunch by repeating step 2
- Check the train/val x again for this new data instance:
trn_x2, val_x2 = data.train_ds.x, data.valid_ds.x
- Compare with the first train/valid set and verify they are not the same (a consolidated repro sketch follows below).
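Here is a minimal repro sketch of the steps above in one place, assuming `path_img`, `fnames`, `pat`, and `bs` are already defined as in the lesson 1 notebook (fastai v1):

```python
# Minimal repro sketch, assuming path_img, fnames, pat, and bs are already
# defined as in the lesson 1 pets notebook (fastai v1).
import numpy as np
from fastai.vision import ImageDataBunch, get_transforms

np.random.seed(2)  # seed set once, in its own cell

data1 = ImageDataBunch.from_name_re(path_img, fnames, pat,
                                     ds_tfms=get_transforms(), size=299,
                                     bs=bs, valid_pct=0.2)
val_x1 = data1.valid_ds.x

# Recreate the DataBunch without re-seeding (e.g. after changing bs or size)
data2 = ImageDataBunch.from_name_re(path_img, fnames, pat,
                                     ds_tfms=get_transforms(), size=299,
                                     bs=bs, valid_pct=0.2)
val_x2 = data2.valid_ds.x

# The two validation splits differ, even though the seed was set above
print(set(val_x1.items) == set(val_x2.items))  # expected: False
```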
I have already implemented the code changes that fix this issue, so if you like I can submit a PR. I think this is important to fix right away: it can produce validation loss/error rate results that are not reliable, and it can happen in a very innocent way, e.g. if you just change the batch size or image size (as we saw, one student achieved a 1% error rate on the pets dataset for this reason).
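In the meantime, here is a sketch of a notebook-level workaround (this is not the PR fix itself, just one way to keep the split stable): re-seed immediately before every DataBunch creation, so the validation set stays identical even when only `bs` or `size` changes. The `make_data` helper is hypothetical; `path_img`, `fnames`, `pat` are assumed defined as above.

```python
# Workaround sketch (not the actual PR changes): re-seed right before each
# DataBunch is created so the train/valid split is drawn identically every time.
import numpy as np
from fastai.vision import ImageDataBunch, get_transforms

def make_data(size, bs, seed=2):
    np.random.seed(seed)  # seed just before the random split is drawn
    return ImageDataBunch.from_name_re(path_img, fnames, pat,
                                       ds_tfms=get_transforms(), size=size,
                                       bs=bs, valid_pct=0.2)

data_small = make_data(size=224, bs=64)
data_big = make_data(size=299, bs=32)

# Same validation items in both, so error rates remain comparable
assert set(data_small.valid_ds.x.items) == set(data_big.valid_ds.x.items)
```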