How to stop the shuffling in valid dataloader

Hey everyone,
I want consistent results with the object detection fastai library, which is built on fastai v1. Can anyone tell me why valid_dl gives random images every time it is called instead of a fixed batch? I tried a small batch size and passing fewer images to the databunch, but data.valid_dl.show_batch() still gives a different batch every time; I never see the same batch twice.
Does anyone know how to create constant batches of images and keep those batches fixed in my databunch?
In short, I want fixed batches of images that don’t change dynamically.

Try setting a seed. That may stop the shuffling.


How do I set the seed in the following code?

batch_size = 4

do_flip = True
flip_vert = True 
max_rotate = 90 
max_zoom = 1.1 
max_lighting = 0.2
max_warp = 0.2
p_affine = 0.75 
p_lighting = 0.75 

tfms = get_transforms(do_flip=do_flip,
                      flip_vert=flip_vert,
                      max_rotate=max_rotate,
                      max_zoom=max_zoom,
                      max_lighting=max_lighting,
                      max_warp=max_warp,
                      p_affine=p_affine,
                      p_lighting=p_lighting)
train, valid = ObjectItemListSlide(train_images), ObjectItemListSlide(valid_images)
item_list = ItemLists(".", train, valid)
lls = item_list.label_from_func(lambda x: x.y, label_cls=SlideObjectCategoryList)
lls = lls.transform(tfms, tfm_y=True, size=patch_size)
data = lls.databunch(bs=batch_size, collate_fn=bb_pad_collate, num_workers=0).normalize()

Since you’re using fastai v1, try seeding PyTorch with torch.manual_seed (torch.seed() by itself picks a non-deterministic seed).

Where do I set the torch seed?
Do I just set the seed value?

Set it before defining your dataloader/databunch.
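One caveat worth illustrating: a seed fixes the whole sequence of random draws, not each individual draw. Below is a minimal sketch using only Python's standard random module. draw_batch and draw_fixed_batch are hypothetical stand-ins for a loader that picks items at random; they are not fastai functions, and whether show_batch in this library draws from the global RNGs is an assumption I have not verified.

```python
import random


def draw_batch(items, batch_size):
    """Hypothetical stand-in for a loader that picks items at random."""
    return [random.choice(items) for _ in range(batch_size)]


def draw_fixed_batch(items, batch_size, seed=42):
    """Reseed immediately before drawing, so every call starts from the same RNG state."""
    random.seed(seed)
    return draw_batch(items, batch_size)


items = ["img_a", "img_b", "img_c", "img_d", "img_e", "img_f", "img_g", "img_h"]

# Seed once, then draw three batches. The RNG state advances with every draw,
# so the three batches are not pinned to be equal to each other.
random.seed(42)
first_run = [draw_batch(items, 4) for _ in range(3)]

# Restarting from the same seed reproduces the same sequence of batches.
random.seed(42)
second_run = [draw_batch(items, 4) for _ in range(3)]

# Reseeding before each call pins every call to the same batch.
fixed_a = draw_fixed_batch(items, 4)
fixed_b = draw_fixed_batch(items, 4)
```

So seeding once before building the databunch should make a full rerun reproducible, but repeated calls within the same session can still differ. If the randomness really comes from the seeded global RNGs, reseeding right before each call is one way to get the same batch back.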

I used the following code:

import os
import random

import numpy as np
import torch

seed = 42
os.environ['PYTHONHASHSEED'] = str(seed)
# Torch RNG (CPU and all CUDA devices)
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
# NumPy and Python RNGs
np.random.seed(seed)
random.seed(seed)

#random.shuffle(full_dataset)
#training_set=full_dataset[:train_size]
#valid_set=full_dataset[train_size:(train_size+valid_size)]
#test_set=full_dataset[(train_size+valid_size):]
train_samples_per_scanner = 2
val_samples_per_scanner = 2
test_samples_per_scanner = 2
#train_images=list(training_set)
#valid_images=list(valid_set)
#test_images=list(test_set)
#np.random.seed(42)
train_images = list(np.random.choice(full_dataset, train_samples_per_scanner))
valid_images = list(np.random.choice(full_dataset, val_samples_per_scanner))
test_images = list(np.random.choice(full_dataset, test_samples_per_scanner))
f"Created: {len(train_container)} training WSI container and {len(valid_container)} validation WSI container"

batch_size = 2

do_flip = True
flip_vert = True 
max_rotate = 90 
max_zoom = 1.1 
max_lighting = 0.2
max_warp = 0.2
p_affine = 0.75 
p_lighting = 0.75 

tfms = get_transforms(do_flip=do_flip,
                      flip_vert=flip_vert,
                      max_rotate=max_rotate,
                      max_zoom=max_zoom,
                      max_lighting=max_lighting,
                      max_warp=max_warp,
                      p_affine=p_affine,
                      p_lighting=p_lighting)
train, valid = ObjectItemListSlide(train_images), ObjectItemListSlide(valid_images)
item_list = ItemLists(".", train, valid)
lls = item_list.label_from_func(lambda x: x.y, label_cls=SlideObjectCategoryList)
lls = lls.transform(tfms, tfm_y=True, size=patch_size)
data = lls.databunch(bs=batch_size, collate_fn=bb_pad_collate, num_workers=0).normalize()

But when I call data.show_batch(), it shows a different batch each time:

data.show_batch()
data.show_batch()

(Screenshots: the two data.show_batch() outputs, each showing a different batch.)

As you can see in the images, every call shows a different batch, even though I passed only 3 images from the dataset, so it should use only those 3 images to build batches of size 2.
So after setting the seed, it should show the same batch again after a few runs of data.show_batch().

pd.value_counts(data.train_ds.items)
pd.value_counts(data.valid_ds.items)

This shows the following output:
/content/drive/MyDrive/images/0.scn 2
dtype: int64
/content/drive/MyDrive/images/0.scn 2
dtype: int64
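
A side note on that output: the same file appears twice in both the training and the validation set. np.random.choice samples with replacement by default (replace=True), and each call draws independently, so duplicates within a set and overlap between sets are both possible (it would also happen if full_dataset holds very few slides). Here is a minimal sketch of the shuffle-and-slice split from the commented-out lines, written with the standard library only. split_without_overlap and the slide names are hypothetical placeholders, not part of fastai or the original notebook.

```python
import random


def split_without_overlap(items, n_train, n_valid, n_test, seed=42):
    """Shuffle a copy with a local seeded RNG, then slice disjoint subsets."""
    if n_train + n_valid + n_test > len(items):
        raise ValueError("not enough items for the requested split sizes")
    rng = random.Random(seed)  # local RNG, leaves the global RNG state untouched
    shuffled = list(items)
    rng.shuffle(shuffled)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:n_train + n_valid + n_test]
    return train, valid, test


# Placeholder file names standing in for the real slide list.
full_dataset = [f"slide_{i}.scn" for i in range(10)]

train_images, valid_images, test_images = split_without_overlap(full_dataset, 2, 2, 2)
```

With a split like this, no slide can appear twice in a set or in more than one set, and the same seed gives the same split on every run. It does not by itself fix the changing batches if patches are sampled randomly from each slide.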