[Solved] Reproducibility: Where is the randomness coming in?

Pinned fastcore to 0.1.12 (0.1.13 was complaining about as_item missing).
Still not able to get rid of the randomness.

I grabbed your random function and tried it with the MNIST example from the walkthrough, and it's still random :frowning:

Using plain PyTorch in this case (not fastai, but the no less amazing https://github.com/qubvel/segmentation_models.pytorch), I was hitting the same problem in Jupyter: with all the seed / force-deterministic operations listed above, results were reproducible when re-running all cells (without a restart),

but between kernel restarts the results were always different :frowning:

Note: all results (splits, augmentation, the pre-training validation epoch) were identical up to the point where torch training starts. During training something is still non-deterministic, and the only fix I found was setting the PYTHONHASHSEED environment variable mentioned above before starting Jupyter.
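For reference, the seed / force-deterministic operations referred to above are roughly the following (a sketch only; the helper name seed_everything is mine, not necessarily the exact function from the earlier posts):

import os, random
import numpy as np
import torch

def seed_everything(seed=42):
    # Seed every RNG the data pipeline and training loop touch
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Force cuDNN onto deterministic kernels (slower, but repeatable)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # This only affects child processes spawned afterwards; for the current
    # kernel, PYTHONHASHSEED must already be exported before Jupyter starts
    os.environ['PYTHONHASHSEED'] = str(seed)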

After setting that before launch, I can finally fully reproduce results between restarts!
Really tricky issue and hard to detect; probably a lot of people think they have reproducible results when they haven't (much like "valid" backups :slight_smile:).

Next step: check container restarts :), host restarts, different VMs, and different cloud providers… who knows? :slight_smile:

(Note: as mentioned by @esingildinov, PYTHONHASHSEED has to be set before the Jupyter/kernel starts; setting the env var inside a notebook doesn't work. The same thing is noted here:

)
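You can sanity-check this from inside a running kernel (the launch command in the comment is just one way to export the variable):

import os
# PYTHONHASHSEED must be in the environment before the Jupyter server starts, e.g.:
#   PYTHONHASHSEED=0 jupyter notebook
# Setting os.environ['PYTHONHASHSEED'] inside a cell is too late, because CPython
# reads the variable at interpreter startup to seed str/bytes hashing.
print(os.environ.get('PYTHONHASHSEED'))  # should print the exported value, not None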

2 Likes

Any tips on how to do this with Colab?

Here is an example of reproducibility in fastai2:

from fastai.vision.all import *
def is_cat(x): return x[0].isupper()
path = untar_data(URLs.PETS)/'images'
# First run: seed everything, then build the DataLoaders and Learner
set_seed(42, True)
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=is_cat, item_tfms=Resize(224))
learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fit(1)
# Second run: re-seed and recreate the DataLoaders to reproduce the first run exactly
set_seed(42, True)
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=is_cat, item_tfms=Resize(224))
learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fit(1)

Please note that you must set the seed before the DataLoaders are created, and recreate the DataLoaders when setting a new seed.

The DataLoader keeps an internal random number generator that is seeded (with a random number) at the moment the DataLoader is created. That internal seed is not updated by set_seed, which is why you have to recreate the DataLoaders.
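To make this concrete, here is a minimal sketch reusing path, is_cat and dls from the example above (the internal rng is an implementation detail, so treat this as illustrative rather than guaranteed behaviour):

set_seed(42, True)                      # reseeds python/numpy/torch ...
first = dls.train.one_batch()           # ... but this dls still shuffles with the rng it drew at creation

set_seed(42, True)
dls = ImageDataLoaders.from_name_func(  # rebuilt, so its shuffling now derives from the fresh seed
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=is_cat, item_tfms=Resize(224))
second = dls.train.one_batch()          # this batch order is reproducible across reruns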

5 Likes

Just wanted to say that this is one of the most important threads on the forum … and hopefully it will find its way into the library and also a dedicated place in the docs with respect to when the seed needs to be set.

4 Likes

So what is the definition here of “reproducible”?

I’ve followed the above steps: running my “set_seed” function before creating my dataloaders via dblock.dataloaders(df, num_workers=0), setting the seed in my RandomSplitter, and running that “set_seed” function before creating my Learner and before every call to fit_one_cycle, but the results are never identical.

And that is what I actually expect when training a NN.

So am I wrong? Are folks expecting and getting the exact same validation loss and metrics after each epoch every time they re-run their data loading and training loop code?

Run #1: (screenshot of per-epoch losses/metrics)

Run #2: (screenshot of per-epoch losses/metrics)

No kernel restart … no fit_cbs … just rebuilding the Learner and calling fit_one_cycle.
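For concreteness, the sequence looks roughly like this (dblock, df, and the set_seed helper are as discussed earlier in the thread; the learner type here is just illustrative):

set_seed(42, True)                               # before building the DataLoaders
dls = dblock.dataloaders(df, num_workers=0)      # dblock's splitter is RandomSplitter(seed=42)

set_seed(42, True)                               # before building the Learner
learn = cnn_learner(dls, resnet34, metrics=error_rate)

set_seed(42, True)                               # and again before every call to fit_one_cycle
learn.fit_one_cycle(1)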

2 Likes

Running a tabular model, I was able to reproduce the exact same validation loss and metrics by using random_seed() and placing random_seed(0, use_cuda=False) before tabular_learner and also before fit_one_cycle.
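Roughly, the ordering that worked looks like this (random_seed is the helper posted earlier in the thread; dls and the metric are just placeholders from my setup):

random_seed(0, use_cuda=False)                   # seed right before the learner is created
learn = tabular_learner(dls, metrics=accuracy)
random_seed(0, use_cuda=False)                   # and again right before training
learn.fit_one_cycle(3)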

Many thanks to @Pomo, @rpcoelho, and others.

1 Like

Simple and clear, thanks!!!

1 Like