[Solved] Reproducibility: Where is the randomness coming in?

Stephen - thanks for responding, and sorry that my issue was not clear. The goal is to get a single deterministic measure when providing the same inputs, rather than a distribution of measures that varies by 1%. Even reloading the same initial model weights yields varying results unless cudnn's deterministic flag is set to True, benchmark to False, and num_workers to zero.
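
For reference, a minimal sketch of the two settings this refers to (PyTorch's cuDNN flags plus single-process data loading; where exactly you pass num_workers depends on the fastai version):

import torch

# cuDNN's autotuner (benchmark mode) selects kernels nondeterministically,
# so force deterministic algorithms and disable the autotuner.
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

# ...and pass num_workers=0 when building the DataBunch, so all data loading
# runs in the main process with a single, controllable RNG state.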

1 Like

Oops! Now I see that this advice is already given in the fastai docs.

3 Likes

For anybody else wondering like me:

https://docs.fast.ai/dev/test.html#getting-reproducible-results

7 Likes

Ha… at least this is a function rather than the list of code in the docs. How about the fastai library having a callable function or setting somewhere to do this? I think it's a must-have when experimenting.
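
For what it's worth, fastai2 (which comes up later in this thread) does ship such a helper; a minimal usage sketch, assuming the fastai2 set_seed API:

from fastai.torch_core import set_seed

# seeds Python's random, NumPy and PyTorch (incl. CUDA); reproducible=True
# also sets the cudnn deterministic/benchmark flags
set_seed(42, reproducible=True)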

As an addendum, you also need to watch out for seeds used in generating your train/validation split. Mine was non-reproducible even after the above, due to my use of the pandas sample function to create my validation set:

df.sample(frac=0.3)

You can fix this by passing a seed there too:

df.sample(frac=0.3, random_state=42)

Now I finally have reproducible results. Yay!

2 Likes

Hi @Pomo, I saw your very helpful answer and implemented it, but I'm still not getting reproducible results. I'm not sure if I'm setting num_workers in the right place, but I'm loading the datasets already split, so that can't be causing the problem @blissweb mentioned. Here is how I'm creating the DataBunch, setting the seeds and running the learner:

def random_seed(seed_value):
    import random
    random.seed(seed_value)        # Python RNG
    import numpy as np
    np.random.seed(seed_value)     # NumPy RNG (CPU)
    import torch
    torch.manual_seed(seed_value)  # PyTorch CPU RNG

    if torch.cuda.is_available():
        torch.cuda.manual_seed(seed_value)
        torch.cuda.manual_seed_all(seed_value)     # GPU RNGs
        torch.backends.cudnn.deterministic = True  # needed for deterministic kernels
        torch.backends.cudnn.benchmark = False
random_seed(0)

dep_var = 'NumberOfSales'
df = train_df[cat_vars + cont_vars + [dep_var]].copy()

path = "c:/Benchmarking/testBench.csv"
data = (TabularList.from_df(df, cat_names=cat_vars, cont_names=cont_vars, procs=procs)
                .split_by_idx(valid_idx)
                .label_from_df(cols=dep_var, label_cls=FloatList, log=False)
                .add_test(TabularList.from_df(test_df, path=path, cat_names=cat_vars, cont_names=cont_vars))
                .databunch(num_workers=0))

# x = best.x  # I'm using scikit-optimize to find the best parameters, but then can't reproduce the results.
x = [500, 500, 100, 0.0005, 0.4, 8]
print(x)
learn3 = tabular_learner(data, layers=[x[0], x[1], x[2]], ps=[0.09, 0.5, 0.5], emb_drop=0.04,
                         y_range=y_range, metrics=mae)
learn3.fit_one_cycle(1, x[3], wd=x[4], div_factor=x[5])
3 Likes

Hi Rodrigo,

More work has since been done on this question by me and others. It looks like random seeds need to be set before creating the DataBunch and before the first fit(), and maybe before creating the Learner. Please see this thread:

https://forums.fast.ai/t/lesson1-reproducible-results-setting-seed-not-working/37921
Also,
https://forums.fast.ai/t/help-debug-reproducable-results-solved/48839/2?u=pomo

But I am now quite out of touch with the current “SOTA” in fastai reproducibility. (I was using it to isolate the effects of hyperparameters.) It would be a service if you could combine these posts, do your own experiments, and summarize your conclusions here. I would certainly appreciate it!

1 Like

Ok, I finally got it to work. To detail the instructions a bit more:

  1. You have to run random_seed(0) before the first fit;
  2. You have to run it before creating the DataBunch;
  3. And you have to call it again before every subsequent call to fit.

I was calling it once, before creating the DataBunch, and assuming the seed would stay set. So besides the code above, this solved it for me:

random_seed(0)  # need to set the seed here again, right before creating the learner and calling fit
x = [500, 500, 100, 0.0005, 0.4, 8]
learn3 = tabular_learner(data, layers=[x[0], x[1], x[2]], ps=[0.09, 0.5, 0.5], emb_drop=0.04,
                         y_range=y_range, metrics=mae)
learn3.fit_one_cycle(1, x[3], wd=x[4], div_factor=x[5])
10 Likes

Thanks! Your efforts will be helpful to me and others.

To get reproducible results between kernel restarts, run your script or jupyter with a fixed PYTHONHASHSEED:

env PYTHONHASHSEED=42 python train.py
or
env PYTHONHASHSEED=42 jupyter notebook

Note that setting PYTHONHASHSEED inside the notebook or training script doesn't help. Hope this helps!
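
A quick way to verify (a sketch; run it in two separate interpreter sessions): Python randomizes string hashing per process unless PYTHONHASHSEED is fixed, so a bare hash() call is only stable across restarts when the variable was set before launch.

import os

print(os.environ.get('PYTHONHASHSEED'))  # '42' if set before launch, else None
print(hash('reproducibility'))           # stable across restarts only when seeded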

2 Likes

I'm using this to seed, as suggested here:

import random
import numpy as np
import torch

def random_seed(seed_value, use_cuda):
    np.random.seed(seed_value)     # NumPy RNG (CPU)
    torch.manual_seed(seed_value)  # PyTorch CPU RNG
    random.seed(seed_value)        # Python RNG
    if use_cuda:
        torch.cuda.manual_seed(seed_value)
        torch.cuda.manual_seed_all(seed_value)     # GPU RNGs
        torch.backends.cudnn.deterministic = True  # needed
        torch.backends.cudnn.benchmark = False

but I'm not able to reproduce the values. What am I missing?
Thanks in advance. :slight_smile:

@barnacl make sure you pass a seed into your RandomSplitter too; you may be missing one there because everything was already split before you set the seed.
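
For example (a sketch, assuming the fastai2 RandomSplitter API):

from fastai.data.transforms import RandomSplitter

# seed the split itself, not just the global RNGs, so the same items land in
# the validation set on every run
splitter = RandomSplitter(valid_pct=0.2, seed=42)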

1 Like

Oops! I should have been more careful.
I made that change but am still missing something. Here is a copy of your notebook, @muellerzr, with the changes I added: https://colab.research.google.com/drive/1Ur6ftKvOjXgukHlmhiPa7AUcMZCK0hTQ
fastcore just got updated and is breaking some things, I think.
Will report back.

@barnacl another issue could be your environment setup. If you look at the most recent notebooks, I just do a pip install fastai2. No need for torch etc. to be on specific versions :slight_smile:

Ah ok, let me check that too. Thank you.

I pinned fastcore to 0.1.12 (0.1.13 was complaining about as_item missing), but I'm still not able to get rid of the randomness.

I grabbed your random function and tried it with the MNIST example from the walkthrough; still random :frowning:

Using only PyTorch (not fastai in this case, but no less amazing: https://github.com/qubvel/segmentation_models.pytorch), I was having the same problem on Jupyter: results were reproducible on "run all cells" (without a restart) using all the seed/force-deterministic operations listed above, but between kernel restarts the results were always different :frowning:

Note: all results (splits, augmentation, pre-training validation epoch) were equal until torch training started. During training something is affected that I could only fix by setting the PYTHONHASHSEED env variable, as mentioned, before starting Jupyter.

After doing this, I can fully reproduce results between restarts. Finally!
A really tricky issue and hard to detect. Probably a lot of people think they have reproducible results when they haven't?.. (much like people assume they have valid backups :slight_smile: )

Next step: check container restarts :), host restarts, different VMs, and cloud providers… who knows? :slight_smile:

(Note: as mentioned by @esingildinov, PYTHONHASHSEED has to be set prior to jupyter/kernel start; setting the env var inside the notebook doesn't work.)

2 Likes

Any tips on how to do this with Colab?

Here is an example of reproducibility in fastai2:

from fastai.vision.all import *

def is_cat(x): return x[0].isupper()
path = untar_data(URLs.PETS)/'images'

# first run: seed everything, then build the DataLoaders and train
set_seed(42, True)
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=is_cat, item_tfms=Resize(224))
learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fit(1)

# second run: reseed AND recreate the DataLoaders -> identical results
set_seed(42, True)
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=is_cat, item_tfms=Resize(224))
learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fit(1)

Please note that you must set the seed before the DataLoaders are created, and recreate the DataLoaders when setting a new seed.

The DataLoader keeps an internal random number generator that is seeded with a random number at creation time. That seed is not updated by set_seed, which is why you have to recreate the DataLoaders.
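
Here is a small sketch of that behavior (an assumption based on fastai2's DataLoader seeding its internal rng from Python's random module at creation; the printed orders are what I'd expect, not verified output):

from fastai.data.load import DataLoader
import random

random.seed(42)
dl = DataLoader(list(range(10)), bs=5, shuffle=True)
print(list(dl))   # some shuffled order

random.seed(42)   # reseeds the global RNG only; dl's internal rng has advanced
print(list(dl))   # a different order

random.seed(42)
dl = DataLoader(list(range(10)), bs=5, shuffle=True)  # recreated -> rng reseeded
print(list(dl))   # matches the first print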

5 Likes