[Solved] Reproducibility: Where is the randomness coming in?

(Malcolm McLean) #1

I would like to be able to compare measures between different settings and models with confidence. However, even after initializing all the seeds I know about, training loss varies between identical runs.

import random
import numpy as np
import torch

def random_seed(seed_value, use_cuda):  # gleaned from multiple forum posts
    np.random.seed(seed_value)  # NumPy RNG
    torch.manual_seed(seed_value)  # PyTorch CPU RNG
    random.seed(seed_value)  # Python RNG
    if use_cuda: torch.cuda.manual_seed_all(seed_value)  # PyTorch GPU RNGs

data = ImageDataBunch.from_csv(csv_labels=LABELS, suffix='.tif', path=TRAIN, ds_tfms=None, bs=BATCH_SIZE, size=96).normalize(imagenet_stats)

data.show_batch(rows=2, figsize=(96,96))

learn = create_cnn(data, arch, metrics=error_rate)  #resnet34
lr = 1e-2
learn.fit_one_cycle(1, max_lr=lr)

The training losses for three runs are: 0.168973, 0.169944, 0.167258

Images displayed by show_batch appear to be in the same order and look identical (to the eye).

So what’s going on? It seems that if all the seeds are initialized, the results should be equal. A 1% variation over a single epoch is enough to affect my confidence in comparing various settings.

Could there be randomness in the GPU calculations? Or something related to CPU cores?

Thanks for any insight and advice.

fastai 1.0.30
Nvidia 1070

(Malcolm McLean) #2

P.S. Setting num_workers=1 in ImageDataBunch.from_csv() does not help.

(Stephen Johnson) #3

Weights and biases in PyTorch aren’t set randomly. https://discuss.pytorch.org/t/how-are-layer-weights-and-biases-initialized-by-default/13073
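For what it's worth, the default initializers described in that thread do draw from torch's global RNG, so seeding it makes a freshly created layer reproducible. A quick check:

```python
import torch
import torch.nn as nn

# Default Linear init (Kaiming-style uniform) draws from torch's global RNG,
# so the same seed yields identical freshly created layers.
torch.manual_seed(0)
w1 = nn.Linear(3, 3).weight.detach().clone()

torch.manual_seed(0)
w2 = nn.Linear(3, 3).weight.detach().clone()

print(torch.equal(w1, w2))  # True
```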

(Stephen Johnson) #4

You should be able to save your model after it is created and before any training, then re-use that saved untrained model for subsequent trainings.
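In fastai 1.x this would be learn.save('initial') right after create_cnn, then learn.load('initial') before each run. A plain-PyTorch sketch of the same idea (make_model is a hypothetical stand-in for whatever builds your network):

```python
import torch
import torch.nn as nn

def make_model():
    # stand-in for create_cnn / your actual architecture
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

model = make_model()
torch.save(model.state_dict(), 'untrained.pth')  # snapshot before any training

# later (even after a kernel restart): rebuild and restore the same weights
model2 = make_model()
model2.load_state_dict(torch.load('untrained.pth'))
```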

(Malcolm McLean) #5

Searching these forums and the PyTorch forums, it seems that many others have run into this reproducibility issue. I gathered their suggestions into the following code:

import random
import numpy as np
import torch

def random_seed(seed_value, use_cuda):
    np.random.seed(seed_value)  # NumPy RNG
    torch.manual_seed(seed_value)  # PyTorch CPU RNG
    random.seed(seed_value)  # Python RNG
    if use_cuda:
        torch.cuda.manual_seed_all(seed_value)  # PyTorch GPU RNGs
        torch.backends.cudnn.deterministic = True  # needed
        torch.backends.cudnn.benchmark = False

The good news is that the training code above now gives repeatable results. I did not test which initializations are strictly necessary, but I do know that torch.backends.cudnn.deterministic = True is required, and that num_workers does not matter for within-session repeatability. The not-so-good news is that this reproducibility does not survive a kernel restart.

The best news is that it also gives repeatable results across kernel restarts if and only if num_workers=0 is passed to the data loader. This has something to do with each worker getting initialized with its own random seeds. Someone more patient than I could devise a worker_init_fn that provides both kernel restart repeatability and different seeds for each worker. But for now I am content with using num_workers=0.
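A worker_init_fn along those lines might look like the following sketch, based on the pattern discussed in the PyTorch forums: each worker reseeds NumPy and Python's random module from the per-worker base seed that torch already assigns, so workers differ from one another yet are reproducible run to run.

```python
import random
import numpy as np
import torch

def worker_init_fn(worker_id):
    # torch hands each DataLoader worker a distinct seed derived from the
    # main process RNG; reuse it so NumPy and Python's random are also
    # per-worker distinct yet reproducible across runs.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

# usage (sketch): pass it to the DataLoader, e.g.
# DataLoader(dataset, num_workers=4, worker_init_fn=worker_init_fn)
```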

To sum up - to get reproducible measures across runs and kernel restarts, use the random_seed function above and pass num_workers=0 when generating the DataBunch. Non-repeatability was leaking in through cuDNN and the data loader workers.

(Malcolm McLean) #6

Stephen - thanks for responding, and sorry that my issue was not clear. The goal is to get a single deterministic measure from the same inputs, rather than a distribution of measures that varies by 1%. Even reloading the same initial model weights yields varying results unless torch.backends.cudnn.deterministic is set to True and num_workers to zero.

(Malcolm McLean) #7

Oops! Now I see that this advice is already given in the fastai docs.


For anybody else wondering like me: