Help debug Reproducible results [Solved]

dangraf · June 25, 2019, 4:54am

Hello.
I’m running a loop in which I set the random seeds, create a databunch, model and learner, and then train.
The problem I’m experiencing is that I get a different result for each iteration of the loop. I was expecting the exact same loss each time.
If I run my code again, I get the exact same sequence of losses, so in that sense the result is repeatable.
Btw, the number of workers is set to 1 in my databunch.

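import random
import numpy as np
import torch

def set_seed(seed=8):
    # Seed the PyTorch (CPU), NumPy and Python RNGs
    torch.manual_seed(seed)
    np.random.seed(seed)
    random.seed(seed)

    # Seed the CUDA RNGs (current GPU, then all GPUs)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # gpu vars
    # Ask cuDNN for deterministic algorithms and disable autotuning
    torch.backends.cudnn.deterministic = True  # needed
    torch.backends.cudnn.benchmark = False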


for i in range(3):
    set_seed(10)
    dt = My600DigitalTwin()
    dt.load_data(bs=128, bptt=60, max_len=13, testnames=tests)
    dt.create_learner()
    dt.learner.fit(epochs=1, lr=[1e-2, 1e-3, 1e-4])

The output from this loop is the following:
epoch train_loss valid_loss time
0 1.640803 1.251190 00:23

epoch train_loss valid_loss time
0 1.811387 1.173867 00:22

epoch train_loss valid_loss time
0 1.995546 1.269453 00:23

I’ve tried to debug this problem by, e.g., printing out some of the model’s weights after initialization, checking the data in the first batch that is sent to the model, and checking for uninitialized variables.
Everything seems to be the same.
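
For anyone doing the same kind of check: comparing printed weights by eye can miss small differences, so a digest of the full state is easier to compare between iterations. Below is a minimal sketch using only the standard library. The `fingerprint` helper and the stand-in data are hypothetical, not part of fastai or PyTorch.

```python
import hashlib
import struct


def fingerprint(named_values):
    """Return a short hex digest for a mapping of name -> iterable of floats."""
    h = hashlib.sha256()
    for name in sorted(named_values):  # fixed order, independent of dict order
        h.update(name.encode("utf-8"))
        values = list(named_values[name])
        # Pack as little-endian doubles so any bit-level difference changes the digest
        h.update(struct.pack(f"<{len(values)}d", *values))
    return h.hexdigest()[:16]


# Stand-in data for three loop iterations (hypothetical values)
run_a = {"layer.weight": [0.1, -0.2, 0.3], "layer.bias": [0.0]}
run_b = {"layer.weight": [0.1, -0.2, 0.3], "layer.bias": [0.0]}
run_c = {"layer.weight": [0.1, -0.2, 0.3000001], "layer.bias": [0.0]}

same_ab = fingerprint(run_a) == fingerprint(run_b)
same_ac = fingerprint(run_a) == fingerprint(run_c)
print(same_ab, same_ac)
```

With a real model, one could presumably build the mapping from the model's `state_dict()`, for example `{k: v.flatten().tolist() for k, v in model.state_dict().items()}`, and do the same for the first batch. If the digests match after initialization but the losses still differ, the difference is probably introduced during the forward or backward pass rather than in the setup.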

I’ve read threads about the same problem, like:

I’m trying to figure out why the results differ even though I set everything up the same way. Does anyone have suggestions for what to try next?

dangraf · June 25, 2019, 10:47pm

I found the bug, it was in PyTorch:

I was using a 2-layer LSTM with dropout, which triggered this bug.
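
The symptom in this thread (different losses within one run, but the same sequence on every run) is what you would expect if some random state is created once per process and is not reset by re-seeding. The toy sketch below reproduces that pattern with the standard library only. It is an analogy under that assumption, not a description of PyTorch internals; `hidden_rng` is a hypothetical stand-in for state that `set_seed` does not reach.

```python
import random


def run_script():
    """Simulate one fresh launch of the script with three seeded iterations."""
    # Hypothetical state created once per process and never re-seeded
    hidden_rng = random.Random(0)
    losses = []
    for _ in range(3):
        random.seed(10)  # the part of the randomness that re-seeding does reset
        # The "loss" depends on both the re-seeded and the hidden generator
        losses.append(random.random() + hidden_rng.random())
    return losses


first_launch = run_script()
second_launch = run_script()
print(first_launch)                   # three different values
print(first_launch == second_launch)  # the sequence repeats across launches
```

If a training loop shows this pattern, one thing to try is removing stochastic components one at a time (for example setting dropout to 0) and re-running the loop until the iterations agree.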