Another clue and partial solution.
The issue is that even after setting all the random seeds before creating the DataBunch, training and validation losses are inconsistent (non-reproducible) starting with the very first fit(). Across runs, the losses fall into a bimodal or trimodal pattern. All of these observations are with a kernel restart between runs, no transforms, and num_workers=0.
What I see is that setting these random seeds again just before fit() yields consistent, reproducible losses. Setting them before create_cnn, however, still yields the bimodal loss pattern.
Setting random seeds before creating the DataBunch is needed to have a consistent Train/Validate split.
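As far as I can tell, the valid_pct split draws on NumPy's RNG, so a quick way to convince yourself of this point (plain NumPy, no fastai) is:

```python
import numpy as np

n = 100
np.random.seed(42)
split1 = np.random.uniform(0, 1, n) < 0.2   # e.g., mask marking ~20% of items as validation
np.random.seed(42)
split2 = np.random.uniform(0, 1, n) < 0.2
assert (split1 == split2).all()             # identical split on every run
```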
create_cnn, in this case with a ResNet-50, appears to leave the random number generators in an inconsistent state. (It’s possible that both DataBunch creation and create_cnn leave inconsistencies.)
Something in fit_one_cycle() then consumes random state, perhaps to shuffle the images. Because that random state differs across runs, the losses differ.
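As a sanity check on the shuffling guess, here is a tiny plain-PyTorch sketch (not fastai code) showing that DataLoader shuffling draws on torch's global RNG, so whatever state the earlier steps leave behind determines the batch order:

```python
import torch
from torch.utils.data import DataLoader

data = list(range(10))

# Seeding the global torch RNG makes the shuffle order deterministic...
torch.manual_seed(42)
print(next(iter(DataLoader(data, batch_size=10, shuffle=True))))

# ...and re-seeding reproduces the identical order.
torch.manual_seed(42)
print(next(iter(DataLoader(data, batch_size=10, shuffle=True))))
```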
You can get reproducible results by setting the random seeds before creating the DataBunch (with num_workers=0) AND again before the first fit_one_cycle(). (See the usage sketch after the function below.)
I should say these conclusions are tentative, because (1) there are other possible explanations, such as a flaky GPU; and (2) I have not identified the source of the inconsistency. But I have spent many hours getting to this point, and I hope a more competent developer will eventually investigate.
Here’s the function (originally by someone else) I use to reset every random seed I’ve ever seen mentioned:
```python
import random

import numpy as np
import torch

def random_seed(seed_value, use_cuda):
    np.random.seed(seed_value)     # NumPy RNG (CPU)
    torch.manual_seed(seed_value)  # PyTorch RNG (CPU)
    random.seed(seed_value)        # Python's built-in RNG
    if use_cuda:
        torch.cuda.manual_seed_all(seed_value)     # PyTorch RNG (all GPUs)
        torch.backends.cudnn.deterministic = True  # needed for reproducibility
        torch.backends.cudnn.benchmark = False

# Remember to use num_workers=0 when creating the DataBunch.
```
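And here is roughly how I call it, a minimal sketch of the full recipe (assuming fastai v1; the path, valid_pct, and epoch count are placeholders):

```python
from fastai.vision import *

random_seed(42, True)  # before the DataBunch: pins the train/valid split
data = ImageDataBunch.from_folder('data/myimages',  # hypothetical path
                                  valid_pct=0.2, num_workers=0)

learn = create_cnn(data, models.resnet50, metrics=accuracy)

random_seed(42, True)  # again before the first fit: pins shuffling, dropout, etc.
learn.fit_one_cycle(1)
```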
I hope this information helps anyone else who needs reproducible training. For myself, when trying to squeeze out fractions of a percent for a Kaggle competition, it does not work to have 5% variation across training runs. With that much variation, the small effects of changes to the model and hyperparameters get lost in the noise.