A recipe for a reproducible randomization

stas · March 27, 2020, 6:48pm

The problem is not the code, but how it is used. If you rely on running the notebook non-linearly, this is where the problem happens. The notebook is handy, but it creates a ton of hidden pitfalls, because you can run the cells in any order.

Summary:

If you run the code from beginning to end, and no other component resets the seed to a random value, you only need to set the seed once at the beginning of the code.
If you jump back and forth, or, say, re-run just one cell that relies on random order in the functions it calls, you have to re-run the seed-setting code every time before you run that cell (or, better, add it at the beginning of the cell).
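
To illustrate the second case, here is a minimal sketch using only Python's standard `random` module. The names `SEED`, `set_seed` and `shuffled_items` are made up for this example, and `shuffled_items` just stands in for a cell that depends on random order. If your code also uses NumPy or PyTorch, you would presumably seed those in the same helper.

```python
import random

SEED = 42  # arbitrary fixed value, chosen only for illustration


def set_seed(seed: int = SEED) -> None:
    """Reset Python's global RNG to a known state (hypothetical helper).

    If you also use NumPy or PyTorch, np.random.seed(seed) and
    torch.manual_seed(seed) would belong here as well.
    """
    random.seed(seed)


def shuffled_items():
    """Stand-in for a notebook cell whose result depends on random order."""
    items = list(range(10))
    random.shuffle(items)
    return items


# Re-seeding right before the "cell" makes every re-run repeatable.
set_seed()
first = shuffled_items()
set_seed()
second = shuffled_items()  # same order as `first`

# Without re-seeding, the RNG state has already advanced,
# so this run is not guaranteed to match the earlier ones.
third = shuffled_items()
```

Putting the `set_seed()` call at the top of the cell itself means you cannot forget it when you re-run that cell on its own.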

P.S. You can probably modify ipyexperiments (stas00/ipyexperiments on GitHub) to set the fixed seed automatically before each cell is run. That would fix the problem for everybody. Ideally you would create a new module that does that; modifying ipyexperiments would just be a quick hack.
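
For what it's worth, a rough sketch of the "new module" idea is below. It does not touch ipyexperiments at all; it uses IPython's own `pre_run_cell` event to re-seed before every cell. `SEED`, `reseed_before_cell` and `install_hook` are hypothetical names, and only Python's `random` module is seeded here, so treat it as a starting point rather than a finished solution.

```python
import random

SEED = 42  # arbitrary fixed value, chosen only for illustration


def reseed_before_cell(*args):
    """Reset the RNG; IPython may pass an info object, which is ignored."""
    random.seed(SEED)


def install_hook() -> bool:
    """Register the callback if running inside IPython/Jupyter.

    Returns True when the hook was registered, False otherwise
    (IPython not installed, or not running in an IPython session).
    """
    try:
        from IPython import get_ipython
    except ImportError:
        return False
    ip = get_ipython()
    if ip is None:
        return False
    # "pre_run_cell" fires before each cell the user executes.
    ip.events.register("pre_run_cell", reseed_before_cell)
    return True


installed = install_hook()
```

Run this once in the first cell. After that, every cell starts from the same RNG state no matter what order you run them in. The trade-off is that two different cells drawing random numbers will see the same sequence, which may or may not be what you want.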