Thank you for the clarification regarding RNG.
I created a minimal example and completely re-ran the notebook twice from top to bottom.
Each run produced a different error_rate after fit_one_cycle:
OK, I added an option to reset the seed for each cell in ipyexperiments, so you can re-run any cell and the magic happens behind the scenes. It needs more testing and docs, but you can try it:
Looks correct to me. I haven’t tried fastai2 yet, so I can’t say for sure; perhaps there is a bug and the fastai code sets a seed somewhere, disregarding user settings? That would be odd, since the fastai philosophy is to randomize on purpose.
Do you see the same behavior with fastai v1? If I remember correctly it used to give reproducible results if you manually set the seed.
You can file a bug report, I’m sure Sylvain will sort it out in no time.
By now I am sure that you are right and that fastai2 is the cause.
I will try to repeat the same using fastai1 and provide my results here for future strugglers.
Yes, of course; my module just saves you retyping the seed reset in each cell. If it doesn’t work when you do it manually in the first place, then you won’t see different behavior with the module either. I thought your original problem was that you were re-running the same cell with different parameters and expecting the same random sequence, but based on your example that is not the case, so it’s not wrong usage but a fastai2 library issue.
Most likely fastai2 sets the seed somewhere; it needs to be found and made configurable, with the default being not to set it, thus allowing the user to control the behavior.
splitter = RandomSplitter() has a seed argument (maybe try setting that):
I haven’t succeeded in getting reproducible results either. The module is really helpful, @stas, thank you.
In order to reproduce your results for demonstration purposes etc., use fastai v1, not fastai v2.
@stas provided a code snippet that you must place at the top of your notebook (see the original post at the top).
Call random_ctl(bad_seed) first in every cell where randomness comes into play, which means: before you create your databunch, before you create your learner, and before you call fit_one_cycle -> HINT: set bad_seed != 0 for reproducibility.
If done correctly, you can then recreate your learner and re-run fit_one_cycle reproducibly.
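For readers who don’t have the original post handy, a seed-reset helper along these lines would do the job. This is a hypothetical sketch, not the actual random_ctl snippet from the original post, which may differ in details (torch is made optional here only so the sketch runs standalone):

```python
import random
import numpy as np

try:
    import torch
except ImportError:  # keep the sketch runnable without torch installed
    torch = None

def random_ctl(seed=0):
    # Hypothetical sketch of a seed-reset helper; the real random_ctl may differ.
    # seed=0 means: pick a fresh random seed and report it, so a bad run
    # can later be replayed with random_ctl(bad_seed).
    if seed == 0:
        seed = random.randint(1, 2**31 - 1)
    print(f"Using seed {seed}")
    random.seed(seed)
    np.random.seed(seed)
    if torch is not None:
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
    return seed
```

Calling it with a non-zero argument makes the randomness in the rest of the cell deterministic; calling it with 0 merely records which seed was used.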
A working example is shown in my example notebook:
```python
def RandomSplitter(valid_pct=0.2, seed=None, **kwargs):
    "Create function that splits `items` between train/val with `valid_pct` randomly."
    def _inner(o, **kwargs):
        if seed is not None: torch.manual_seed(seed)
        rand_idx = L(int(i) for i in torch.randperm(len(o)))
        cut = int(valid_pct * len(o))
        return rand_idx[cut:],rand_idx[:cut]
    return _inner
```
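So the seed argument is what makes the split deterministic. As a standalone illustration of the same logic (a sketch using a plain list in place of fastcore’s L, so it runs without fastai), the same seed yields the same train/valid split every time:

```python
import torch

def random_splitter(valid_pct=0.2, seed=None):
    # standalone sketch of RandomSplitter's splitting logic,
    # with a plain list in place of fastcore's L
    def _inner(o):
        if seed is not None:
            torch.manual_seed(seed)
        rand_idx = [int(i) for i in torch.randperm(len(o))]
        cut = int(valid_pct * len(o))
        return rand_idx[cut:], rand_idx[:cut]
    return _inner

items = list(range(10))
split_a = random_splitter(seed=42)(items)
split_b = random_splitter(seed=42)(items)
assert split_a == split_b  # same seed, same train/valid split
```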
Where are we setting the seed to 42?
Not sure if this changed recently or if we have been setting 42 all along.
Honestly I can’t recall. Going through all this seed stuff may have cleared it up for me. I think it’s the call to set_seed that sets that particular seed, but having the 42 at the end may also be needed (and I think that’s where I got confused), i.e. if we set the manual seed, I don’t believe we need to state it twice.
The random_ctl function has one very important feature, which I think is being overlooked.
Have you ever had a situation where the code fails, or gives a terrible outcome, on roughly every 50th run? Then you have to re-run it another 50 or so times to get it to repeat, or it only shows up when you are doing a live demo. And even if you do reproduce it, you won’t be able to fix it without knowing the RNG state. That’s where knowing the seed saves the day. So, normally, you always run this function but without forcing the seed, i.e.:
random_ctl(0)
So by default it doesn’t go against the fastai philosophy: it just sets a new random seed every time you run it, so you can keep it in your nb template and forget about it.
If, however, you get a bad outcome, voila: the function has already reported the seed it randomly picked when it was run, e.g.:
Using seed 999
Now you can re-run the code with:
random_ctl(bad_seed)
(bad_seed is the one reported by the function), look at the unique situation that caused the problem, debug it, fix your code, and then go back to random_ctl(0).
The fact that in fastai v1 you have to reset the seeds in three places seems to imply that these operations can leave the RNGs in different states between runs; there must be some indeterminacy after the operations. Maybe there is another seed we have not identified and reset.
fastai v2 shows a bigger problem: even with seed resets, results are not reproducible. The implication is that the RNGs are altered after the seed reset and before or during the operations.
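That implication is easy to demonstrate in isolation (using numpy here for brevity; the same holds for torch’s generator): if anything draws from the RNG between your seed reset and the operation, the results diverge.

```python
import numpy as np

np.random.seed(42)
a = np.random.random(3)

np.random.seed(42)
_ = np.random.random()  # a hidden consumer of RNG state between reset and use
b = np.random.random(3)

# the hidden draw advanced the generator, so a and b differ
assert not np.array_equal(a, b)
```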
Please excuse me if this logic is faulty. It’s late and I have been watching episodes of Dollhouse.
There was indeed a problem with fastai2’s DataLoader, which was not setting its seed in a reproducible fashion. Fixed and confirmed that the following code gives the same results all the time:
You need to use a developer install to have the latest fixes/features. This won’t be available via pip install fastai2 --upgrade until we make a new release of fastai2 (probably before next lesson on Tuesday).
In the meantime, you can do a dev install by cloning the fastai2 repo and then:
Meanwhile, if you’re already using ipyexperiments, you can now enable an automatic RNG seed setting by passing cl_set_seed=SEED, e.g. cl_set_seed=42. Here is an example:
```python
# cell 1
import numpy as np
from ipyexperiments import IPyExperimentsPytorch

# cell 2
exp13 = IPyExperimentsPytorch(exp_enable=False, cl_set_seed=42, cl_compact=True)
rnd1 = np.random.random()

# cell 3 (the seed gets reset automatically before the cell is run)
rnd2 = np.random.random()
assert rnd1 == rnd2, f"values should be the same rnd1={rnd1} rnd2={rnd2}"
```
Of course, in practice it’d be a training-loop cell that can be re-run again and again with reproducible results, if that’s what you’re after.
The new version ipyexperiments-0.1.17 is on pypi and conda servers.
If you want to install the latest fastai2 version (from master), you can run the following command in a jupyter cell at the beginning of your notebook: