Validation set not reproducible in Kaggle?

ikedim · July 8, 2019, 5:53pm

I’m trying out Lesson 1 (dog and cat breeds) using Kaggle.

I understand from lecture 2 that the point of the np.random.seed(2)
call before creating the ImageDataBunch is to ensure we get the
same validation set every time. I’m trying to check this in Kaggle
(I add a cell evaluating data after the data = ImageDataBunch … line;
this prints out the labels of the first few images in the validation set).

I find that the validation set is reproducible within the same kernel run
(every time I run data = ImageDataBunch after calling np.random.seed(2)
I get the same label list). However, after restarting the kernel I’m getting
a different label list for the validation set, even with the same seed.
Does this mean the validation set is different between runs?
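
Comparing the first few printed labels only checks a handful of images. A minimal sketch of a stricter check is to hash the full list of validation file names and compare that one short string across kernel restarts. The helper name `fingerprint` is hypothetical, and reading the names from `data.valid_ds.x.items` is an assumption about the fastai v1 object used in the notebook.

```python
import hashlib

def fingerprint(names):
    """Return a short, order-independent hash of a collection of file names."""
    # Sort first so that only the *membership* of the set matters, not its order.
    joined = "\n".join(sorted(str(n) for n in names))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:12]

# Toy data standing in for the validation file names.
# In the notebook this might be: fingerprint(data.valid_ds.x.items)  (assumption)
fp_a = fingerprint(["img_3.jpg", "img_1.jpg"])
fp_b = fingerprint(["img_1.jpg", "img_3.jpg"])  # same files, different order
fp_c = fingerprint(["img_1.jpg", "img_2.jpg"])  # different files

print(fp_a == fp_b)  # same membership gives the same fingerprint
print(fp_a == fp_c)  # different membership gives a different fingerprint
```

If the fingerprint printed after a restart differs from the one printed before, the validation set itself has changed, not just the order in which it is displayed.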

muellerzr · July 8, 2019, 5:56pm

There’s more than just that one individual seed we have to set when we want reproducible results. See here: [Solved] Reproducibility: Where is the randomness coming in?

ikedim · July 8, 2019, 6:01pm

Thanks, but I thought that thread was about getting reproducible training results,
which I understand is more controversial: in lecture 2 Jeremy Howard says he
actually doesn’t usually advocate that. However, he says you do want the validation
set to stay the same, and that’s why they have the np.random.seed(2) call in the notebook.

muellerzr · July 8, 2019, 6:05pm

Try this: when we call split_by_rand_pct(), we can pass in a seed after our validation pct. Otherwise, looking at the source code, it should already be working like that. Just in case, try passing the seed explicitly: .split_by_rand_pct(seed=2)
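
To illustrate what passing a seed to a random split buys you, here is a minimal sketch in plain Python. It is not fastai's implementation: `rand_split` is a hypothetical helper, and it uses the standard library's `random` module instead of NumPy so that it runs without extra dependencies.

```python
import random

def rand_split(n_items, valid_pct=0.2, seed=None):
    """Split the indices 0..n_items-1 into (valid, train) using a private RNG."""
    rng = random.Random(seed)   # local generator, unaffected by the global one
    idx = list(range(n_items))
    rng.shuffle(idx)
    cut = int(valid_pct * n_items)
    return sorted(idx[:cut]), sorted(idx[cut:])

valid_a, train_a = rand_split(100, seed=2)
random.random()                 # advance the global RNG between the two calls
valid_b, train_b = rand_split(100, seed=2)

print(valid_a == valid_b)       # the same seed reproduces the same split
```

Because the seed is applied right at the split, the result does not depend on how many random numbers were drawn elsewhere beforehand, which is the usual weakness of seeding a global generator once at the top of a notebook.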

ikedim · July 8, 2019, 6:34pm

Thanks! I tried adding seed=2 to the data = ImageDataBunch.from_name_re() call;
according to the source code this seed value should then be passed to split_by_rand_pct().

I’m still getting the same validation set within a session but not across kernel restarts.
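
One possible explanation, not confirmed anywhere in this thread, is that the list of image files comes back in a different order after a kernel restart. A seeded shuffle picks the same positions every time, so if the underlying file order changes, the same positions point at different files. The sketch below is an illustration under that assumption only; `split_valid` and the toy file names are hypothetical, and it uses Python's `random` module, not fastai or NumPy.

```python
import random

def split_valid(files, valid_pct=0.2, seed=2, sort_first=True):
    """Pick a validation subset by seeded shuffle, optionally sorting the input first."""
    items = sorted(files) if sort_first else list(files)
    rng = random.Random(seed)
    idx = list(range(len(items)))
    rng.shuffle(idx)
    cut = int(valid_pct * len(items))
    return {items[i] for i in idx[:cut]}

files = [f"pet_{i}.jpg" for i in range(10)]
listing_a = files                    # order seen in one session
listing_b = list(reversed(files))    # a different order, e.g. after a restart

# With sorting, the input order no longer matters.
stable = split_valid(listing_a) == split_valid(listing_b)
print(stable)

# Without sorting, the same seed selects the same positions, which may map to
# different files depending on the listing order.
unsorted_a = split_valid(listing_a, sort_first=False)
unsorted_b = split_valid(listing_b, sort_first=False)
print(unsorted_a, unsorted_b)
```

If this is what is happening, sorting the file list before the split would be one way to make the validation set independent of listing order.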

ikedim · July 8, 2019, 6:36pm

To be clear, I’m working from the Kaggle lesson 1 notebook forked from
https://course.fast.ai/start_kaggle.html