I’m trying out Lesson 1 (dog and cat breeds) using Kaggle.
I understand from lecture 2 that the point of the np.random.seed(2)
call before creating the ImageDataBunch is to ensure we get the
same validation set every time. I’m trying to check this in Kaggle
(I add a cell after the data = ImageDataBunch … line that evaluates data
and prints the labels of the first few images in the validation set).
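Concretely, my cells look roughly like this (the path setup and filename regex are from the lesson notebook; the label printout at the end is just my own check):

```python
import numpy as np
from fastai.vision import *

path = untar_data(URLs.PETS)
path_img = path/'images'
fnames = get_image_files(path_img)
pat = r'/([^/]+)_\d+.jpg$'

np.random.seed(2)  # meant to fix the random train/valid split
data = ImageDataBunch.from_name_re(path_img, fnames, pat,
                                   ds_tfms=get_transforms(), size=224)

# my check: labels of the first few images in the validation set
print([str(data.valid_ds.y[i]) for i in range(5)])
```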
I find that the validation set is reproducible within the same kernel run
(every time I run data = ImageDataBunch after calling np.random.seed(2)
I get the same label list). However, after restarting the kernel I’m getting
a different label list for the validation set, even with the same seed.
Does this mean the validation set is different between runs?
Thanks, but I thought that thread was about getting reproducible training results,
which I understand is more controversial; in lecture 2 Jeremy Howard says he
actually doesn’t usually advocate that. However, he says you do want the validation
set to stay the same, and that’s why the notebook has the np.random.seed(2) call.
Try this: when we call split_by_rand_pct(), we can pass in a seed after our validation pct.
Looking at the source code, it should work that way. Just in case, try passing .split_by_rand_pct(seed=2).
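With the data block API, that would look something like this (just a sketch, reusing path_img and pat from your post; valid_pct=0.2 is the default):

```python
data = (ImageList.from_folder(path_img)
        .split_by_rand_pct(valid_pct=0.2, seed=2)  # seed fixes the split
        .label_from_re(pat)
        .transform(get_transforms(), size=224)
        .databunch())
```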
Thanks! I tried adding seed=2 to the data = ImageDataBunch.from_name_re() call;
according to the source code, this seed value should then be passed through to split_by_rand_pct().
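For reference, the call now looks like this (same setup as in my first post, just with the extra seed argument, which, if I'm reading the source right, gets forwarded through from_name_func to the split):

```python
data = ImageDataBunch.from_name_re(path_img, fnames, pat,
                                   ds_tfms=get_transforms(), size=224,
                                   seed=2)  # should end up in split_by_rand_pct()
```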
I’m still getting the same validation set within a session but not across kernel restarts.