How to know what causes improvement (parameter change vs. randomness)

Hello guys,
in order to reproduce my training, and to better understand how certain parameter changes (e.g. learning rate, wd, pct_start, slicing, etc.) affect the training process and the metric, it is important to set all the random seeds.

However, I tried:
1.
import torch
torch.manual_seed(42)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

2. set_seed(42)

3. np.random.seed(42)
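
(For reference, here is everything from these attempts rolled into a single helper; the name seed_everything is my own:)

import random
import numpy as np
import torch

def seed_everything(seed=42):
    # Seed every RNG that training code typically draws from
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Force cuDNN to pick deterministic kernels (slower, but repeatable)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False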

Nothing seems to help, as I get varying results for:

[1] pets = DataBlock(blocks=(ImageBlock, CategoryBlock),
                     get_items=get_image_files,
                     splitter=RandomSplitter(seed=42),
                     get_y=using_attr(RegexLabeller(regex_pat), 'name'),
                     item_tfms=Resize(460),
                     )

[2] dls = pets.dataloaders(path/"images")

[3] learn = cnn_learner(dls, resnet34, metrics=error_rate, model_dir="/tmp/model/")

[4] learn.fit_one_cycle(1, 0.004)

What should I do in order to get the same error_rate when repeating fit_one_cycle?

Hint:
I read in another post that I should set num_workers = 0, yet there is no num_workers param to be set in the new DataBlock class.
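
(If I understand that post correctly, num_workers is passed to the dataloaders call rather than to DataBlock itself, roughly like this:)

dls = pets.dataloaders(path/"images", num_workers=0)

With num_workers=0 all data loading runs in the main process, which removes worker processes as one source of nondeterminism.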

I should mention that I use fastai2 on Paperspace.


Hello, I recently had this issue as well.

May I ask, are you setting your random seed in every cell that you want to reproduce the results of?

Until recently, I did not know that you have to set the seed in every cell, and thought that setting it once at the top of the notebook was sufficient. I was wrong, and now any time I do a test/train/validate split or anything else that is random, I set the seed in that specific cell.
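
For example, a minimal sketch with numpy (the same idea applies to torch.manual_seed):

import numpy as np

# Cell 1: seed immediately before the random operation
np.random.seed(42)
idx_a = np.random.permutation(100)

# Cell 2: seed again here, otherwise this cell starts from a
# different RNG state and produces a different permutation
np.random.seed(42)
idx_b = np.random.permutation(100)

assert (idx_a == idx_b).all()  # identical, because each cell re-seeds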

I hope this helps 🙂

Hi wcneill,
today I found out that I must set the seed in every cell;
however, even when I rerun the notebook from top to bottom, the results change.

Could you share an example notebook? Just to see what you are doing differently.

Hi AMusic, sure. I’m very much a beginner, so don’t judge too harshly!

Here is a notebook where I experimented with a few different ways of filling in missing data or adding new features to the Titanic dataset. Each time I tried something new, I wanted to test the results on the exact same train/validate/test set.

It’s a huge notebook, so just look for the cells with np.random.seed(333). You will see that every time I re-split the data, it is the same.
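
The pattern is roughly this (sklearn's train_test_split stands in for the notebook's actual code, and the toy DataFrame is just an illustration):

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({"x": range(10)})  # stand-in for the Titanic data

np.random.seed(333)  # reset numpy's global RNG state in this cell
train, valid = train_test_split(df, test_size=0.2)
# Re-running this cell reproduces the same split: train_test_split
# falls back on numpy's global RNG when no random_state is given.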

I’m going to split it up into several notebooks as soon as you are done looking at it. I won’t push the changes until you tell me you are finished browsing!

You will find the solution to this problem here:
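
In short, the gist is to re-seed immediately before every step that draws randomness (a sketch, assuming fastai2's set_seed helper and the pets DataBlock from the first post):

from fastai2.vision.all import *

set_seed(42)   # fastai's helper: seeds Python, numpy and torch in one call
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

dls = pets.dataloaders(path/"images", num_workers=0)   # pets from the first post

set_seed(42)   # re-seed before creating and fitting the learner
learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fit_one_cycle(1, 0.004)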