How to know what causes improvement (parameter change vs. randomness)

Hello guys,
in order to reproduce my training, and to better understand how certain parameter changes (e.g. learning rate, wd, pct_start, slicing, etc.) affect the training process and the metric, it is important to set all the random seeds.

However, I tried:
1.
import torch
torch.manual_seed(42)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

2. set_seed(42)

3. np.random.seed(42)
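
(For reference, here is everything from these attempts rolled into a single helper; the name seed_everything is my own:)

import random
import numpy as np
import torch

def seed_everything(seed=42):
    # Seed every RNG that training code typically draws from
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Force cuDNN to pick deterministic kernels (slower, but repeatable)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False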

Nothing seems to help, as I get varying results for:

[1] pets = DataBlock(blocks=(ImageBlock, CategoryBlock),
                     get_items=get_image_files,
                     splitter=RandomSplitter(seed=42),
                     get_y=using_attr(RegexLabeller(regex_pat), 'name'),
                     item_tfms=Resize(460),
                     )

[2] dls = pets.dataloaders(path/"images")

[3] learn = cnn_learner(dls, resnet34, metrics=error_rate, model_dir="/tmp/model/")

[4] learn.fit_one_cycle(1, 0.004)

What should I do in order to get the same error_rate when repeating fit_one_cycle?

Hint:
I read in another post that I should set num_workers = 0, yet there is no num_workers param to be set in the new DataBlock class.
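
(If I understand that post correctly, num_workers is passed to the dataloaders call rather than to DataBlock itself, roughly like this:)

dls = pets.dataloaders(path/"images", num_workers=0)

With num_workers=0 all data loading runs in the main process, which removes worker processes as one source of nondeterminism.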

I should mention that I use fastai2 on Paperspace.


Hello, I recently had this issue as well.

May I ask, are you setting your random seed in every cell that you want to reproduce the results of?

Until recently, I did not know that you have to set the seed in every cell, and thought that setting it once at the top of the notebook was sufficient. I was wrong, and now any time I do a test/train/validate split or anything else that is random, I set the seed in that specific cell.
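
For example, a minimal sketch with numpy (the same idea applies to torch.manual_seed):

import numpy as np

# Cell 1: seed immediately before the random operation
np.random.seed(42)
idx_a = np.random.permutation(100)

# Cell 2: seed again here, otherwise this cell starts from a
# different RNG state and produces a different permutation
np.random.seed(42)
idx_b = np.random.permutation(100)

assert (idx_a == idx_b).all()  # identical, because each cell re-seeds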

I hope this helps 🙂

Hi wcneill,
today I found out that I must set the seed in every cell;
however, even when I rerun the notebook from top to bottom, the results change.

Could you share an example notebook? Just to see what you are doing differently.

Hi AMusic, sure. I’m very much a beginner, so don’t judge too harshly!

Here is a notebook where I experimented with a few different ways of filling in missing data or adding new features to the Titanic dataset. Each time I tried something new, I wanted to test the results on the exact same train/validate/test set.

It’s a huge notebook, so just look for the cells with np.random.seed(333). You will see that every time I re-split the data, it is the same.
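
The pattern is roughly this (sklearn's train_test_split stands in for the notebook's actual code, and the toy DataFrame is just an illustration):

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({"x": range(10)})  # stand-in for the Titanic data

np.random.seed(333)  # reset numpy's global RNG state in this cell
train, valid = train_test_split(df, test_size=0.2)
# Re-running this cell reproduces the same split: train_test_split
# falls back on numpy's global RNG when no random_state is given.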

I’m going to split it up into several notebooks as soon as you are done looking at it. I won’t push the changes until you tell me you are finished browsing!

You will find the solution to this problem here:
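
In short, the gist is to re-seed immediately before every step that draws randomness (a sketch, assuming fastai2's set_seed helper and the pets DataBlock from the first post):

from fastai2.vision.all import *

set_seed(42)   # fastai's helper: seeds Python, numpy and torch in one call
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

dls = pets.dataloaders(path/"images", num_workers=0)   # pets from the first post

set_seed(42)   # re-seed before creating and fitting the learner
learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fit_one_cycle(1, 0.004)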