How to get consistent (exactly the same) predictions when retraining tabular_learner?

Hello everyone,

I could not find an answer to this, so I am making my first post on this forum.

I am evaluating my tabular_learner (fastai v1) against business metrics (let's call them that) 100 times in a loop on the same data. To make the training realistic (simulation-like), I have to do several retrains per loop run, so the model stays updated with the fresh data that arrives each day.

I expected that if I chose the initial values that correspond to the best business metrics, training would always proceed in the same direction. I had the ps parameter set to a non-zero value, so I thought dropout was the source of the randomness, but it seems there is something more at play besides dropout and initialization.

So my question is: how can I get the same results when retraining the model with the same initialization, the same data, and the same parameters?

The only way I have achieved this is by training once on all the data before the test days and then predicting on all the test data at once. But I want the training updated each day, and that does not behave consistently even with the same initialization and no dropout.
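For context, by "same initialization" I mean re-seeding every random number generator I know of right before each retrain, roughly like the sketch below (seed_everything is just my own helper name, not a fastai API, and the cuDNN flags trade training speed for repeatable kernels):

```python
import random
import numpy as np
import torch

def seed_everything(seed=42):
    random.seed(seed)                 # Python's built-in RNG
    np.random.seed(seed)              # numpy RNG (fastai uses it internally)
    torch.manual_seed(seed)           # PyTorch CPU RNG
    torch.cuda.manual_seed_all(seed)  # PyTorch GPU RNGs
    # Ask cuDNN for deterministic kernels: slower, but repeatable runs
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

# Call this before every retrain, not just once at startup, so each
# fit begins from the same RNG state.
seed_everything(42)
```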

Thank you in advance for your help.

I will answer my own question in case anybody encounters the same thing - I found a gensim discussion on GitHub where the last paragraph of Question 11 says:

“You can try to force determinism, by using workers=1 to limit training to a single thread – and, if in Python 3.x, using the PYTHONHASHSEED environment variable to disable its usual string hash randomization. But training will be much slower than with more threads. And, you’d be obscuring the inherent randomness/approximateness of the underlying algorithms, in a way that might make results more fragile and dependent on the luck of a particular setup. It’s better to tolerate a little jitter, and use excessive jitter as an indicator of problems elsewhere in the data or model setup – rather than impose a superficial determinism.”

I will try to play with those and see whether I get what I was looking for…
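For anyone following along: as far as I can tell, gensim's workers=1 advice translates in fastai v1 to num_workers=0 on the DataBunch (data loading stays on the main thread), and PYTHONHASHSEED only takes effect if it is set before the interpreter starts (e.g. launching with PYTHONHASHSEED=0 python train.py). A rough sketch, with df, valid_idx, cat_names, cont_names, and dep_var standing in for your own data:

```python
import os
# PYTHONHASHSEED must be set before Python starts; changing os.environ
# at runtime does not disable string hash randomization retroactively.
print('PYTHONHASHSEED =', os.environ.get('PYTHONHASHSEED'))

from fastai.tabular import *  # fastai v1

# num_workers=0 keeps data loading single-threaded
# (the analogue of gensim's workers=1)
data = (TabularList.from_df(df, cat_names=cat_names, cont_names=cont_names,
                            procs=[FillMissing, Categorify, Normalize])
        .split_by_idx(valid_idx)
        .label_from_df(cols=dep_var)
        .databunch(bs=64, num_workers=0))

learn = tabular_learner(data, layers=[200, 100], ps=0.0)  # ps=0.0 disables dropout
```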