Why do we get slightly different result each time we fit a model?

With the exact same settings and the same data (without data augmentation), each time I fit a model (i.e. run learn.fit) I get a slightly different result!
Where does this randomness come from? What can I do to get the best possible result?

The algorithms used for training rely on randomness, for example to initialize the weights. The data might also be randomly shuffled between epochs.

You can set a random seed to get repeatable results.
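A minimal sketch of seeding, using only Python and NumPy so it stays dependency-free; to cover PyTorch as well you would also call `torch.manual_seed(seed)` and `torch.cuda.manual_seed_all(seed)` in the same helper:

```python
import random

import numpy as np

def set_seed(seed=42):
    """Fix the seeds of the common random-number generators."""
    random.seed(seed)
    np.random.seed(seed)

# With the same seed, the "random" draws are identical from run to run.
set_seed(42)
first = np.random.rand(3)
set_seed(42)
second = np.random.rand(3)
assert np.allclose(first, second)
```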

To keep the best result you encounter, just save your model whenever the loss or accuracy reaches a value you like.
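A minimal sketch of that idea: here a list of simulated validation losses stands in for real epochs, and the "checkpoint" is just the epoch index; with a real model you would save its weights at the same point (e.g. `torch.save(model.state_dict(), path)` or fastai's `learn.save(...)`):

```python
import math

# Simulated per-epoch validation losses (stand-in for real training).
val_losses = [0.9, 0.7, 0.65, 0.72, 0.61, 0.68]

best_loss = math.inf
best_epoch = None
for epoch, loss in enumerate(val_losses):
    if loss < best_loss:  # checkpoint only when validation improves
        best_loss = loss
        best_epoch = epoch

print(best_epoch, best_loss)  # → 4 0.61
```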

For getting the best possible result, that’s an open question for every problem =)

You might want to rewatch the lectures where Jeremy talks about spikiness of the error function, and how the algorithms jump around that function at random to find the best part. I can’t describe it better than he does.


Here is an explanation, as well as how you can fix all the randomness seeds: Planet Classification Challenge

In the thread at that link, the trick only worked with precompute=True; we were not able to achieve reproducibility with precompute=False. @sermakarevich, have you achieved reproducibility with precompute=False? Where do you insert the seed lines?

Nope, I don’t have this goal, for a couple of reasons:

  • Getting high variation between different runs might be a good hint that there is something wrong.
  • Natural variation can help you achieve an even better score if you use CV.

Agree on both, 100%. But I do sometimes want to make “things equal” when I play with hyperparameters. A question of taste, I guess :grinning:

How do you save the best model when running many epochs and one in the middle has the best result? How do you come back to that specific epoch?

Oh, I guess you just rerun it from the start and set it up to stop at that epoch the next time. Either that, or you can save the model after every epoch. I’m not sure if fastai has an option that lets you do that, or if you’d need to modify the code yourself.
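A sketch of the save-after-every-epoch idea, so you can come back to any of them later. The “model” here is just a dict of weights and `pickle` stands in for `torch.save` / `learn.save`; with a real model you would save actual weights instead:

```python
import os
import pickle
import tempfile

checkpoint_dir = tempfile.mkdtemp()
weights = {"w": 0.0}

for epoch in range(3):
    weights["w"] += 1.0  # pretend training step
    path = os.path.join(checkpoint_dir, f"epoch_{epoch}.pkl")
    with open(path, "wb") as f:
        pickle.dump(weights, f)  # checkpoint this epoch's state

# Later: go back to the epoch that had the best result, e.g. epoch 1.
with open(os.path.join(checkpoint_dir, "epoch_1.pkl"), "rb") as f:
    restored = pickle.load(f)
print(restored["w"])  # → 2.0
```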

I’d probably just rerun it, because as Sergii said above,

getting high variation between different runs might be a good hint that there is something wrong.

Note that the top Kaggle scorers are likely using ensemble or stacked approaches, and we haven’t really learned to do that in this class yet. In my opinion, getting the perfect leaderboard score isn’t worth your time at this point. If you’re scoring around what other classmates are getting, that’s good enough.

The cycle_save parameter might be handy.