Reproducibility of results using the PyTorch fastai library

When running the fastai notebooks, I think we often compare our loss and accuracy to the results shown in the lectures to get an idea of how well we can reproduce the same performance.

As an example, after running the Lesson 1 notebook (unchanged) for the first time on the official fastai AWS AMI, I got a final TTA accuracy of 0.99299999999999999, which is lower than @Jeremy’s in the lecture.

From past experience with Keras, I noticed that even when setting the NumPy and Keras random seeds, there are situations in which results are not completely reproducible.
Is there a suggested way to get reproducible results using PyTorch and the fastai lib?


I think if you’re using Python 3 you need to fix the seed of Python’s random module as well.

Edit: It’s been a while since this issue affected me, but my problem with Python 3 was not setting a seed for the hash function, which affected results across restarts.
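For reference, the hash seed mentioned above is controlled by the `PYTHONHASHSEED` environment variable, which has to be set before the interpreter starts (changing it from inside a running notebook has no effect on that process). The following is only a minimal sketch, using the standard library; the helper name `hash_in_fresh_interpreter` is made up for illustration.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(text, hash_seed):
    """Return hash(text) as computed by a newly started Python process."""
    # PYTHONHASHSEED is read once at interpreter start-up, so we pass it
    # through the environment of the child process.
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


# Two separate "restarts" with the same hash seed (0 disables randomization).
first = hash_in_fresh_interpreter("fastai", hash_seed=0)
second = hash_in_fresh_interpreter("fastai", hash_seed=0)
print(first == second)  # the two restarts agree on the string hash
```

In practice this means exporting `PYTHONHASHSEED` in the shell (or the Jupyter kernel's environment) before launching the notebook server, rather than setting it in a notebook cell.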


It’s a small validation set - we’re talking 3-4 differently labeled images between our results. It’s just random.
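A rough way to put a number on "just random" is the binomial standard error of an accuracy estimate. The sketch below assumes a validation set of 2000 images, which is not stated in this thread, and treats each prediction as an independent coin flip; both are simplifying assumptions.

```python
import math


def accuracy_noise(accuracy, n_images):
    """Binomial standard error of an accuracy estimate, plus the same
    quantity expressed as a number of images."""
    std_err = math.sqrt(accuracy * (1.0 - accuracy) / n_images)
    return std_err, std_err * n_images


ASSUMED_VAL_SIZE = 2000  # assumption: not given in the thread

std_err, std_images = accuracy_noise(0.993, ASSUMED_VAL_SIZE)
print(f"standard error of accuracy: {std_err:.4f}")
print(f"roughly {std_images:.1f} images of run-to-run noise")
```

Under these assumptions, a gap of a few images between two runs is within one standard error, so it says little about whether a run was "worse".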


Thanks for all the observations. I will try to check if I can get reproducible results across different runs of a notebook.

@jeremy Is there a way to pass a seed to fastai for reproducible results?

There’s no way to pass a seed using fastai, but you can use standard PyTorch stuff to do so.
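As a sketch of what the "standard PyTorch stuff" could look like: the helper below (the name `seed_everything` is hypothetical, not a fastai or PyTorch function) seeds Python, NumPy and PyTorch and asks cuDNN for deterministic kernels. The imports are guarded so the snippet still runs where NumPy or PyTorch is not installed. Even with all of this, some GPU operations and multi-worker data loading can remain non-deterministic, so treat it as a starting point rather than a guarantee.

```python
import random

try:
    import numpy as np
except ImportError:  # NumPy not installed: skip its seeding
    np = None

try:
    import torch
except ImportError:  # PyTorch not installed: skip its seeding
    torch = None


def seed_everything(seed=42):
    """Seed the common random number generators (hypothetical helper)."""
    random.seed(seed)
    if np is not None:
        np.random.seed(seed)
    if torch is not None:
        torch.manual_seed(seed)
        # Safe to call without a GPU; it is ignored if CUDA is unavailable.
        torch.cuda.manual_seed_all(seed)
        # Prefer deterministic cuDNN kernels over auto-tuned ones.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False


# Call once at the top of the notebook, before building data or models.
seed_everything(42)
```

Note that re-running individual cells out of order consumes random numbers differently, so the seed only helps when the notebook is run the same way each time.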

Hello! I am quite interested in the general question of reproducibility. While in many examples the differences are slight, there are cases where the difference is quite a bit bigger. @jeremy For example:

I have been struggling to get lesson4-imdb to converge to the stated accuracy for about two days now.

GPU: GTX 1080 Ti (local desktop)
notebook: lesson4-imdb (git pull yesterday)
environment: anaconda3
os: Ubuntu 18.04 LTS
CUDA: 9.0

In the lectures (and in the notebook) the learner achieves 4.165, but when I re-run the script (even if I add an additional fit run with a lower LR and a longer cycle) I get stuck around 4.20, which ultimately results in ~90.5% accuracy.

I am currently running the vanilla setup on Paperspace, to see if that works and will report back later. It would be good to find the cause for this (significant) drop in final accuracy. Is it my GPU? CUDA version? PyTorch version?

Also: Paperspace is cool, but it seems to be training 4x slower than my local machine, so you can calculate the number of hours that would justify investing in a GPU :)

There’s updated language model training code here: . You should find you can improve on my notebook using these approaches.

It’s unlikely that software or hardware versions are going to be an issue - generally regularization and learning rates are what matter.

Thanks for the reply! I’m not sure I follow. What’s the hypothesis here? I am running the notebook as is, I haven’t changed anything, yet I see this discrepancy.

I didn’t always run the notebooks top to bottom.
