How should I version control exploratory notebooks using nbdev?

bitfrosting · October 22, 2020, 4:53am

Hello,

I am working with nbdev for the first time (what an incredible project!) and have run into a dilemma versioning exploratory notebooks. In my project I have found myself creating two categories of notebooks: the nbdev-style notebooks that generate the project's Python package, and exploratory notebooks. My exploratory notebooks are something like a lab journal, organized chronologically, where I'm trying to answer a question using the project package as I'm developing it; these notebooks typically contain sanity checks or examples that motivate the direction the project is taking, along with some thoughts in Markdown.

The confusion I'm facing is that I'm not sure how to pin these notebooks to a specific tag or version of the project package generated through nbdev. I consider my exploratory notebooks to be one-off documents that do not need to be maintained or refactored as the project develops. Say I created a notebook with v0.0.1 of my project. Later, after I release v0.0.2, how would I document or pin my old notebook to v0.0.1 of the code, so that the results can be reproduced later if need be? So far the best idea I've had is to add a note at the top of the notebook, but that places the burden on my future self to read the note and check out an old version of my project if I ever want to re-run the notebook. This feels wonky to me, and I suspect there is a widely used organizational pattern out there that addresses this issue and that I'm unaware of. (I suppose my exploratory notebooks could be considered a blog, but I didn't see how fastpages tracks or manages notebook requirements.)
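
One way to make that note less passive is to turn it into the first cell of the notebook, so that it warns you instead of waiting to be read. The sketch below assumes the package is installed under the hypothetical name `my_project` and that the notebook was written against release `0.0.1`; it uses the standard-library `importlib.metadata` module (Python 3.8+). It only detects a mismatch; it does not install or check out anything for you.

```python
import warnings
from importlib.metadata import PackageNotFoundError, version

# Hypothetical values: the package name and the release this notebook was written against.
PACKAGE = "my_project"
PINNED_VERSION = "0.0.1"


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_pinned(package, pinned):
    """Warn (rather than fail) if the installed package differs from the pinned release."""
    found = installed_version(package)
    if found != pinned:
        warnings.warn(
            f"This notebook was written against {package}=={pinned}, "
            f"but {found or 'no version'} is installed; results may not reproduce. "
            f"Check out or install release {pinned} before re-running."
        )
        return False
    return True


# Run as the first cell of the exploratory notebook.
check_pinned(PACKAGE, PINNED_VERSION)
```

A warning rather than an exception keeps the notebook readable as a journal entry even when the environment has moved on.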

I’d like to keep these exploratory notebooks in the repository as a record of my thinking during the development of my project. I suppose I could refactor each notebook to make sure they each run with the latest version of the project package, but I feel like this would undermine the intent of the notebooks, which I’m creating to journal my thinking regarding the direction of the project at a specific moment in time and state of code.

I'm looking for advice, or links to projects that set good examples for this type of notebook organization. Please let me know if my aims are not clearly communicated above. Thank you in advance for any help and guidance!

bitfrosting · October 22, 2020, 5:03am

After posting this, I had another thought: maybe my exploratory notebooks should be refactored on principle. There is fastbook, after all, which has far more content than I imagine my project will ever have.

hamelsmu · October 22, 2020, 5:38am

If you are using GitHub you can use releases. This is what all fastai projects use. Look into the fastrelease repo for an example.

If you don't need a PyPI package and just want to version your own code, you can still use releases; this is essentially what the feature is for: https://docs.github.com/en/free-pro-team@latest/github/administering-a-repository/managing-releases-in-a-repository

bitfrosting · October 22, 2020, 6:09pm

Hi Hamel, thank you for your advice and the lead on fastrelease. I’ve dug into the fastrelease repo and believe I now understand how to use releases for my notebooks. Here is the pattern I plan to follow:

1. Create a release for my project with fastrelease. My project will contain a settings.ini file that tracks the project version.
2. Create an exploratory notebook and commit it to my project repo.
3. Continue development and publish more releases.
4. If and when I want to reproduce the results of an exploratory notebook, look up the latest commit in which that notebook was updated, then read the settings.ini file in that commit to find the version of the project I'll need to rerun the notebook.
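
Step 4 can be scripted. The sketch below is one possible version, assuming it runs from the repository root with `git` on the PATH, and that settings.ini keeps `version` in its `[DEFAULT]` section, as nbdev's template does. It finds the last commit that touched a notebook with `git log -n 1 --format=%H -- <path>`, reads settings.ini at that commit with `git show <commit>:settings.ini`, and parses the version with `configparser`. The function names are hypothetical helpers, not part of nbdev or fastrelease.

```python
import configparser
import subprocess


def version_from_settings(ini_text):
    """Parse the text of an nbdev-style settings.ini and return its `version` field."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # nbdev's template keeps project metadata, including `version`, in [DEFAULT].
    return parser["DEFAULT"]["version"]


def last_commit_touching(path, repo="."):
    """Return the hash of the most recent commit that modified `path`."""
    out = subprocess.run(
        ["git", "log", "-n", "1", "--format=%H", "--", path],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout.strip()
    if not out:
        raise ValueError(f"No commits found for {path!r}")
    return out


def settings_at(commit, repo="."):
    """Return the contents of settings.ini (at the repo root) as of `commit`."""
    return subprocess.run(
        ["git", "show", f"{commit}:settings.ini"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout


def version_for_notebook(path, repo="."):
    """Return (commit hash, settings.ini version) for the last change to `path`."""
    commit = last_commit_touching(path, repo)
    return commit, version_from_settings(settings_at(commit, repo))
```

For example, `version_for_notebook("explore/2020-10-22-sanity-checks.ipynb")` (a hypothetical path) returns the commit and version to use. Note that the version in settings.ini may already have been bumped for the next release at that commit, so checking out the commit hash itself is the most exact way to recover the code the notebook ran against.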

Please let me know if my understanding does not reflect the advice you shared. Thanks for your help!