About auto-cleaning Jupyter notebooks for git

Edit

I no longer clean my notebooks for git. I want the outputs saved so that when I upload the notebooks, people can view the outputs on GitHub. Instead, I now save my work in .py files and update the notebooks only when I’m ready to show people. With this workflow, no one needs to review notebook diffs, so I can keep the outputs rather than keep the diffs clean.


Old post


Goal

To automatically clear outputs and metadata from Jupyter notebooks when adding them to a git repo.

Why

To make git commits and git diffs cleaner.

A solution

In a terminal, enter the following to install jq (on Debian or Ubuntu):

sudo apt-get install jq

“jq is a lightweight and flexible command-line JSON processor (‘sed for JSON data’).”

In your ~/.gitconfig file, add:

[core]
attributesfile = ~/.gitattributes_global

[filter "nbstrip_full"]
clean = "jq --indent 1 \
        '(.cells[] | select(has(\"outputs\")) | .outputs) = []  \
        | (.cells[] | select(has(\"execution_count\")) | .execution_count) = null  \
        | .metadata = {\"language_info\": {\"name\": \"python\", \"pygments_lexer\": \"ipython3\"}} \
        | .cells[].metadata = {} \
        '"
smudge = cat
required = true

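This defines a git filter named “nbstrip_full”. Its clean command runs each notebook through jq, which empties the outputs, resets the execution counts, clears the cell metadata, and reduces the notebook-level metadata to a minimal language_info entry.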
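To make the effect of the clean command easier to see, here is a rough Python equivalent of the jq expression. It is only an illustrative sketch: the function name strip_notebook and the sample notebook are made up for this example, it assumes the usual nbformat 4 layout with a top-level "cells" list, and git still runs the jq command, not this code.

```python
import copy
import json


def strip_notebook(nb: dict) -> dict:
    """Return a cleaned copy of a parsed .ipynb document.

    Hypothetical helper that mirrors the steps of the jq filter:
    empty outputs, null execution counts, empty cell metadata,
    and minimal notebook-level metadata.
    """
    cleaned = copy.deepcopy(nb)  # leave the caller's notebook untouched
    for cell in cleaned["cells"]:
        if "outputs" in cell:           # only code cells have outputs
            cell["outputs"] = []
        if "execution_count" in cell:   # only code cells have a counter
            cell["execution_count"] = None
        cell["metadata"] = {}           # applied to every cell
    cleaned["metadata"] = {
        "language_info": {"name": "python", "pygments_lexer": "ipython3"}
    }
    return cleaned


# A tiny made-up notebook with one markdown cell and one executed code cell.
sample = {
    "cells": [
        {"cell_type": "markdown", "metadata": {"collapsed": False},
         "source": ["# Title"]},
        {"cell_type": "code", "execution_count": 3,
         "metadata": {"scrolled": True},
         "outputs": [{"output_type": "stream", "name": "stdout",
                      "text": ["hi\n"]}],
         "source": ["print('hi')"]},
    ],
    "metadata": {"kernelspec": {"name": "python3",
                                "display_name": "Python 3"}},
    "nbformat": 4,
    "nbformat_minor": 2,
}

cleaned = strip_notebook(sample)
print(json.dumps(cleaned, indent=1))
```

The source of each cell is left alone, so only the parts that change from run to run (outputs, counters, editor metadata) disappear from the diff.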

In your ~/.gitattributes_global file, add:

*.ipynb filter=nbstrip_full

This makes git apply the nbstrip_full filter to any notebook you stage with git add, so the cleaned version is what gets committed.

Gotchas with this solution

  • For notebooks that are already in a repo, consider making a do-nothing commit so the filter is applied to them.
  • This filter setup makes doing a rebase more difficult (see the post credited below for details).
  • This filter setup is global. Unset the filter for a specific repo by adding *.ipynb -filter to the local .gitattributes file.

Credit

Tim Staley’s blog post: Making Git and Jupyter Notebooks play nice (Feb 2017)

This solution and the gotchas came from his post. See his post if you want to know why he uses this solution and how it works in more detail.


If anyone knows of a cleaner solution, please let me know.
