Jupyter and version control

During part 1, I remember that not having version control cost me productivity. Whenever the notebook was modified, I often missed the change and ran into the same problems as many others. A quick search shows some possible (not perfect, but workable) solutions. Do others feel we should try one of these options to keep things version controlled, at least the .py files?

Feel free to create a github repo - IIRC @brendan did so for part 1.

But even if we create a git repo ourselves, we would still need to do extra version-control-like plumbing to keep up with changes from platform.ai, right?

Right - although it should just be a case of making sure you’ve got the most recent versions by doing a download after each lesson, and then pushing that to git.

Yes, that would solve the problem of finding the diff after the fact. But we still need to get a full dump of all files periodically (even if they have not changed). And there is no equivalent of git status before a pull. Doable, but suboptimal, isn't it?
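For what it's worth, the missing "git status before pull" step could be approximated after a fresh download by comparing file hashes. This is only a rough sketch: the function names and the two-directory layout (a local working copy and a freshly downloaded copy) are assumptions made for illustration, not anything the course site provides.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def compare_dirs(local: Path, fresh: Path) -> dict:
    """Classify every file under `fresh` relative to `local`.

    Both directory arguments are hypothetical: `local` is your working copy,
    `fresh` is wherever the latest download was unpacked.
    """
    status = {"added": [], "modified": [], "unchanged": []}
    for new_file in sorted(p for p in fresh.rglob("*") if p.is_file()):
        rel = new_file.relative_to(fresh)
        old_file = local / rel
        if not old_file.exists():
            status["added"].append(rel.as_posix())
        elif file_digest(old_file) != file_digest(new_file):
            status["modified"].append(rel.as_posix())
        else:
            status["unchanged"].append(rel.as_posix())
    return status
```

It still requires the full download, so it only answers "what changed?" and does not remove the periodic dump.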

I guess I’m not following. You mentioned that you thought it would be helpful to have a git repo, and I said - sure, go for it! :slight_smile: Are you saying you need some more help setting it up? Or am I missing something?

Sorry if I’m being a little slow…

A git repo that you can push to directly would be useful, like https://github.com/fastai/courses/tree/master/deeplearning1 , instead of us doing git-like plumbing (a periodic wget of all files) and pushing the result into git. Do you see the difference?

You can set up the git repo however you want it…? You can make it just like the one you linked to.

Thanks for your response Jeremy!

I guess I am not being clear :-). So git solves two problems:

  1. Git pull only gets the diff of what has changed
  2. Git diff provides versioning of changes.

So your proposed solution is:

  1. Once a week: rm -rf part2 && wget platform.ai/part2
  2. git push part2 temp_part2

And now the rest of the class can do a git pull from temp_part2 instead of a wget from platform.ai.

I agree it works, but it seems suboptimal to me, as the steps above are unnecessary if you can push directly to jeremy/git/part2 or someuser/git/part2. The rest of us could then pull from it like any other git repo. Unless the actual problem is that you do not want to leave a part 2 git trail before it goes MOOC, which is understandable.

Either way, it's not a big deal :slight_smile: Just wanted to put the thought across in case it is at all possible.

Here is the repo: https://github.com/sravya8/part2

Thanks @sravya8 - I actually think it’s better to have everything in a flat directory for git, rather than in subdirectories. I was thinking more about your suggestion last night and I think the approach I’m most comfortable with is just putting the .py files in git (I haven’t seen an approach to versioning frequently-changing notebook files that really works well).

Especially since my web host requires zipping .py files, which is annoying for all, this might save a little time. So I’ll put this together before Monday.

The downside is that in the MOOC students won’t have the exact same libraries that I show at each part of the course (i.e. lesson8/utils.py, lesson9/utils.py, etc), but that’s probably not so bad…

Thanks @jeremy ! Appreciate it!

Maybe the nbviewer service is a good way to find a happy medium?

You can just type in a public GitHub repository and browse for the relevant files.

You can also view a static version of the notebook:

And if you click “view as code”, the notebook is converted to runnable Python, with all the markdown cells and cell numbers commented out.
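To give a rough idea of what such a conversion involves, here is a minimal sketch using only the standard library. It is not nbviewer's or nbconvert's implementation; it assumes the nbformat 4 JSON layout (a top-level "cells" list), and the function name is made up for illustration.

```python
import json


def notebook_to_script(nb_json: str) -> str:
    """Turn notebook JSON into a Python script, commenting out markdown cells."""
    nb = json.loads(nb_json)
    chunks = []
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        # "source" may be stored as one string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if cell.get("cell_type") == "code":
            chunks.append(source)
        elif cell.get("cell_type") == "markdown":
            chunks.append("\n".join("# " + line for line in source.splitlines()))
    return "\n\n".join(chunks) + "\n"
```

Notebook magics such as lines starting with % or ! would pass through unchanged, so the result is only runnable if the code cells are plain Python.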

Using the nbviewer service means there isn’t an extra workflow step? And students can have their files as py or ipynb files as they please?

I haven’t tried it yet, but nbdime is supposed to be an answer to version control for notebooks.

It even has git integration with nbdime config-git --enable --global

@davecg – Thanks for sharing. nbdime is new to me. I will check it out.

One common way to deal with notebooks in git is to use pre-commit hooks to strip the output from the notebook before committing. nbstripout includes instructions for configuring a git project to strip notebooks before commits.

There are cases when you want your committed notebooks to include the output. Github can render the notebook (minus any javascript and css) and nbviewer does as well (including js and css). But for sharing notebooks as code, all that output can get in the way.
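To illustrate what stripping output before a commit amounts to, here is a minimal sketch. It is not how nbstripout itself is implemented; it assumes the nbformat 4 JSON layout, and the function name is hypothetical.

```python
import json


def strip_outputs(nb_json: str) -> str:
    """Return notebook JSON with code-cell outputs and execution counts cleared."""
    nb = json.loads(nb_json)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    # Stable key order and indentation keep later diffs small.
    return json.dumps(nb, indent=1, sort_keys=True) + "\n"
```

With the outputs gone, a git diff of the .ipynb file mostly shows changes to the cell sources, which is the part people usually want to review.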


This is an old thread and you all must have moved on, but I wanted to chime in.

nbdime is nice for diff’ing and merging locally. nbstripout is good for stripping the outputs from notebooks before commit (if you can live without version controlling outputs).

I’ve also built ReviewNB that shows notebook diff (including output) of any GitHub commit or pull request.