Fastai-nbstripout: stripping notebook outputs and metadata for git storage

Since git’s security model means we can’t avoid this step, I guess the only way to make it simpler is to wrap it in a script, like tools/trust-origin-git-config:

#!/bin/sh
# XXX: first check we are inside a repo, then enable
git config --local include.path '../.gitconfig'

So the sequence would be:

git clone git@github.com:fastai/fastai_v1.git 
cd fastai_v1
tools/trust-origin-git-config

Or perhaps it could be called 'tools/run-upon-clone', but that is less explicit about its function. Other ideas are welcome.

P.S. @jeremy, what do you think? Should we add this? I suppose implementing it in Python would be better, so that it works on Windows.
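For what it's worth, here is a rough sketch of what a Python version of that script might look like. It addresses the XXX in the shell version by first checking that we are inside a repo. The function names are made up for illustration, and this is not an existing fastai tool.

```python
import subprocess
from pathlib import Path
from typing import Optional


def find_repo_root(start: Path) -> Optional[Path]:
    """Walk upward from `start` until a directory containing `.git` is found."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        # `.git` is normally a directory, but can be a file (e.g. worktrees).
        if (candidate / ".git").exists():
            return candidate
    return None


def trust_origin_git_config(start: Path) -> Path:
    """Tell the local git config to include the repo-level .gitconfig."""
    root = find_repo_root(start)
    if root is None:
        raise RuntimeError(f"{start} is not inside a git repository")
    # Same command as in the shell script above, run from the repo root.
    subprocess.run(
        ["git", "config", "--local", "include.path", "../.gitconfig"],
        cwd=root,
        check=True,
    )
    return root
```

Calling trust_origin_git_config(Path.cwd()) from anywhere inside the clone should then be equivalent to running the shell script from the repo root, assuming git is on the PATH.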

That makes sense, it’s quite similar to what I am thinking about doing for my project (a company-internal collection of exploratory notebooks and tools).

In our case, we have a couple of similar setup steps that users likely want to run upon clone. My current plan is to bundle them all up into a single tools/bootstrap script that they need to run once, after which they are all set for both this and our other config details.


Have there been any changes to the git setup? I noticed the last notebook I pushed wasn’t stripped.

No change. Perhaps you made a new clone and forgot to tell git to trust .gitconfig? Check .git/config and see whether you have this entry:

[include]
        path = ../.gitconfig

While we can’t do the trusting step on behalf of the user, perhaps we can use some git hook to validate that it’s configured and prevent git push without it?
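A hook along those lines could be as simple as reading .git/config and looking for that entry. Below is a minimal sketch of the check itself, assuming a hand-rolled parser is good enough for this one key. The function name is hypothetical, and it only looks at the literal text of the file, not at git's full config resolution.

```python
def includes_repo_gitconfig(config_text: str, wanted: str = "../.gitconfig") -> bool:
    """Return True if the text has an [include] section with `path = wanted`."""
    section = None
    for raw in config_text.splitlines():
        line = raw.strip()
        # Skip blank lines and git-config comments.
        if not line or line[0] in "#;":
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
            continue
        if section == "include" and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if key.lower() == "path" and value.strip("'\"") == wanted:
                return True
    return False
```

A pre-push hook could read .git/config, pass its contents to this function, and exit with a non-zero status when it returns False. Note that client-side hooks are not transferred by git clone either, so such a hook would still have to be installed locally.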

Not sure if I made a new clone, but I had rerun the line git config --local include.path '../.gitconfig' and I can see the config file does have the entry you mention. Still, two notebooks were committed and pushed this morning without being stripped.

What happens if you edit one of the notebooks and run git diff with GIT_TRACE=1:

GIT_TRACE=1  git diff 

It will show whether fastai-nbstripout is run at all. Of course, git commit would be an even better test with the trace enabled.
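If it helps, the same check can be scripted. This is only a sketch: it runs git with GIT_TRACE=1 (which makes git print trace messages to stderr) and keeps the lines that mention the filter. Both helper names are invented here, and the exact wording of the trace lines is not assumed, only that the script name appears in them.

```python
import os
import subprocess
from typing import List


def lines_mentioning(trace_output: str, needle: str = "fastai-nbstripout") -> List[str]:
    """Keep only the trace lines that mention `needle`."""
    return [line for line in trace_output.splitlines() if needle in line]


def traced_git(*args: str) -> str:
    """Run `git <args>` with GIT_TRACE=1 and return what git wrote to stderr."""
    env = dict(os.environ, GIT_TRACE="1")
    result = subprocess.run(
        ["git", *args], env=env, capture_output=True, text=True
    )
    return result.stderr
```

For example, lines_mentioning(traced_git("diff")) should come back empty if the filter was never invoked.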

Finally, you can see which files the filter runs on by modifying .gitconfig so that it prints each filename:

[filter "fastai-nbstripout-code"]
        clean = "f() { echo >&2 \"cleaning $1\"; tools/fastai-nbstripout; }; f %f"
        smudge = cat
        required = true
[diff "ipynb-code"]
        textconv = tools/fastai-nbstripout -t

Make a similar change to the -docs section if you are talking about a docs notebook. The only difference is that clean now prints debug information.

Funny, when adding that block to the config file the stripping worked again (and I saw clean name_of_the_modified_nb when running git add -A).

I am not sure what happens when you add the same entry twice, probably the last overrides it. I meant to replace the old entry with this one.

Please let us know if you discover the culprit of why it wasn’t working for you just before that.

And you can fix the committed notebooks with:

tools/fastai-nbstripout dev_nb/*ipynb dev_nb/experiments/*ipynb
tools/fastai-nbstripout -d docs/*ipynb

and then re-commit.
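For anyone wondering what stripping amounts to, here is a simplified illustration. It is not the actual tools/fastai-nbstripout code, which does more (e.g. it also handles metadata). It assumes the standard notebook JSON layout with a top-level "cells" list, and the function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict


def strip_outputs(nb: Dict[str, Any]) -> Dict[str, Any]:
    """Clear outputs and execution counts of all code cells, in place."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb


def strip_file(path: Path) -> None:
    """Rewrite a notebook file with its outputs removed."""
    nb = json.loads(path.read_text(encoding="utf-8"))
    strip_outputs(nb)
    path.write_text(json.dumps(nb, indent=1) + "\n", encoding="utf-8")
```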

The tools command doesn’t seem to work in my console (on Windows with the Anaconda prompt).
I just modified the notebooks I wanted to clean instead.

Please be more specific when you say something doesn’t work. How do you invoke it and what errors do you get?

I meant that I don’t have this command:

'tools' is not recognized as an internal or external command,
operable program or batch file.

Sorry, I don’t know anything about windows shell.

Do you need to run:

tools\fastai-nbstripout

or:

python tools\fastai-nbstripout

instead?

So that means that you and Jeremy have verified that this works on windows when invoked through git, but not directly, correct? Let’s figure out how one runs this directly on windows and document this.

And what is windows with the anaconda prompt - what kind of shell is it running?

I need fastai-nbstripout to be renamed fastai-nbstripout.py, then I can run

python tools/fastai-nbstripout.py

I have never used the script directly before, no, only checked it worked with git, that’s correct.
I think the anaconda shell is just wrapped around cmd.exe, so it’s the basic windows shell (with python and conda commands working). Not sure though.

Unless @jeremy objects, I see no problem with renaming it to tools/fastai-nbstripout.py. Let’s make it as user-friendly as possible and working out of the box.

It’s still unclear what happened with stripping out not happening on commit for you.

Meanwhile, I have been looking at git hooks. We may need to add a server-side pre-receive hook that checks, for example, that .ipynb files don’t have execution_count set, as a simple validator. To keep it fast, it would just look for the first occurrence.

The pre-receive hook is executed every time somebody uses git push to push commits to the repository…
https://www.atlassian.com/git/tutorials/git-hooks#server-side-hooks
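The core of such a validator could be a single regex search that stops at the first hit. A minimal sketch, assuming a stripped notebook stores "execution_count": null and an unstripped one stores an integer there. The function name is made up, and the plumbing to get at the pushed file contents inside a pre-receive hook is left out.

```python
import re

# An executed code cell serializes as e.g. `"execution_count": 3`;
# a stripped one as `"execution_count": null`.
_EXECUTED = re.compile(r'"execution_count"\s*:\s*\d+')


def looks_unstripped(notebook_text: str) -> bool:
    """Return True as soon as one numeric execution_count is found."""
    return _EXECUTED.search(notebook_text) is not None
```

Because re.search returns at the first match, an unstripped notebook is rejected without scanning the rest of the file.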

And while we are at it, if we make a short script to enable the repo-wide config, I suppose it should also be written in Python and have a .py ending, correct?

Now I’m getting this error when trying to add anything before a commit:

cleaning dev_nb/004c_mixed_precision.ipynb
f() { echo >&2 "cleaning $1"; tools/fastai-nbstripout; }; f 'dev_nb/004c_mixed_precision.ipynb': line 1: tools/fastai-nbstripout: Permission denied
error: external filter 'f() { echo >&2 "cleaning $1"; tools/fastai-nbstripout; }; f %f' failed 126
error: external filter 'f() { echo >&2 "cleaning $1"; tools/fastai-nbstripout; }; f %f' failed
fatal: dev_nb/004c_mixed_precision.ipynb: clean filter 'fastai-nbstripout-code' failed

It fails with:

line 1: tools/fastai-nbstripout: Permission denied

Is your script no longer executable? What’s the output of:

ls -l tools/fastai-nbstripout

Also, why did things suddenly stop working when nothing under tools/ has changed? Were you working with a different setup until now?

I have no idea. At first I thought it was because I had transformed it into a .py file but even reverting back didn’t change anything. I solved the problem by recloning the repo.
Even with the line to trust .gitconfig, I still had to add the two blocks you sent earlier (for debugging) to get the notebooks stripped automatically.

On the fresh clone (w/o mods to the .gitconfig) what do you get when you run:

GIT_TRACE=1  git diff 

after you change a single notebook, so that there is a diff? It should show the invocations of the fastai-nbstripout script.

And the same on git commit:

GIT_TRACE=1  git commit

Thanks a lot for your help with this, @stas :)

For what it’s worth - on my first attempt at running tools/fastai-nbstripout I also hit permission denied (when I was adding the tool into a different project earlier this week).

I had to run sudo chmod 775 tools/fastai-nbstripout on it to fix the issue.
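If the bootstrap script ends up in Python anyway, the same fix can be done there. A small sketch (the function name is hypothetical) that adds the execute bits without touching the other permission bits. It is meant for POSIX systems; on Windows the execute bits are not meaningful in the same way.

```python
import os
import stat
from pathlib import Path


def make_executable(path: Path) -> int:
    """Add execute bits for user, group and other; return the new mode."""
    mode = path.stat().st_mode
    new_mode = mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    os.chmod(path, new_mode)
    # Return only the permission bits, e.g. 0o755.
    return stat.S_IMODE(new_mode)
```

For a file you own, something like make_executable(Path("tools/fastai-nbstripout")) should work without sudo.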