[git] An easier jupyter notebook modification commit process?


(Stas Bekman) #81

All is good :slight_smile:
It will happen automatically once you do this, Michael.


(Stas Bekman) #82

This is indeed an excellent side effect of us using nbstripout, Francisco.


(Jeremy Howard) #83

I think you’re right. Python strings are copy-on-write.


(Nick brown) #84

Thank you all for your work on this, especially @stas.

I have wanted to solve exactly this problem for a repository of notebooks for quite a while and haven’t found anything that worked well for us. I saw Jeremy tweet about this a little while ago and finally had time to try it out today.

This was easy to understand, easy to set up in our repo, and worked perfectly out of the box.

The only thing I could possibly ask for at this point is figuring out a way to avoid each user having to run git config --local include.path '../.gitconfig' on checkout, but it sounds like that’s a security brick wall, which I get.

Thanks again!


(Stas Bekman) #85

Since we can’t avoid this step due to git’s security, I guess the only thing we can do to make it simpler is to make it into a script, like tools/trust-origin-git-config:

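#!/bin/sh
# first check that we are inside a git repo, then enable the repo-wide config
git rev-parse --is-inside-work-tree >/dev/null 2>&1 || { echo "not inside a git repo" >&2; exit 1; }
git config --local include.path '../.gitconfig'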

So the sequence would be:

git clone git@github.com:fastai/fastai_v1.git 
cd fastai_v1
tools/trust-origin-git-config

Or perhaps it could be called 'tools/run-upon-clone', but that is less explicit about its function. Other ideas are welcome.

P.S. @jeremy, what do you think? Should we add this? I suppose implementing it in Python would be better so that it works on Windows.
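
For illustration, here is a minimal Python sketch of what such a script might look like. It assumes git is on the PATH; the function names inside_git_repo and trust_repo_gitconfig are made up for this example and are not part of the fastai repo.

```python
"""Hypothetical Python port of tools/trust-origin-git-config (a sketch)."""
import subprocess


def inside_git_repo(cwd="."):
    """Return True if `cwd` is inside a git work tree."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--is-inside-work-tree"],
            cwd=cwd, capture_output=True, text=True,
        )
    except FileNotFoundError:  # git (or the directory) is missing
        return False
    return result.returncode == 0 and result.stdout.strip() == "true"


def trust_repo_gitconfig(cwd="."):
    """Point the local git config at the repo-wide .gitconfig."""
    if not inside_git_repo(cwd):
        raise RuntimeError(f"{cwd!r} is not inside a git repository")
    # Same command as the shell version, run without a shell.
    subprocess.run(
        ["git", "config", "--local", "include.path", "../.gitconfig"],
        cwd=cwd, check=True,
    )
```

A real script would call trust_repo_gitconfig() from its entry point and print the error instead of raising. Because the command is passed as a list, no shell quoting is involved, which is what should make it behave the same under cmd.exe.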


(Nick brown) #86

That makes sense, it’s quite similar to what I am thinking about doing for my project (a company-internal collection of exploratory notebooks and tools).

In our case, we have a couple of similar setup steps that users likely want to run upon clone. My current plan is to bundle them all up into a single tools/bootstrap script that they need to run once, after which they are all set for both this and our other config details.


#87

Have there been any changes to the git setup? I noticed the last notebook I pushed wasn’t stripped.


(Stas Bekman) #88

No change. Perhaps you made a new clone and forgot to tell git to trust .gitconfig? Check .git/config and see whether you have this entry:

[include]
        path = ../.gitconfig

While we can’t do the trusting step on behalf of the user, perhaps we can use some git hook to validate that it’s configured and prevent git push without it?
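
As a rough idea of the validation part, here is a sketch that checks the text of .git/config for that entry. The function name trusts_repo_gitconfig is hypothetical, and the parser is deliberately simplified: it does not handle the full git config syntax, so querying git itself with git config --local --get-all include.path would likely be more robust in a real hook.

```python
def trusts_repo_gitconfig(config_text, expected="../.gitconfig"):
    """Return True if an [include] section sets path = `expected`.

    Simplified sketch: handles plain `key = value` lines only.
    """
    section = None
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
            continue
        if section == "include" and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if key.lower() == "path" and value.strip('"') == expected:
                return True
    return False
```

A hook could read .git/config, pass its contents to this function, and exit with a non-zero status when it returns False.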


#89

Not sure if I made a new clone, but I had rerun the line git config --local include.path '../.gitconfig' and I can see the config file does have the entry you mention. Still, two notebooks were committed and pushed this morning without being stripped.


(Stas Bekman) #90

What happens if you edit one of the notebooks and run git diff with GIT_TRACE=1:

GIT_TRACE=1 git diff

It will show whether fastai-nbstripout is run at all. Of course, git commit would be an even better test with the trace enabled.

Finally, you can see which files the filter runs on by modifying .gitconfig so that it prints each filename:

[filter "fastai-nbstripout-code"]
        clean = "f() { echo >&2 \"cleaning $1\"; tools/fastai-nbstripout; }; f %f"
        smudge = cat
        required = true
[diff "ipynb-code"]
        textconv = tools/fastai-nbstripout -t

Make a similar change to the -docs section if you are talking about a docs notebook. The only change is to clean, which now prints debug information.
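
For anyone unsure what a clean filter does: git pipes the file's contents to the command on stdin and stores whatever the command writes to stdout. The sketch below shows that contract for a notebook. It is not the actual fastai-nbstripout code, which may strip more (or different) fields; strip_notebook and clean are illustrative names.

```python
import json
import sys


def strip_notebook(nb):
    """Remove outputs and execution counts from code cells, in place."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb


def clean(stdin=sys.stdin, stdout=sys.stdout):
    """Act as a git clean filter: notebook JSON in, stripped JSON out."""
    nb = strip_notebook(json.load(stdin))
    json.dump(nb, stdout, indent=1, ensure_ascii=False)
    stdout.write("\n")
```

If GIT_TRACE=1 shows no filter command being run for a notebook, nothing like clean() is ever invoked, and the file is committed as-is.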


#91

Funny, after I added that block to the config file the stripping worked again (and I saw cleaning name_of_the_modified_nb when running git add -A).


(Stas Bekman) #92

I am not sure what happens when you add the same entry twice, probably the last overrides it. I meant to replace the old entry with this one.

Please let us know if you discover the culprit of why it wasn’t working for you just before that.

And you can fix the committed notebooks with:

tools/fastai-nbstripout dev_nb/*ipynb dev_nb/experiments/*ipynb
tools/fastai-nbstripout -d docs/*ipynb

and then re-commit.


#93

tools doesn’t seem to work on my console (on windows with the anaconda prompt).
Just modified the notebooks I wanted to clean instead.


(Stas Bekman) #94

Please be more specific when you say something doesn’t work. How do you invoke it and what errors do you get?


#95

I meant that I don’t have this command:

'tools' is not recognized as an internal or external command,
operable program or batch file.

(Stas Bekman) #96

Sorry, I don’t know anything about the Windows shell.

Do you need to run:

tools\fastai-nbstripout

or:

python tools\fastai-nbstripout

instead?

So that means that you and Jeremy have verified that this works on windows when invoked through git, but not directly, correct? Let’s figure out how one runs this directly on windows and document this.

And what is “windows with the anaconda prompt”? What kind of shell is it running?


#97

I need fastai-nbstripout to be renamed fastai-nbstripout.py, then I can run

python tools/fastai-nbstripout.py

I have never used the script directly before, no, only checked it worked with git, that’s correct.
I think the anaconda shell is just wrapped around cmd.exe, so it’s the basic windows shell (with python and conda commands working). Not sure though.


(Stas Bekman) #98

Unless @jeremy objects, I see no problem with renaming it to tools/fastai-nbstripout.py. Let’s make it as user-friendly as possible and working out of the box.

It’s still unclear what happened with stripping out not happening on commit for you.

And meanwhile I have been looking at git hooks - we may need to add a server-side pre-receive hook that checks, for example, that .ipynb files don’t have execution_count set, as a simple validator. Looking only for the first occurrence would keep it fast.

The pre-receive hook is executed every time somebody uses git push to push commits to the repository…
https://www.atlassian.com/git/tutorials/git-hooks#server-side-hooks

And while we are at it, if we make a short script to enable the repo-wide config, I suppose it should also be written in Python and have a .py ending, correct?
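
To make the validator idea concrete, here is a small sketch of the check itself. It only covers the test on a notebook's raw text; how a pre-receive hook would obtain the pushed file contents is left out, and looks_unstripped is a hypothetical name.

```python
import re

# Matches an execution count that is set, e.g. "execution_count": 12
# (a stripped notebook has "execution_count": null instead).
_EXECUTED = re.compile(r'"execution_count"\s*:\s*\d+')


def looks_unstripped(notebook_text):
    """Return True at the first set execution_count found in the raw JSON."""
    return _EXECUTED.search(notebook_text) is not None
```

Searching the raw text avoids parsing the whole JSON document, and search() stops at the first match, which is the "first occurrence" shortcut described above.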


#99

Now I’m getting this error when trying to add anything before a commit:

cleaning dev_nb/004c_mixed_precision.ipynb
f() { echo >&2 "cleaning $1"; tools/fastai-nbstripout; }; f 'dev_nb/004c_mixed_precision.ipynb': line 1: tools/fastai-nbstripout: Permission denied
error: external filter 'f() { echo >&2 "cleaning $1"; tools/fastai-nbstripout; }; f %f' failed 126
error: external filter 'f() { echo >&2 "cleaning $1"; tools/fastai-nbstripout; }; f %f' failed
fatal: dev_nb/004c_mixed_precision.ipynb: clean filter 'fastai-nbstripout-code' failed

(Stas Bekman) #100

It fails with:

line 1: tools/fastai-nbstripout: Permission denied

Is your script no longer executable? What’s the output of:

ls -l tools/fastai-nbstripout

Also, why did things suddenly stop working when nothing under tools/ has changed? Were you working with a different setup until now?
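
Where ls is not available, the same check can be sketched in Python. This assumes a POSIX-style filesystem (on Windows, Python derives these bits from the file extension, so the result is less meaningful there), and has_exec_bit is an illustrative name.

```python
import os


def has_exec_bit(path):
    """Return True if `path` is a regular file with any execute bit set."""
    if not os.path.isfile(path):
        return False
    # 0o111 covers the execute bits for owner, group and others.
    return bool(os.stat(path).st_mode & 0o111)
```

The mode git itself has recorded for the file can be seen with git ls-files -s tools/fastai-nbstripout, where 100755 means executable and 100644 means not.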