Fastai-nbstripout: stripping notebook outputs and metadata for git storage

I have no idea. At first I thought it was because I had transformed it into a .py file but even reverting back didn’t change anything. I solved the problem by recloning the repo.
Even with the line to trust .gifconfig, I still had to add the two blocks you sent earlier (for debugging) to get the notebooks stripped automatically.

On the fresh clone (w/o mods to the .gitconfig) what do you get when you run:

GIT_TRACE=1  git diff 

after you change a single notebook, so that there is a diff. it should show the invocations of the fastai-nbstripout script.

And the same on git commit:

GIT_TRACE=1  git commit

Thanks a lot for your help with this @stas :slight_smile:

For what it’s worth - on my first attempt at running tools/fastai-nbstripout I also hit permission denied (when I was adding the tool into a different project earlier this week).

I had to run sudo chmod 775 tools/fastai-nbstripout on it to fix the issue.

Yes, thanks for helping @stas.
Your line doesn’t work in my shell but I’m guessing the windows version is

set GIT_TRACE=1
git diff

Running this after cloning the repo once again and changing one notebook makes a window of empty lines beginning with a ~ open.
git commit returned:

09:42:12.236323 exec-cmd.c:236          trace: resolved executable dir: C:/Users/Sylvain/Anaconda3/envs/fastai/Library/mingw64/bin
09:42:12.237322 git.c:415               trace: built-in: git commit
On branch master
Your branch is up to date with 'origin/master'.

Changes not staged for commit:
        modified:   dev_nb/004a_discriminative_lr.ipynb

no changes added to commit

git add -A returned:

09:42:44.064420 exec-cmd.c:236          trace: resolved executable dir: C:/Users/Sylvain/Anaconda3/envs/fastai/Library/mingw64/bin
09:42:44.066373 git.c:415               trace: built-in: git add -A
warning: LF will be replaced by CRLF in dev_nb/004a_discriminative_lr.ipynb.
The file will have its original line endings in your working directory.

and the notebook I changed wasn’t stripped.

wrt, executable scripts losing their exec bit on checkout I found this - see if it helps for preserving script’s executability on checkout. And this maybe.

And if either of these fixes the problem, please report what worked step by step, so that we could add it to the documentation.


wrt, trace, sorry I should have written please test git commit -a, or git add and then git commit - you showed the output the other way around where it won’t have run the filters anyway.

Though since git diff with trace didn’t show anything, it seems that somehow git ignores the configuration. e.g. on linux I get:

GIT_TRACE=1 git diff dev_nb/001a_nn_basics.ipynb
09:08:08.628998 git.c:344               trace: built-in: git diff dev_nb/001a_nn_basics.ipynb
09:08:08.629335 run-command.c:640       trace: run_command: unset GIT_PAGER_IN_USE; LESS=FRX LV=-c pager
09:08:08.629937 run-command.c:640       trace: run_command: tools/fastai-nbstripout
09:08:08.679979 run-command.c:640       trace: run_command: tools/fastai-nbstripout
09:08:08.727291 run-command.c:640       trace: run_command: cat
09:08:08.728347 run-command.c:640       trace: run_command: 'tools/fastai-nbstripout -t' /tmp/wZWBY0_001a_nn_basics.ipynb
09:08:08.779598 run-command.c:640       trace: run_command: 'tools/fastai-nbstripout -t' dev_nb/001a_nn_basics.ipynb
diff --git a/dev_nb/001a_nn_basics.ipynb b/dev_nb/001a_nn_basics.ipynb
index e170ba0..2de019e 100644
--- a/dev_nb/001a_nn_basics.ipynb
+++ b/dev_nb/001a_nn_basics.ipynb
@@ -4,7 +4,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# A basic training loop"
+    "# A basic training loop "
    ]
   },
   {

I don’t have a windows setup so the following are just ideas. Let’s try to experiment with various settings in .gitconfig, e.g.:

Rename to add .py to the script and change to:

[filter "fastai-nbstripout-code"]
        clean = tools/fastai-nbstripout.py
        smudge = cat
        required = true
[diff "ipynb-code"]
        textconv = tools/fastai-nbstripout.py -t

then, if it’s not working, try:

clean = python.exe tools/fastai-nbstripout.py

adjust to wherever your python.exe is if need be.

and finally a variation on a debug filter, which will use git’s internal shell:

clean = "f() { tools/fastai-nbstripout.py; }; f %f"

all these experiments are to be tested with git commit -a once you edited a code notebook, and best to keep GIT_TRACE=1 set.

Also have you tried gitforwindows.org? (install) - it suggests that it has a bash emulation. Or perhaps this is what you use already.

So I’m a bit at a loss. I’ve tried everything you mentioned and download git for windows then retried but it just looks like the content of .gitconfig is never executed (even if there is the [include] path = ‘…/.gitconfig’ in the .git/config file). Or if I copy this content inside the .git/config directly it’s ignored.

This is the result of a commit with the trace:

18:22:25.754605 exec-cmd.c:236          trace: resolved executable dir: C:/Users/Sylvain/Anaconda3/envs/fastai/Library/mingw64/bin
18:22:25.755622 git.c:415               trace: built-in: git commit -m test
18:22:25.993456 run-command.c:637       trace: run_command: git gc --auto
18:22:26.007809 exec-cmd.c:236          trace: resolved executable dir: C:/Users/Sylvain/Anaconda3/envs/fastai/Library/mingw64/bin
18:22:26.008788 git.c:415               trace: built-in: git gc --auto
[master f5e452b] test
 1 file changed, 18 insertions(+), 6 deletions(-)

I’m really surprised by the way it just stopped working…
Manually calling python tools\fastai-nbstripout dev_nb\001a_nn_basics.ipynb (for instance) works like a charm though…

It’s odd indeed. The trace shows that the filters config is ignored. And since you tried putting it directly into .git/config to no avail, this must have something to do with your git perhaps? What if you put that config into the global configuration, i.e. you add it to the global config file?

You said it worked until recently - so what has changed in your environment? Did you update git, bash, or any other components (or did they get updated automatically behind your back)?

You also said that the debug version I suggested you try - did work earlier. Does it still work?

The following might be useful - let’s see what config bits are being picked up where:

git config --list --show-origin

Also, you may want to use git diff to simplify your debug - since you can run it repeatedly, while tweaking configs. You should be getting a stripped out diff, and not a straight diff. If you get the latter, then the filter config is still not working. And GIT_TRACE=1 will also indicate when a filter is called during git diff.

I think I accidentally may have managed to reproduce your problem, @sgugger.

This works:

[include]
        path = ../.gitconfig

This doesn’t:

[include]
        path = '../.gitconfig'

Notice the ticks around the filename. When I have it there this custom config is silently (!) ignored.

git config --list --show-origin

proved to be very revealing.

Please check your .git/config - do you by chance have those ticks there, and what happens if you remove them.

I hope this resolves the issue.

And I’ll shortly commit an OS-agnostic script to do it right everywhere.

I have just committed a new script tools/trust-origin-git-config, usage:

$ tools/trust-origin-git-config -h
usage: trust-origin-git-config [-h] [-e] [-d] [-t]

optional arguments:
  -h, --help     show this help message and exit
  -e, --enable   Trust repo-wide .gitconfig (default action)
  -d, --disable  Distrust repo-wide .gitconfig
  -t, --test     Validate repo-wide .gitconfig config

$ tools/trust-origin-git-config -t
Executing: git config --list --show-origin
Check: repo's .gitconfig is not trusted, re-run with -e option to trust it

$ tools/trust-origin-git-config -e
Executing: git config --local include.path ../.gitconfig
Success: .gitconfig is now trusted

$ tools/trust-origin-git-config -t
Executing: git config --list --show-origin
Check: repo's .gitconfig is trusted

$ tools/trust-origin-git-config -d
Executing: git config --local --unset include.path
Success: .gitconfig is now distrusted

$ tools/trust-origin-git-config -t
Executing: git config --list --show-origin
Check: repo's .gitconfig is not trusted, re-run with -e option to trust it

You don’t have to use the -e option, it’s the default action.

Let me know whether it works.

And then whether you want it to be tools/trust-origin-git-config.py (and I will change the filter to .py as well).

I’m sure the messages can be polished more, but that’s good enough to make things working.

oh and please check whether -t option works, I was matching:

if ".git/../.gitconfig" in out and "filter.fastai-nbstripout-code" in out:

I’ve no idea whether it’ll be \\ as a file separator on windows or not. Or alternatively please just show me your output of:

git config --list --show-origin

So now the updated checkout process will become:

git clone git@github.com:fastai/fastai_v1.git
cd fastai_v1
tools/trust-origin-git-config
1 Like

That was it! Indeed by using the command

git config --local include.path ../.gitconfig

it worked properly again. I might have used the command with the ’ ’ at some point after a reclone and created this bug. Using your last script doesn’t work on windows though, tools\trust-origin-git-config or python tools\trust-origin-git-config both return an error:

Your git configuration drops executable bit on files
You may need to fix your git config with:
    git config --global core.fileMode false
and then re-checkout the repo files
Error: tools\fastai-nbstripout is not executable

In any case, thanks a lot for your help!

Excellent. Well, you didn’t create it. The current instructions are:

 git config --local include.path '../.gitconfig'

Except in bash, the command line quotes get removed, and apparently in your bash emulation they don’t. So the intention of the new script is to handle this transparently, avoiding such errors in the future.

I will see to report this to git developers, as it’s clearly a bug, when a configuration is silently ignored.

Please bear with me on this one. I’m trying to sort the executable issue that you folks have on windows. Perhaps @uptownnickbrown can help debugging this one.

It’s not enough to configure the content filters, we also need to make sure git will be able to run them. So I’ve added an early check:

filepath = os.path.join('tools', 'fastai-nbstripout')
st = os.stat(filepath)
if not bool(st.st_mode & stat.S_IXUSR):
    sys.exit(f"Error: {filepath} is not executable")

Can you please see why does it fail, if 'tools/‘fastai-nbstripout’ is executable that is. If it’s not executable, then it’s correct for it to fail. As git won’t be able to execute it.

Please, see the first part of this post and tell me which of the 2 solutions if any resolves this git checkout issue for good, so you don’t have to remember to ‘chmod a+x’ every time you check out the repo.

Thank you.

Tried the two variants of the post but none of them solved the problem and I get the same error message. Also tried what was advised in the error message, but it didn’t solve the problem either.

Also the behavior seems different than before: maybe I’m wrong but it seemed to me that the notebooks were stripped on my laptop and sent like this to the repo. Now they’re left as is on my laptop but sent stripped to the repo (just double-checked with a test notebook that I’ll remove later).

Hold on. I think those are two unrelated issues.

  1. the 2 suggested approaches are meant to keep the scripts under tools executable on checkout. i.e. to adjust the git configuration so that the executable bit doesn’t get stripped on clone/checkout. Can you please check that first? (i.e. don’t use the script, but look at the executability of the files as they are on unix).

  2. As I said, I know very little about windows world, so it’s very possible that the code I quoted in my last post only works on Unix. Perhaps we need to a different st.st_mode & stat.S_IXUSR check to validate whether a given file is executable on windows?

What I find confusing is that AFAIK chmod a+x filename in meaningless on Windows, yet, you’re talking about permission denied when run through git. So when tools/trust-origin-git-config is run, is it running through windows system, and in which case it’s guaranteed to fail. Or is it running through whatever unix-emulation you’re using, in which case it should work.

In other words, how do we code so that it’s always using unix-emulation?

Your working area shoudn’t have notebooks stripped out, unless you did it manually. It should only happen in the staging area on the way to the repo. So what you define as current behavior, this is as expected.

If I comment the code with this st.st_mode & stat.S_IXUSR test, the script still fails with this error:

raceback (most recent call last):
  File "tools\trust-origin-git-config", line 72, in <module>
    else:              trust_enable()
  File "tools\trust-origin-git-config", line 29, in trust_enable
    validate_script()
  File "tools\trust-origin-git-config", line 22, in validate_script
    result = subprocess.run(cmd.split(), shell=False, check=False, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
  File "C:\Users\Sylvain\Anaconda3\envs\fastai\lib\subprocess.py", line 403, in run
    with Popen(*popenargs, **kwargs) as process:
  File "C:\Users\Sylvain\Anaconda3\envs\fastai\lib\subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "C:\Users\Sylvain\Anaconda3\envs\fastai\lib\subprocess.py", line 997, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified

Though windows does manage to find the file and execute it when doing commits and everything, so maybe we should remove the valid_script() function?

I have been reading this, so it’s clear that we are dealing with different systems at play, and I still am not sure which one you end up using when invoking tools/trust-origin-git-config. If it’s run through git-bash, then you’d think it should handle this correctly, but clearly it’s not.

So let’s back up.

  1. Both you and Nick reported that files under tools aren’t executable on checkout, even though they should be. I search for this symptom and found a few possible solutions. I still don’t know whether any of those work or not. It’s important to stress out again that I just want you to validate the original problem you and Nick reported (and don’t use the new script for that). The goal is to run git clone and have the files under tools executable by git. So let’s resolve that separately as it’s a separate issue. And let’s do that first before moving to #2.

  2. We can surely drop the valid_script() function, but I’m trying to make it work to deal with issue #1, so that others on windows will know what to do if they get the same issue. So if you do have an issue with 'permission denied` as both you and Nick reported, can you make that file permission bit check code work on your setup? So that it complains if you get ‘permission denied’ when using git and not otherwise? Please, ignore the suggested by script solution for now, it was only a guess as I’m still waiting on understanding #1. The function works just fine on linux. Here is the expected behavior:


$ chmod a-x tools/fastai-nbstripout
$ tools/trust-origin-git-config -e
Your git configuration drops executable bit on files
You may need to fix your git config with:
    git config --global core.fileMode false
and then re-checkout the repo files
Error: tools/fastai-nbstripout is not executable

$ chmod a+x tools/fastai-nbstripout
$ tools/trust-origin-git-config -e
Executing: git config --local include.path ../.gitconfig
Success: .gitconfig is now trusted

I hope that you can see that the purpose of this check is to validate that the script will work under git, since both you and Nick discovered just configuring it wasn’t enough - you were getting permission denied problem.

If you change the docs to use double-quotes, it should work fine in both places AFAIK. I don’t believe the Windows cmd shell uses single quotes.

An alternative is to instruct Windows devs to use Windows Subsystem for Linux (WSL) for all their git stuff. @Sylvain have you considered this? Might make your life easier…

The script will solve this transparently for all.

I just need help by proxy to sort out the validation part of it. If @sgugger is very busy, perhaps @uptownnickbrown can help us with this. Thank you.

And of course if you find a solid windows way of doing this that doesn’t mess with script permissions and requires no workarounds and validations, and all developers use it, then it’ll not be needed, as it will just work.

1 Like

None of the solutions before changed the fact that the files in tools aren’t recognized as executable in my shell (e.g. trying tools\fastai-nbstripout fails whereas python tools\fastai-nbstripout works). Yet the stripping still happens so inside git the script is properly executed.

OK, I will drop that check then. Thank you for experimenting with that, @sgugger.

And please tell me whether we need to change the scripts to .py or whether it’s working as is now.