Fastai-nbstripout: stripping notebook outputs and metadata for git storage

Yup there’s no way to make that work in Windows AFAIK without creating a little .bat wrapper.

No, it works even if the file doesn’t have the .py extension. Even when you want to execute it, python tools\fastai-nbstripout works fine (I think I had a typo when I couldn’t get it to work, probably a / instead of a \).


Thank you, @sgugger!

I removed the check that didn’t work on windows, so you all can now switch to using the new tools/trust-origin-git-config. Your workflow can be updated to:

git clone https://github.com/fastai/fastai_v1
cd fastai_v1
tools/trust-origin-git-config

If you experience any problems please let me know.


So on Windows, the last instruction should be

python tools\trust-origin-git-config

otherwise it works properly.


what do you guys think of this tool (jupytext)?

seemed like a possible solution for dealing w/ jupyter notebooks in a collaborative / git environment


Thank you for the feedback, @sgugger

  1. Would you still need to include python if the script has .py in it?
  2. When you use python as you have shown does it have to be \ or will / work as a path separator?
  1. Yes - Windows cmd doesn’t support script files as executable
  2. It needs \ on Windows cmd
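
If you are scripting this step, one way to avoid both questions (whether to prefix python, and which separator to use) is to build the command in Python. This is only a sketch: tool_command is a hypothetical helper, not something that exists in the repo, and it assumes you run it from the repository root.

```python
import sys
from pathlib import Path


def tool_command(name, *args, tools_dir="tools"):
    """Build an argv list that runs a script from tools/ with the current interpreter.

    Hypothetical helper for illustration, not part of the fastai repo.
    """
    # Path joins with the separator of the current OS (\ on Windows, / elsewhere)
    script = Path(tools_dir) / name
    # sys.executable is the running interpreter, so no shebang or file
    # association is needed to execute the script
    return [sys.executable, str(script), *args]
```

The resulting list can be passed to subprocess.run(cmd, check=True); since it is a list rather than a shell string, no shell gets a chance to reinterpret the backslash.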

Thank you, all, for your input. I have updated the docs to include a note on how to invoke this on Windows. Hopefully it’ll be smooth sailing from here on.

wrt the original issue with the quoted filepath inside .git/config, which led to the creation of the new script: I submitted a bug report to the git dev list and it started a big discussion, which hasn’t produced an outcome yet, but I trust something good will come out of it.

Thank you, Fred, for mentioning jupytext.

Looking through the demo it appears that it deletes everything but code, and that won’t work for what has been developing here - we do keep outputs and some other important notebook fields in the documentation notebooks. And down the road when code notebooks have been more or less completed it is possible that outputs will be stored again, while still deleting other notebook fields. i.e. we want to have that fine control over what gets stored under git, and jupytext takes it away.
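
To make "fine control" concrete, here is a minimal sketch of the idea, not the actual fastai-nbstripout code. It assumes the nbformat 4 layout (a "cells" list whose code cells carry "outputs" and "execution_count"); strip_notebook and keep_outputs are names made up for this illustration.

```python
import copy


def strip_notebook(nb, keep_outputs=False):
    """Return a stripped copy of a notebook dict (nbformat 4 layout assumed).

    Illustrative only: the real tool may strip more, or different, fields.
    """
    nb = copy.deepcopy(nb)  # leave the caller's notebook untouched
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        # execution counters change on every run and only add diff noise
        cell["execution_count"] = None
        if keep_outputs:
            # docs-style: keep the rendered output, drop only its counter
            for out in cell.get("outputs", []):
                if "execution_count" in out:
                    out["execution_count"] = None
        else:
            # code-style: drop outputs entirely
            cell["outputs"] = []
    return nb
```

With keep_outputs=False this behaves like the setup for code notebooks, and with keep_outputs=True like the one for documentation notebooks, which is the kind of per-directory choice a code-only format would not allow.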

I agree though that it’d be far easier if the stored format wasn’t JSON but some plain text - so merging/diffing would be much easier. Though nbdime handles the diff/merge quite well. Just make sure you have it installed and configured.

@stas could you tell me how to create a directory that doesn’t run stripout, or runs it with different params? I’d like to create a directory containing rendered notebooks for people to look at.

I think all you need to do is to move fastai_v1/.gitattributes to dev_nb if you want those notebooks not to be under dev_nb. I think we should do it anyway, since this setup is only relevant for things under dev_nb.

If, however, you want them as a subfolder under dev_nb, create .gitattributes in that new subfolder and inside you specify:

*.ipynb -filter

which will override its parent .gitattributes configuration. The leading - before filter means ‘Unset’.
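
To confirm the override took effect, git check-attr reports which filter applies to a given path. Below is a small sketch that wraps it from Python; parse_check_attr and filter_for are hypothetical helper names, and filter_for has to be run from inside the repository.

```python
import subprocess


def parse_check_attr(output):
    """Parse `git check-attr` output lines of the form `<path>: <attribute>: <value>`."""
    result = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        # split from the right so the path part is kept whole
        path, _attr, value = line.rsplit(": ", 2)
        result[path] = value
    return result


def filter_for(paths):
    """Ask git which `filter` attribute applies to each path."""
    out = subprocess.run(
        ["git", "check-attr", "filter", "--", *paths],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_check_attr(out)
```

A notebook in the new subfolder should come back as unset, while notebooks elsewhere should still report the name of the filter configured for them.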

However, why not use the .gitattributes from docs? You will end up with stripped notebooks which will keep the output. And no other irrelevant nb noise.


Maybe we could automatically check whether each notebook is valid JSON, and not let the PR be merged if it is not.

I know this kind of thing is possible within GitHub; some projects use Travis CI, which may be overkill for fastai. Unfortunately, I don’t have much experience in this subject.
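
For what it’s worth, the check itself is small. Here is a rough sketch of what a CI step could run; invalid_notebooks and main are made-up names, and it only verifies that each file parses as JSON, not that it is a well-formed notebook.

```python
import json
from pathlib import Path


def invalid_notebooks(root="."):
    """Return (path, error message) pairs for .ipynb files under root that fail to parse."""
    problems = []
    for path in sorted(Path(root).rglob("*.ipynb")):
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError) as err:
            problems.append((path, str(err)))
    return problems


def main(root="."):
    """Print every broken notebook and return an exit code (0 = all valid)."""
    bad = invalid_notebooks(root)
    for path, err in bad:
        print(f"{path}: {err}")
    return 1 if bad else 0
```

A CI job would call sys.exit(main(".")) so that a non-zero exit code fails the build.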

If you follow the developer install instructions, you will find:

tools/run-after-git-clone

which already takes care of doing the right thing. So your PR will be validated and done correctly by the filter that that script installs. You only need to run it once per git clone. For more details please see: this document.


Do we have to preserve this metadata in the committed notebook?

  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },

Just had a PR where a user had their kernel spec set to something else, so it was conflicting with the default.

We could make fastai-nbstripout remove that bit too, but this might not work if the user has a different default kernel, since it might then try to run the notebook with R or another non-python kernel. If I remove it, it works just fine on my jupyter notebook, but that’s the only kernel I have.

Or perhaps instead of stripping it we should set it exactly to the above setting, so if a user has a custom version, it will get rewritten on the way to git.

Also looking at the spec entry, the first and the last are names, so perhaps only:

   "language": "python",

is important to preserve? It actually doesn’t say anywhere which interpreter version the nb should be running with.

The user’s spec was:

  "kernelspec": {
   "display_name": "Python (fastai-dev)",
   "language": "python",
   "name": "fastai-dev"
  }

so really, the names are just strings, and perhaps language is the only entry that needs to be preserved?
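
The "rewrite it on the way to git" option could look roughly like the sketch below. This is not what fastai-nbstripout currently does; normalize_kernelspec is a hypothetical function, and leaving non-python kernels alone is my own assumption about the desired behaviour.

```python
import copy

# the default spec quoted above
DEFAULT_KERNELSPEC = {
    "display_name": "Python 3",
    "language": "python",
    "name": "python3",
}


def normalize_kernelspec(nb, default=DEFAULT_KERNELSPEC):
    """Return a copy of the notebook dict with a python kernelspec reset to the default."""
    nb = copy.deepcopy(nb)
    meta = nb.setdefault("metadata", {})
    spec = meta.get("kernelspec", {})
    # assumption: only touch notebooks that declare python (or declare nothing),
    # so a notebook written for R or another kernel is left as it is
    if spec.get("language", "python") == "python":
        meta["kernelspec"] = dict(default)
    return nb
```

With this, the custom "Python (fastai-dev)" spec would be stored as the default one, so two contributors with differently named environments would no longer conflict on that field.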

A post was split to a new topic: Port fastai-nbstripout to jupyter lab

Hello all! I am interested in contributing to the library and was trying to follow the setup guide under the git notes. I am on a Windows 10 computer (with GPU, I was able to run lesson 1 already) with Anaconda and Git for Windows. I was able to clone my forked repository. When I ran the setup script in Git Bash, I originally had problems as it was looking for python3, but changing the shebang to regular python fixed the issue. However, when creating a new branch with the git checkout -b command, I am receiving the following error:

error: external filter python tools\\fastai-nbstripout -d failed 2
error: external filter python tools\\fastai-nbstripout -d failed
fatal: docs_src/basic_data.ipynb: clean filter 'fastai-nbstripout-docs' failed

How can I fix this error? 

Also, please let me know if this is the incorrect place to post this and I will move this post...

I moved your post to this dedicated thread, @ilovescience

You need to be using python3, is that the case?

And there must be more to the traceback than what you shared, can you paste the full error?

And best try to apply it w/o git, i.e. try:

python tools\fastai-nbstripout -d docs_src/basic_data.ipynb

and paste the output here. Also the output of:

python -m fastai.utils.show_install

Thank you.

I am using python3, it’s just that the python program is python.exe

And for the git command, that was the full error…

Running w/o git I get the same error:

D:\Anaconda3\python.exe: can't open file 'toolsfastai-nbstripout': [Errno 2] No such file or directory

I also get an error with the fastai command, probably because I am not working in my conda environment with the fastai installed:

Traceback (most recent call last):
  File "D:\Anaconda3\lib\site-packages\numpy\core\__init__.py", line 16, in <module>
    from . import multiarray
ImportError: DLL load failed: The specified module could not be found.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\Anaconda3\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "D:\Anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\Github\fastai-fork\fastai\utils\show_install.py", line 2, in <module>
    from .collect_env import *
  File "D:\Github\fastai-fork\fastai\utils\collect_env.py", line 2, in <module>
    from ..imports.torch import *
  File "D:\Github\fastai-fork\fastai\imports\__init__.py", line 1, in <module>
    from .core import *
  File "D:\Github\fastai-fork\fastai\imports\core.py", line 2, in <module>
    import math, matplotlib.pyplot as plt, numpy as np, pandas as pd, random
  File "D:\Anaconda3\lib\site-packages\matplotlib\__init__.py", line 141, in <module>
    from . import cbook, rcsetup
  File "D:\Anaconda3\lib\site-packages\matplotlib\cbook\__init__.py", line 33, in <module>
    import numpy as np
  File "D:\Anaconda3\lib\site-packages\numpy\__init__.py", line 142, in <module>
    from . import add_newdocs
  File "D:\Anaconda3\lib\site-packages\numpy\add_newdocs.py", line 13, in <module>
    from numpy.lib import add_newdoc
  File "D:\Anaconda3\lib\site-packages\numpy\lib\__init__.py", line 8, in <module>
    from .type_check import *
  File "D:\Anaconda3\lib\site-packages\numpy\lib\type_check.py", line 11, in <module>
    import numpy.core.numeric as _nx
  File "D:\Anaconda3\lib\site-packages\numpy\core\__init__.py", line 26, in <module>
    raise ImportError(msg)
ImportError:
Importing the multiarray numpy extension module failed.  Most
likely you are trying to import a failed build of numpy.
If you're working with a numpy git repo, try `git clean -xdf` (removes all
files not under version control).  Otherwise reinstall numpy.

Original error was: DLL load failed: The specified module could not be found.

Running w/o git I get the same error:

D:\Anaconda3\python.exe: can't open file 'toolsfastai-nbstripout': [Errno 2] No such file or directory

Sorry, you will have to adapt my suggestions, since I know little about Windows. You probably need \\ in the filepath? I mean, however you would run python folder\\file on Windows, do the same here, where folder is tools and file is fastai-nbstripout.

So please try again the direct call with \\, or correct it to do the right thing, as you would on windows.

python tools\\fastai-nbstripout -d docs_src\\basic_data.ipynb
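
My guess at why the backslash vanished: Git Bash is a POSIX-style shell, where an unquoted backslash escapes the next character and is then dropped. Python’s shlex module follows similar rules, so it can illustrate the effect (it is an approximation, not Git Bash itself); script_arg is just a name made up for this sketch.

```python
import shlex


def script_arg(command_line):
    """Return the script path as a POSIX-style shell would pass it to python."""
    # shlex.split defaults to POSIX rules: an unquoted backslash escapes the
    # next character and is itself removed
    return shlex.split(command_line)[1]


for cmd in (
    r"python tools\fastai-nbstripout -d docs_src/basic_data.ipynb",
    r"python tools\\fastai-nbstripout -d docs_src/basic_data.ipynb",
    "python tools/fastai-nbstripout -d docs_src/basic_data.ipynb",
):
    print(script_arg(cmd))
# prints:
# toolsfastai-nbstripout
# tools\fastai-nbstripout
# tools/fastai-nbstripout
```

The first result matches the 'toolsfastai-nbstripout' in the error message, which is why the doubled backslash (or a forward slash) is worth trying in that shell.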

I also get an error with the fastai command, probably because I am not working in my conda environment with the fastai installed:

You mean that was the error when you tried to run:

python -m fastai.utils.show_install

so yes, you need to be in your conda environment.

When I run this command even in my environment, I get the same error, which is weird because I can run from fastai import * without any error, and even the lesson 1 notebook is running fine…

And the manual command worked fine… Does this mean I have to do everything manually? How will I create a new branch?