Fastai-nbstripout: stripping notebook outputs and metadata for git storage

If you follow the developer install instructions, you will find:

tools/run-after-git-clone

which already takes care of doing the right thing. So your PR will be validated and done correctly by the filter that that script installs. You only need to run it once per git clone. For more details please see: this document.


Do we have to preserve this metadata in the committed notebook?

  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },

Just had a PR where a user had their kernel spec set to something else, so it was conflicting with the default.

We could make fastai-nbstripout remove that bit too, but this might not work if the user has a different default kernel: Jupyter might then try to run the notebook with R or another non-Python kernel. If I remove it, it works just fine in my Jupyter notebook, but Python is the only kernel I have.

Or perhaps instead of stripping it we should set it exactly to the above setting, so if a user has a custom version, it will get rewritten on the way to git.

Also looking at the spec entry, the first and the last are names, so perhaps only:

   "language": "python",

is important to preserve? It actually doesn’t say anywhere which interpreter version the nb should be running with.

The user’s spec was:

"kernelspec": {
   "display_name": "Python (fastai-dev)",
   "language": "python",
   "name": "fastai-dev"
  }

so really, the names are just strings, and perhaps language is the only entry that needs to be preserved?
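
The second option above (rewriting the kernelspec to the default on the way to git) could look roughly like the sketch below. It is not what fastai-nbstripout actually does; `normalize_kernelspec` and `DEFAULT_KERNELSPEC` are hypothetical names, and the notebook is treated as a plain JSON dict.

```python
import json

# Assumed default, copied from the kernelspec quoted earlier in this post.
DEFAULT_KERNELSPEC = {
    "display_name": "Python 3",
    "language": "python",
    "name": "python3",
}

def normalize_kernelspec(nb):
    """Return a copy of the notebook dict with its kernelspec reset to the default."""
    out = dict(nb)
    metadata = dict(out.get("metadata", {}))
    metadata["kernelspec"] = dict(DEFAULT_KERNELSPEC)
    out["metadata"] = metadata
    return out

# A notebook saved with a custom kernel, as in the PR described above.
nb = {
    "cells": [],
    "metadata": {
        "kernelspec": {
            "display_name": "Python (fastai-dev)",
            "language": "python",
            "name": "fastai-dev",
        }
    },
    "nbformat": 4,
    "nbformat_minor": 2,
}

cleaned = normalize_kernelspec(nb)
print(json.dumps(cleaned["metadata"]["kernelspec"], indent=1))
```

The input dict is left untouched, so the user's local kernel choice would survive in the working copy while the committed version carries the default.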

A post was split to a new topic: Port fastai-nbstripout to jupyter lab

Hello all! I am interested in contributing to the library and was trying to follow the setup guide under the git notes. I am on a Windows 10 computer (with GPU, I was able to run lesson 1 already) with Anaconda and Git for Windows. I was able to clone my forked repository. When I ran the setup script in Git Bash, I originally had problems as it was looking for python3, but changing the shebang to regular python fixed the issue. However, when creating a new branch with the git checkout -b command, I am receiving the following error:

error: external filter python tools\\fastai-nbstripout -d failed 2
error: external filter python tools\\fastai-nbstripout -d failed
fatal: docs_src/basic_data.ipynb: clean filter 'fastai-nbstripout-docs' failed

How can I fix this error? 

Also, please let me know if this is the incorrect place to post this and I will move this post...

I moved your post to this dedicated thread, @ilovescience

You need to be using python3, is that the case?

And there must be more to the traceback than what you shared, can you paste the full error?

And best try to apply it w/o git, i.e. try:

python tools\fastai-nbstripout -d docs_src/basic_data.ipynb

and paste the output here. Also the output of:

python -m fastai.utils.show_install

Thank you.

I am using python3, it’s just that the python program is python.exe

And for the git command, that was the full error…

Running w/o git I get the same error:

D:\Anaconda3\python.exe: can't open file 'toolsfastai-nbstripout': [Errno 2] No such file or directory

I also get an error with the fastai command, probably because I am not working in my conda environment with the fastai installed:

Traceback (most recent call last):
  File "D:\Anaconda3\lib\site-packages\numpy\core\__init__.py", line 16, in <module>
    from . import multiarray
ImportError: DLL load failed: The specified module could not be found.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\Anaconda3\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "D:\Anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\Github\fastai-fork\fastai\utils\show_install.py", line 2, in <module>
    from .collect_env import *
  File "D:\Github\fastai-fork\fastai\utils\collect_env.py", line 2, in <module>
    from ..imports.torch import *
  File "D:\Github\fastai-fork\fastai\imports\__init__.py", line 1, in <module>
    from .core import *
  File "D:\Github\fastai-fork\fastai\imports\core.py", line 2, in <module>
    import math, matplotlib.pyplot as plt, numpy as np, pandas as pd, random
  File "D:\Anaconda3\lib\site-packages\matplotlib\__init__.py", line 141, in <module>
    from . import cbook, rcsetup
  File "D:\Anaconda3\lib\site-packages\matplotlib\cbook\__init__.py", line 33, in <module>
    import numpy as np
  File "D:\Anaconda3\lib\site-packages\numpy\__init__.py", line 142, in <module>
    from . import add_newdocs
  File "D:\Anaconda3\lib\site-packages\numpy\add_newdocs.py", line 13, in <module>
    from numpy.lib import add_newdoc
  File "D:\Anaconda3\lib\site-packages\numpy\lib\__init__.py", line 8, in <module>
    from .type_check import *
  File "D:\Anaconda3\lib\site-packages\numpy\lib\type_check.py", line 11, in <module>
    import numpy.core.numeric as _nx
  File "D:\Anaconda3\lib\site-packages\numpy\core\__init__.py", line 26, in <module>
    raise ImportError(msg)
ImportError:
Importing the multiarray numpy extension module failed.  Most
likely you are trying to import a failed build of numpy.
If you're working with a numpy git repo, try `git clean -xdf` (removes all
files not under version control).  Otherwise reinstall numpy.

Original error was: DLL load failed: The specified module could not be found.

Running w/o git I get the same error:

D:\Anaconda3\python.exe: can't open file 'toolsfastai-nbstripout': [Errno 2] No such file or directory

Sorry, you will have to fix my suggestions, since I know little about Windows. You probably need \\ in the file path? I mean, how would you run python folder\\file on Windows? Do the same here, where folder is tools and file is fastai-nbstripout.

So please try again the direct call with \\, or correct it to do the right thing, as you would on windows.

python tools\\fastai-nbstripout -d docs_src\\basic_data.ipynb
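
One likely explanation for the 'toolsfastai-nbstripout' in the error above is that a POSIX-style shell treats an unquoted backslash as an escape character and drops it. The sketch below uses Python's `shlex.split`, which approximates POSIX shell word splitting; that Git Bash follows the same rule here is an assumption, not something verified on Windows.

```python
import shlex

# In its default POSIX mode, shlex.split() treats an unquoted backslash
# as an escape for the next character, much like a POSIX shell does.
single = shlex.split(r"python tools\fastai-nbstripout -d docs_src/basic_data.ipynb")
double = shlex.split(r"python tools\\fastai-nbstripout -d docs_src\\basic_data.ipynb")
forward = shlex.split("python tools/fastai-nbstripout -d docs_src/basic_data.ipynb")

print(single[1])   # the lone backslash is consumed by the parser
print(double[1])   # a doubled backslash leaves one literal backslash
print(forward[1])  # forward slashes pass through unchanged
```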

I also get an error with the fastai command, probably because I am not working in my conda environment with the fastai installed:

You mean that was the error when you tried to run:

python -m fastai.utils.show_install

so yes, you need to be in your conda environment.

When I run this command even in my environment, I get the same error, which is weird because I can run from fastai import * without any error, and even the lesson 1 notebook is running fine…

And the manual command worked fine… Does this mean I have to do everything manually? How will I create a new branch?

Any luck with this version of the same?

python -c 'import fastai.utils.collect_env; fastai.utils.collect_env.show_install(1)'

And the manual command worked fine… Does this mean I have to do everything manually? How will I create a new branch?

No, we just need to sort out what the problem is.

First, you can disable the filter, until we sort it out:

python tools\trust-origin-git-config -d

and only apply the stripout to ipynb files before committing if you plan to contribute to the notebooks. So it’s no problem - just not automated.


Now, someone else would probably be more qualified to help you here, since I don’t know Windows. Until that happens, please bear with me. Going back to the git filters that you have a problem with, this:

python tools\trust-origin-git-config -e

generates .gitconfig with the instructions to git how to run the stripout filter.

In the past we had issues with this setup on Windows; it had to do with the number of \ we had to put in - madness! I thought it was figured out. I guess it is not.

So if the manual invocation of the script works, it’s all about figuring out why the autogenerated .gitconfig isn’t working for you. So have a look inside, perhaps tweak it and see if you can find if it’s the number of \ in the path that’s the problem. Notice that python tools\trust-origin-git-config -e enables the filters and -d disables them.

I replaced each run of backslashes with a single forward slash and it worked: the git checkout commands now run. Oddly, I can no longer reproduce the original error after changing back to 4 backslashes! So I will continue with the git notes tutorial and let you know if the error returns.
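
For anyone repeating this check, the sketch below scans a config text for clean-filter commands, reports the longest run of backslashes in each, and shows the forward-slash rewrite described above. The sample section is hypothetical: only the filter name and the command come from the error message earlier in the thread, and `backslash_runs` and `to_forward_slashes` are illustrative names, not part of the fastai tools.

```python
import re

# Hypothetical sample of a generated filter section (layout assumed).
SAMPLE = r"""
[filter "fastai-nbstripout-docs"]
    clean = python tools\\\\fastai-nbstripout -d
"""

def backslash_runs(config_text):
    """Return (command, longest backslash run) for every 'clean = ...' line."""
    report = []
    for line in config_text.splitlines():
        match = re.match(r"\s*clean\s*=\s*(.+)", line)
        if match:
            cmd = match.group(1).strip()
            runs = re.findall(r"\\+", cmd)
            report.append((cmd, max((len(run) for run in runs), default=0)))
    return report

def to_forward_slashes(cmd):
    """Collapse each run of backslashes into a single forward slash."""
    return re.sub(r"\\+", "/", cmd)

report = backslash_runs(SAMPLE)
for cmd, longest in report:
    print(f"{longest} backslashes in a row: {cmd}")
    print(f"rewritten: {to_forward_slashes(cmd)}")
```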

Also, fastai.utils command now works. I had missed a folder that needed to be added to my PATH. When I run the command I get the following:

=== Software ===
python        : 3.7.1
fastai        : 1.0.45
fastprogress  : 0.1.19
torch         : 1.0.0
torch cuda    : 9.0 / is available
torch cudnn   : 7005 / is enabled

=== Hardware ===
torch devices : 1
  - gpu0      : GeForce GTX 1050 Ti

=== Environment ===
platform      : Windows-10-10.0.17763-SP0
conda env     : base
python        : D:\Anaconda3\python.exe
sys.path      :
D:\Anaconda3\python37.zip
D:\Anaconda3\DLLs
D:\Anaconda3\lib
D:\Anaconda3
D:\Anaconda3\lib\site-packages
D:\Anaconda3\lib\site-packages\win32
D:\Anaconda3\lib\site-packages\win32\lib
D:\Anaconda3\lib\site-packages\Pythonwin
no nvidia-smi is found

Good. So I suppose you have a real bash on Windows. I think the last person we made this work for on Windows had no bash, just the command prompt. How can we detect a bash environment on Windows? What is your SHELL environment variable set to, if it is set on Windows at all? On Linux it is usually set to /bin/bash.

and then can you adjust tools\trust-origin-git-config so that it generates the format that you found working, it’s most likely just this part:

is_windows = hasattr(sys, 'getwindowsversion')
cmd = "tools/fastai-nbstripout" if not is_windows else r"python tools\\\\fastai-nbstripout"

So it probably needs to be changed to:

import os, sys
is_windows = hasattr(sys, 'getwindowsversion')
# pretend we are on unix when a bash-like shell has set SHELL (untested)
if 'SHELL' in os.environ: is_windows = False
cmd = "tools/fastai-nbstripout" if not is_windows else r"python tools\\\\fastai-nbstripout"

so we pretend we are on unix if it’s bash. Except, please figure out what that condition should be so that it works for you. This is untested.

Will probably make it then into is_bash_like flag, so it’ll be more intuitive.
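
A minimal sketch of that `is_bash_like` idea, written as a function so both branches can be tried without a Windows machine. `nbstripout_cmd` is a hypothetical name, and the assumption that a bash-like shell on Windows exports SHELL is still untested.

```python
import os
import sys

def nbstripout_cmd(is_windows, environ):
    """Pick the filter command; Windows with a bash-like shell is treated as Unix."""
    # Assumption: bash-like shells export SHELL, while cmd.exe does not.
    is_bash_like = "SHELL" in environ
    if is_windows and not is_bash_like:
        return r"python tools\\\\fastai-nbstripout"
    return "tools/fastai-nbstripout"

is_windows = hasattr(sys, "getwindowsversion")
print(nbstripout_cmd(is_windows, os.environ))

# The three cases discussed above, simulated with plain dicts:
print(nbstripout_cmd(True, {}))                            # cmd.exe
print(nbstripout_cmd(True, {"SHELL": "/usr/bin/bash"}))    # bash on Windows
print(nbstripout_cmd(False, {"SHELL": "/bin/bash"}))       # Linux
```

Passing the environment in as an argument keeps the decision separate from the real `os.environ`, which makes the condition easy to adjust once someone confirms what Git Bash actually sets.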

Thanks.


edit:

I searched on SO a bit, so this might be all you need to do the right conditional:

  1. echo $BASH # should be set for BASH

  2. What type of bash:
    https://stackoverflow.com/a/33828925/9201239
    Bash sets the shell variable OSTYPE. From man bash:

Automatically set to a string that describes the operating system on which bash is executing.

case "$OSTYPE" in
  linux*)   echo "Linux / WSL" ;;
  darwin*)  echo "Mac OS" ;; 
  win*)     echo "Windows" ;;
  msys*)    echo "MSYS / MinGW / Git Bash" ;;
  cygwin*)  echo "Cygwin" ;;
  bsd*)     echo "BSD" ;;
  solaris*) echo "Solaris" ;;
  *)        echo "unknown: $OSTYPE" ;;
esac

So you can probably get the logic right from BASH and OSTYPE env vars.
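
If that logic ends up in the Python script rather than in a shell script, the case statement above could be mirrored as in the sketch below. `classify_ostype` is a hypothetical helper. One caveat, stated as an assumption: OSTYPE and BASH are shell variables, so they may not be visible in `os.environ` unless the shell exports them.

```python
import os
from fnmatch import fnmatchcase

# Same patterns and labels as the shell case statement above, in the same order.
OSTYPE_PATTERNS = [
    ("linux*", "Linux / WSL"),
    ("darwin*", "Mac OS"),
    ("win*", "Windows"),
    ("msys*", "MSYS / MinGW / Git Bash"),
    ("cygwin*", "Cygwin"),
    ("bsd*", "BSD"),
    ("solaris*", "Solaris"),
]

def classify_ostype(ostype):
    """Return the label of the first glob pattern that matches ostype."""
    for pattern, label in OSTYPE_PATTERNS:
        if fnmatchcase(ostype, pattern):
            return label
    return f"unknown: {ostype}"

# May print "unknown: " when OSTYPE is not exported to child processes.
print(classify_ostype(os.environ.get("OSTYPE", "")))
```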

And I think we are dealing with multiple types of shells on Windows:

  1. Windows Bash (/ sep)
  2. Git Bash / or \ sep?
  3. not Bash /(\ sep)
  4. Cygwin bash

I guess yours is the first one.

[Resurrecting this thread on @stas request]

hi @ilovescience,

I have been able to clone (with tools/run-after-git-clone) and create a new branch. See the pictures below.

Can you please pull down the latest source and see if you are getting any errors?
If so - can you please tell me (a) the errors and (b) the exact steps that led to the errors.

Happy to ‘sit’ with you till this is resolved. Win10 is my primary environment for DL as well.

On cmd.exe

On ubuntu.exe

I guess the main question is: does Bash on Win10 behave like Unix (i.e. does it use Unix / paths)? It’s really just a matter of figuring out whether is_windows in tools\trust-origin-git-config (see my last comment) should be True under bash on Win10 or not.

Hello Partho,

Thanks for the response… I have not had much time to get back to working on this, but I was using Git Bash instead of the Command Prompt so that could have been the reason why the program was confused, because I think Git Bash behaves like Unix but Command Prompt does not…

When I get a chance to work on this again, I will definitely let you know if there are any errors with this aspect and with the development process.


@stas w.r.t. testing is_windows here is what i find:

| environment | is_windows | “/” or “\”? | comments |
|---|---|---|---|
| cmd.exe/powershell | true | “\” | Native Windows command prompts. |
| bash in Windows Subsystem for Linux | false | “/” | WSL is a true emulation layer. Executables inside are true Linux executables running in a true Linux environment. |
| bash in mingw64 | true | ? | I don’t understand how this works. It looks like a simulation layer for Linux binaries. I am very curious to know its usage given WSL’s super hi-fidelity emulation of Linux. |

In short I believe the trust-origin-git-config script is detecting the environment accurately.

I’ll make the necessary fixes if there are reports to the contrary.

Looks good, @partho.

But I’m not sure how you made the entry for “bash in windows” to be is_windows == False
It’s derived from:
is_windows = hasattr(sys, 'getwindowsversion')
so probably it should be fixed to be false in that case, no?

or are you saying under bash/windows hasattr(sys, 'getwindowsversion') returns False? It pretends to be unix?

are you saying under bash/windows hasattr(sys, 'getwindowsversion') returns False? It pretends to be unix?

Yes it returns false & yes it ‘pretends’ to be linux**.

If I understand your question correctly, you are asking about WSL.
Bash (and all other processes) when run under WSL get to see a Linux environment (the OS APIs, filesystem etc.). They are binary identical to the ones found on Linux. For most purposes they don’t know that the Linux environment is actually being emulated inside a Windows OS. Inception movie type thingy going on in there.

Of course that emulation layer does not have 100% fidelity yet. The biggest missing piece relevant to us is GPU/CUDA access.

**Linux here stands for all distros of Linux supported by WSL.
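
To fill in a row of the matrix for another shell, a small report like the sketch below can be run from inside that shell. `environment_report` is a hypothetical helper; it only prints what the Python interpreter launched from that shell sees, so the result depends on which Python build the shell picks up.

```python
import os
import platform
import sys

def environment_report():
    """Collect the values relevant to the is_windows check."""
    return {
        "is_windows": hasattr(sys, "getwindowsversion"),
        "sep": os.sep,
        "platform": sys.platform,
        "system": platform.system(),
    }

report = environment_report()
for key, value in report.items():
    print(f"{key:10}: {value}")
```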

Thank you for the detailed explanation, @partho!

What about git bash on windows? I understand that there is that variation too.

That was the 3rd row in the matrix above

| environment | is_windows | “/” or “\”? | comments |
|---|---|---|---|
| bash in mingw64 | true | ? | i dont understand the how this works. it looks like a simulation layer for linux binaries. i am very curious to know its usage given wsl super hi-fidelity emulation of linux |

I don’t have much experience with this one and I am not sure how well this works compared to the other 2. Given we have WSL, I would be very surprised if any serious work is done through this.

Do you want this to a be a supported environment as well?

Ah, that’s git bash. Understood. Thank you.

Probably not until we get someone wanting it, perhaps it just would never happen so why waste your time.
