[git] An easier jupyter notebook modification commit process?

(Jeremy Howard (Admin)) #121

Yup there’s no way to make that work in Windows AFAIK without creating a little .bat wrapper.


No it’s working even if the file doesn’t have the .py extension. Even when you want to execute it, python tools\fastai-nbstripout works fine (think I had a typo when I didn’t manage to make it work, probably a / instead of a \).

(Stas Bekman) #123

Thank you, @sgugger!

I removed the check that didn’t work on windows, so you all can now switch to using the new tools/trust-origin-git-config. Your workflow can be updated to:

git clone https://github.com/fastai/fastai_v1
cd fastai_v1

If you experience any problems please let me know.


So on Windows, the last instruction should be

python tools\trust-origin-git-config

otherwise it works properly.

(Fred Monroe) #125

what do you guys think of this tool?

seemed like a possible solution for dealing w/ jupyter notebooks in a collaborative / git environment

(Stas Bekman) #126

Thank you for the feedback, @sgugger

  1. Would you still need to include python if the script has .py in it?
  2. When you use python as you have shown does it have to be \ or will / work as a path separator?

(Jeremy Howard (Admin)) #127
  1. Yes - Windows cmd doesn’t support script files as executable
  2. It needs \ on Windows cmd
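
For anyone who wants to script this rather than type it, here is a minimal sketch of building the command in a way that should work on both Windows and Unix. The helper names `script_command` and `run_script` are made up for illustration, and the sketch assumes the script lives at tools/trust-origin-git-config relative to the repo root, as in the posts above.

```python
import subprocess
import sys
from pathlib import Path


def script_command(repo_root, script="tools/trust-origin-git-config"):
    """Build an argv list that runs an extension-less Python script on any OS."""
    # Path joins the parts with the native separator, so the result uses
    # backslashes on Windows and forward slashes elsewhere.
    script_path = Path(repo_root).joinpath(*script.split("/"))
    # Passing the script to the current interpreter avoids relying on the
    # script file itself being executable, which Windows cmd does not support.
    return [sys.executable, str(script_path)]


def run_script(repo_root="."):
    """Run the script; assumes repo_root is the root of the git checkout."""
    subprocess.run(script_command(repo_root), check=True)
```

Calling `run_script()` from the root of the clone would then be the equivalent of typing `python tools\trust-origin-git-config` by hand.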

(Stas Bekman) #128

Thank you, all, for your input. I have updated the docs to include a note on how to invoke this on Windows. Hopefully it’ll be smooth sailing from here on.

wrt the original issue with the quoted filepath inside .git/config, which led to the creation of the new script: I submitted a bug report to the git dev list and it started a big discussion, which hasn’t yet resulted in any outcomes, but I trust something good will come out of it.

(Stas Bekman) #129

Thank you, Fred, for mentioning jupytext.

Looking through the demo it appears that it deletes everything but code, and that won’t work for what has been developing here - we do keep outputs and some other important notebook fields in the documentation notebooks. And down the road when code notebooks have been more or less completed it is possible that outputs will be stored again, while still deleting other notebook fields. i.e. we want to have that fine control over what gets stored under git, and jupytext takes it away.

I agree though that it’d be far easier if the stored format wasn’t JSON but some plain text - so merging/diffing would be much easier. Though nbdime handles the diff/merge quite well. Just make sure you have it installed and configured.

(Jeremy Howard (Admin)) #130

@stas could you tell me how to create a directory that doesn’t run stripout, or runs it with different params? I’d like to create a directory containing rendered notebooks for people to look at.

(Stas Bekman) #131

I think all you need to do is to move fastai_v1/.gitattributes to dev_nb, if the new directory is not going to be under dev_nb. I think we should do it anyway, since this setup is only relevant for things under dev_nb.

If, however, you want them as a subfolder under dev_nb, create .gitattributes in that new subfolder and inside you specify:

*.ipynb -filter

which will override its parent .gitattributes configuration. The leading - before filter means ‘Unset’.

However, why not use the .gitattributes from docs? You will end up with stripped notebooks which will keep the output. And no other irrelevant nb noise.

(Fred Guth) #132

Maybe we could automatically check whether each notebook is valid JSON and, if it is not, not even let the PR be merged.

I know this kind of thing is possible within github; some projects use Travis CI, which may be overkill for fastai. Unfortunately, I don’t have much experience in this subject.
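
To make the idea concrete, here is a rough sketch of what such a check could look like, using only the Python standard library. The function names `invalid_notebooks` and `main` are hypothetical and this is not an existing fastai tool; a CI job could call something like it and fail the build on a non-zero return value.

```python
import json
from pathlib import Path


def invalid_notebooks(root):
    """Return (path, error message) pairs for .ipynb files that are not valid JSON."""
    problems = []
    for nb_path in sorted(Path(root).rglob("*.ipynb")):
        try:
            with nb_path.open(encoding="utf-8") as f:
                json.load(f)
        except ValueError as err:
            # json.JSONDecodeError and UnicodeDecodeError both subclass ValueError.
            problems.append((nb_path, str(err)))
    return problems


def main(root="."):
    """Print every broken notebook and return 1 if any were found, else 0."""
    problems = invalid_notebooks(root)
    for nb_path, message in problems:
        print(f"{nb_path}: invalid JSON ({message})")
    return 1 if problems else 0
```

Note that this only checks that the file parses as JSON, not that it is a well-formed notebook, and it will also pick up any .ipynb files sitting in checkpoint folders under `root`.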

(Stas Bekman) #133

If you follow the developer install instructions, you will find:


which already takes care of doing the right thing. So your PR will be validated and done correctly by the filter that that script installs. You only need to run it once per git clone. For more details please see: this document.

(Stas Bekman) #134

Do we have to preserve this metadata in the committed notebook?

  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  }

Just had a PR where a user had their kernel spec set to something else, so it was conflicting with the default.

We could make fastai-nbstripout remove that bit too, but this might not work if the user has a different default kernel, in which case jupyter might try to run the notebook with R or another non-python kernel. If I remove it, it works just fine on my jupyter notebook, but that’s the only kernel I have.

Or perhaps instead of stripping it we should set it exactly to the above setting, so if a user has a custom version, it will get rewritten on the way to git.

Also looking at the spec entry, the first and the last are names, so perhaps only:

   "language": "python",

is important to preserve? It actually doesn’t say anywhere which interpreter version the nb should be running with.

The user’s spec was:

"kernelspec": {
   "display_name": "Python (fastai-dev)",
   "language": "python",
   "name": "fastai-dev"
}

so really, the names are just strings, and perhaps language is the only entry that needs to be preserved?
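
As a rough sketch of the "rewrite it on the way to git" option, the following operates on a notebook that has already been loaded as a dict with the `json` module. `DEFAULT_KERNELSPEC` and `normalize_kernelspec` are names invented here for illustration; this is not what fastai-nbstripout currently does, and treating the "Python 3" spec as the canonical value is an assumption.

```python
import copy
import json

# Assumed canonical value: the default spec quoted at the top of this post.
DEFAULT_KERNELSPEC = {
    "display_name": "Python 3",
    "language": "python",
    "name": "python3",
}


def normalize_kernelspec(nb):
    """Return a copy of the notebook dict with its kernelspec set to the default."""
    nb = copy.deepcopy(nb)  # leave the caller's notebook untouched
    nb.setdefault("metadata", {})["kernelspec"] = dict(DEFAULT_KERNELSPEC)
    return nb


# Example with the user's spec from the PR mentioned above.
user_nb = {
    "cells": [],
    "metadata": {
        "kernelspec": {
            "display_name": "Python (fastai-dev)",
            "language": "python",
            "name": "fastai-dev",
        }
    },
}
print(json.dumps(normalize_kernelspec(user_nb)["metadata"]["kernelspec"], indent=1))
```

This version rewrites the spec unconditionally, so a notebook that genuinely needs a non-python kernel would be clobbered; a check on the existing "language" value would be needed to guard against that.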

(Stas Bekman) split this topic #135

A post was split to a new topic: Port fastai-nbstripout to jupyter lab