Port fastai-nbstripout to jupyter lab

stas · February 14, 2019, 5:59am

It appears that fastai-nbstripout doesn’t work with jupyter lab - most likely its notebook format is slightly different from jupyter notebook’s. Since fastai-nbstripout manipulates the json content directly (to get a 10x+ speed gain over nbformat), it’s possible that it needs to be tweaked to handle the other format as well. So if you’re looking for a relatively small, simple project that just requires basic python understanding and list comprehensions, this could be a neat little project for you. Thank you.

gokkulnath · February 28, 2019, 3:25am

Hi @stas

I took a look at it. I was surprised that a Jupyter notebook is just a big json file with the code stored inside a hierarchy/tree of dicts.

The current implementation assigns an empty list to every key that is not required.
The way I am thinking of approaching the problem is to find the diff between the same notebook saved by jupyter notebook and by Jupyter Lab. We can then configure the script to retain any essential information, i.e. identify the additional dict keys to retain.
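A minimal sketch of that key diff, assuming both notebooks are already parsed into dicts. The helper names `key_paths` and `diff_keys` and the key `hypothetical_lab_key` are made up for illustration; they are not part of fastai-nbstripout or of any real notebook schema.

```python
import json

def key_paths(node, prefix=""):
    """Collect the set of dict key paths found anywhere in a parsed JSON tree."""
    paths = set()
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}/{key}"
            paths.add(path)
            paths |= key_paths(value, path)
    elif isinstance(node, list):
        # List indices are collapsed to "[]" so all cells are compared by shape
        for item in node:
            paths |= key_paths(item, f"{prefix}[]")
    return paths

def diff_keys(nb_a, nb_b):
    """Return (key paths only in nb_a, key paths only in nb_b) as sorted lists."""
    a, b = key_paths(nb_a), key_paths(nb_b)
    return sorted(a - b), sorted(b - a)

# Tiny hypothetical notebooks standing in for files saved by each program
classic = {"cells": [{"cell_type": "code", "metadata": {}, "source": []}],
           "metadata": {}, "nbformat": 4, "nbformat_minor": 2}
lab = {"cells": [{"cell_type": "code",
                  "metadata": {"hypothetical_lab_key": True},
                  "source": []}],
       "metadata": {}, "nbformat": 4, "nbformat_minor": 2}

only_classic, only_lab = diff_keys(classic, lab)
print("only in classic:", only_classic)
print("only in lab:", only_lab)
```

For real files, each dict would come from `json.load` on the saved .ipynb. Any path that shows up on only one side is a candidate key to look at.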

Once the tool is in a working state, I will test it by running a few notebooks in jupyter lab, stripping them, and comparing the result with the baseline file (the stripped version obtained directly from the repo). Please provide feedback on this approach or suggest alternative approaches.

~Gokkul

stas · February 28, 2019, 5:33am

Thank you for wanting to work on this project, @gokkulnath!

Your proposed solution sounds reasonable.

Basically, fastai-nbstripout on a notebook run and saved by jupyter lab should produce output identical to what it produces on the same notebook run/saved in jupyter notebook.
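That requirement could be checked with something like the sketch below. It is only an illustration: `toy_strip` is a stand-in for the real stripping logic, `same_after_strip` is a hypothetical helper, and `hypothetical_lab_key` is an invented key.

```python
import copy
import json

def same_after_strip(strip, nb_classic, nb_lab):
    """True if both notebooks serialize identically after being stripped."""
    # Deep copies keep the caller's notebooks untouched
    a = strip(copy.deepcopy(nb_classic))
    b = strip(copy.deepcopy(nb_lab))
    return json.dumps(a, sort_keys=True) == json.dumps(b, sort_keys=True)

def toy_strip(nb):
    """Stand-in stripper (NOT the real tool): clears per-cell metadata only."""
    for cell in nb["cells"]:
        cell["metadata"] = {}
    return nb

classic = {"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["hi"]}],
           "metadata": {}, "nbformat": 4, "nbformat_minor": 2}
lab = {"cells": [{"cell_type": "markdown",
                  "metadata": {"hypothetical_lab_key": 1},
                  "source": ["hi"]}],
       "metadata": {}, "nbformat": 4, "nbformat_minor": 2}

identical = same_after_strip(toy_strip, classic, lab)
print(identical)
```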

I haven’t looked at the difference between those 2 formats, but most likely it’s the key to this problem. So let’s start with the first step: identifying what the differences are and whether we can detect that the incoming nb is the result of one jupyter program or the other.

It’s also very possible that this won’t work, since fastai-nbstripout needs to stay very fast.

And a reminder that jupyter lab nbs are most likely still nbformat-compliant - lab just saves them in a slightly different way. We just bypass nbformat to process them 10 times faster.

Also, if it helps, this need came from this PR: https://github.com/fastai/fastai/pull/1632. Perhaps you could collaborate with odysseus0, since he was using GCP in the first place and reported that our fastai-nbstripout wasn’t able to strip the nbs produced by GCP (which he said uses jupyter lab).

Sorry, just sharing my notes at this moment. I will be able to provide better feedback as we gather more data.

vova · March 28, 2019, 9:55am

Hey @stas, is there any existing evidence that fastai-nbstripout fails with lab notebooks?
I’ve created 2 simple notebooks, one in jupyter notebook and one in jupyter lab (installed on Win10 + conda), and they look the same inside and have the same format version:

 "nbformat": 4,
 "nbformat_minor": 2
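For anyone repeating this check, a minimal sketch that reads those two fields straight from the notebook JSON without going through nbformat. The helper name `nb_format_version` and the sample text are made up for illustration.

```python
import json

def nb_format_version(nb_json_text):
    """Return (nbformat, nbformat_minor) from the raw text of an .ipynb file."""
    nb = json.loads(nb_json_text)
    # .get() returns None instead of raising if a field is missing
    return nb.get("nbformat"), nb.get("nbformat_minor")

# Hypothetical minimal notebook text; a real check would read the file contents
sample = '{"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 2}'
version = nb_format_version(sample)
print(version)
```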

I also tried the steps described in odysseus0’s PR: https://github.com/fastai/fastai/pull/1632#issuecomment-469958489
And I think it works as expected: I only got a conflict on the line where the code changes really do conflict, which seems to be far from “everything breaks apart”.

stas · March 28, 2019, 3:39pm

Thank you for validating this, @vova. I don’t use lab myself, so I’m only the middle man here.

So first, can you git clone the fastai project, run tools/run-after-git-clone, load some nb under docs_src in lab, edit it and save it, and check that our custom git setup doesn’t break? i.e. that after saving the edited nb, you can git diff, commit, etc. Basically following this process: https://docs.fast.ai/gen_doc_main.html#process-for-contributing-to-the-docs

Second, @PegasusWithoutWinds (aka odysseus0@github), can you please re-check that what you reported here is still a problem? And if it is, please let @vova know what the status of this is for you.

Perhaps it was an issue with an older lab version and it changed in recent versions.

Thank you both.

PegasusWithoutWinds · March 28, 2019, 3:54pm

It is definitely possible.

If you don’t mind, I will try to use Jupyter Lab when editing docs next time and see what happens. As you know, these kinds of problems happen more often in real practice and are hard to reproduce on the fly.

@vova Thank you for the verification effort!

vova · April 4, 2019, 4:16pm

I’ve tried that using the following Lab version:

Lab: Version 0.35.4
Kernel: Python 3.6.7 (default, Oct 22 2018, 11:32:17), IPython 7.4.0

I think it works as expected: ipynb files under docs_src are cleaned of extra metadata and execution counters when checking git diff or making a commit.
Outputs are not cleaned (because of the -d option to fastai-nbstripout), so if 2 people change the same cell’s output (e.g. each re-runs the notebook, produces a slightly different matplotlib output image from the same cell, and commits), it’s expected that merging such changes will result in a conflict (e.g. when you pull with automerge or explicitly merge branches). The conflicted file most likely won’t load in jupyter because the JSON will be broken by conflict markers.
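To make the two points above concrete, here is a rough sketch. It is not the actual fastai-nbstripout code: `strip_cell_noise` and `has_conflict_markers` are hypothetical helpers, and the exact set of fields the real tool clears is an assumption here. The first helper clears per-cell metadata and execution counters while keeping outputs; the second shows why a file containing conflict markers no longer parses as JSON.

```python
import json

def strip_cell_noise(nb, keep_outputs=True):
    """Clear per-cell metadata and execution counters in a parsed notebook, in place."""
    for cell in nb.get("cells", []):
        cell["metadata"] = {}
        if cell.get("cell_type") != "code":
            continue
        cell["execution_count"] = None
        if not keep_outputs:
            cell["outputs"] = []
            continue
        for output in cell.get("outputs", []):
            # execute_result outputs carry their own counter
            if "execution_count" in output:
                output["execution_count"] = None
    return nb

def has_conflict_markers(text):
    """Heuristic: True if any line starts with a git conflict start/end marker."""
    return any(line.startswith(("<<<<<<< ", ">>>>>>> ")) for line in text.splitlines())

nb = {"cells": [{"cell_type": "code", "execution_count": 7,
                 "metadata": {"scrolled": True},
                 "outputs": [{"output_type": "execute_result",
                              "execution_count": 7,
                              "data": {"text/plain": ["2"]},
                              "metadata": {}}],
                 "source": ["1 + 1"]}],
      "metadata": {}, "nbformat": 4, "nbformat_minor": 2}
stripped = strip_cell_noise(nb)
print(json.dumps(stripped["cells"][0], indent=1))

# Hypothetical conflicted file content, as git would leave it after a failed merge
conflicted = ('{\n<<<<<<< HEAD\n "nbformat_minor": 2\n=======\n'
              ' "nbformat_minor": 1\n>>>>>>> branch\n}')
try:
    json.loads(conflicted)
    parses = True
except json.JSONDecodeError:
    parses = False
print(has_conflict_markers(conflicted), parses)
```

With `keep_outputs=True` the output data is left alone, so two different re-runs of the same cell can still differ in their saved output and conflict on merge.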

Yeah, let me know when you bump into the problem. Even better if you have the Lab version details and/or the broken .ipynb itself!

stas · April 4, 2019, 4:25pm

Thank you for your investigation, @vova. I will remove this thread from the index for now so it won’t mislead users.

@PegasusWithoutWinds, if you re-encounter this issue please comment here.

Thank you all!