Port fastai-nbstripout to jupyter lab


(Stas Bekman) #1

It appears that fastai-nbstripout doesn’t work with jupyter lab, most likely because its notebook format is slightly different from jupyter notebook’s. Since fastai-nbstripout manipulates the JSON content directly (to get a 10x+ speed gain over nbformat), it may need to be tweaked to handle the other format as well. So if you’re looking for a relatively small, simple project that just requires a basic understanding of python and list comprehensions, this could be a neat little project for you. Thank you.


(Gokkul Nath T S) #2

Hi @stas,

I took a look at it. I was surprised that a Jupyter notebook is just a big JSON file with the code stored inside a hierarchy/tree of dicts.

The current implementation assigns an empty list to every key that is not required.
The way I am thinking of approaching the problem is to diff the same notebook run with jupyter notebook and with Jupyter Lab. We can then configure the script to retain any essential information, i.e. identify the additional dict keys that need to be kept.
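A rough sketch of how that diff could be done at the level of JSON keys rather than text lines. This is only an illustration: `key_paths`, `diff_notebook_keys` and the `hypothetical_key` entry are made-up names, not part of fastai-nbstripout, and the actual differences between the two programs still have to be measured on real files.

```python
import json


def key_paths(node, prefix=""):
    """Collect every dict key path found in a parsed notebook."""
    paths = set()
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}/{key}"
            paths.add(path)
            paths |= key_paths(value, path)
    elif isinstance(node, list):
        for item in node:
            # Collapse list indices to "[]" so cells are compared by shape,
            # not by position.
            paths |= key_paths(item, f"{prefix}[]")
    return paths


def diff_notebook_keys(nb_a, nb_b):
    """Return (paths only in nb_a, paths only in nb_b), both sorted."""
    a, b = key_paths(nb_a), key_paths(nb_b)
    return sorted(a - b), sorted(b - a)


def load_notebook(path):
    """Parse an .ipynb file with the plain json module."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Tiny in-memory example; "hypothetical_key" is invented for illustration.
saved_by_notebook = {"cells": [{"cell_type": "code", "metadata": {}}]}
saved_by_lab = {
    "cells": [{"cell_type": "code", "metadata": {"hypothetical_key": True}}]
}
only_notebook, only_lab = diff_notebook_keys(saved_by_notebook, saved_by_lab)
print(only_notebook, only_lab)
```

With real files, the two arguments would come from `load_notebook(...)` on the same notebook saved once by each program.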

Once the tool is working, I will test it by running a few notebooks in Jupyter Lab and comparing the stripped result with the baseline file (the stripped version obtained directly from the repo). Please share your feedback on this approach or suggest alternatives.
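For the comparison step, something along these lines might be enough. It is a sketch under assumptions: `diff_against_baseline` is a hypothetical helper, and both notebooks are assumed to be already stripped and loaded as dicts. An empty result means the candidate matches the baseline.

```python
import difflib
import json


def normalized(nb):
    """Serialize a parsed notebook deterministically, one string per line."""
    text = json.dumps(nb, indent=1, sort_keys=True, ensure_ascii=False)
    return text.splitlines()


def diff_against_baseline(candidate, baseline):
    """Return unified-diff lines; an empty list means the two are identical."""
    return list(
        difflib.unified_diff(
            normalized(baseline),
            normalized(candidate),
            fromfile="baseline",
            tofile="candidate",
            lineterm="",
        )
    )


# In-memory example with made-up content.
baseline = {"cells": [{"cell_type": "code", "outputs": [], "source": ["x = 1"]}]}
candidate = {"cells": [{"cell_type": "code", "outputs": [], "source": ["x = 1"]}]}
for line in diff_against_baseline(candidate, baseline):
    print(line)
```

Sorting keys before diffing means a mere difference in key order between the two programs would not show up as a mismatch; whether that is desirable depends on how strict the test should be.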

~Gokkul


(Stas Bekman) #3

Thank you for wanting to work on this project, @gokkulnath!

Your proposed solution sounds reasonable.

Basically, running fastai-nbstripout on a notebook run and saved by jupyter lab should produce output identical to what it produces on the same notebook run and saved in jupyter notebook.

I haven’t looked at the difference between those two formats, but most likely it’s the key to this problem. As a first step, let’s identify what the differences are and whether we can detect which of the two jupyter programs produced the incoming nb.

It’s also very possible that this won’t work out, because fastai-nbstripout needs to stay very fast.

And a reminder: most likely jupyter lab nbs are still nbformat compliant; jupyter lab just saves them in a slightly different way. We simply bypass nbformat to process them 10 times faster.
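To illustrate what bypassing nbformat means in practice, here is a minimal sketch that clears code cell outputs using only the `json` module. It is not the actual fastai-nbstripout code: `strip_code_outputs` and `strip_notebook_text` are hypothetical names, and only the standard v4 fields `cells`, `cell_type`, `outputs` and `execution_count` are assumed.

```python
import json


def strip_code_outputs(nb):
    """Clear outputs and execution counts of code cells, in place."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb


def strip_notebook_text(text):
    """Take the raw .ipynb text and return the stripped text."""
    nb = json.loads(text)
    # indent=1 is a layout choice made for this sketch.
    return json.dumps(strip_code_outputs(nb), indent=1, ensure_ascii=False) + "\n"


# In-memory example with made-up content.
raw = json.dumps(
    {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": 3,
                "outputs": [{"output_type": "stream", "text": ["hi\n"]}],
                "source": ["print('hi')"],
            },
            {"cell_type": "markdown", "source": ["# Title"]},
        ]
    }
)
print(strip_notebook_text(raw))
```

Because nothing is validated against the notebook schema, any extra keys a different program writes pass through untouched unless the script handles them explicitly.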

Also, if it helps, this need came from this PR: https://github.com/fastai/fastai/pull/1632. Perhaps you could collaborate with odysseus0, since he was using GCP in the first place and reported that our fastai-nbstripout wasn’t able to strip the nbs produced by GCP (which he said uses jupyter lab).

Sorry, just sharing my notes at this moment. I will be able to provide better feedback as we gather more data.