Jupyter Notebooks to Scripts?

snagpaul · April 3, 2018, 4:04am

Following up from what Jeremy said about losing internet connection leading to losing work/progress on the notebooks, I’ve been wondering if there is a way to do away with them after the experimentation and prototyping stage.

Is there a way/ best practice solution to transferring code from Jupyter notebooks to scripts that can be run and log results every so often? Perhaps a way to “submit jobs” on servers?

If anyone has production experience with this stuff, I would also love to hear more about how the models get exposed/ deployed.

rob · April 3, 2018, 4:51am

> Is there a way/ best practice solution to transferring code from Jupyter notebooks to scripts

I think you could do this with a Jupyter extension. One may already exist, or you could write one.

> that can be run and log results every so often? Perhaps a way to “submit jobs” on servers?

I’m not sure what you mean by submitting jobs. Exporting code from Jupyter, and actually running the code, would be different tasks for different tools, in my view.

I had an idea to make an extension that dumps all uncollapsed cells into a file, though I didn’t get around to creating it.

I also want to make one that, when I press shift-enter, skips to the next uncollapsed code cell, rather than going into markdown or collapsed cells.

alexrigler · April 3, 2018, 4:59am

You might be interested in checking out Paperspace Gradient; I found it quick to get going with for fast.ai projects: https://www.paperspace.com/gradient

You can submit jobs to a machine or cluster and save whatever outputs you want to an artifacts directory. It also has a GUI. Check their docs.

Under the hood, Paperspace uses Docker containers, which are also worth reading up on. Here’s a good overview: https://towardsdatascience.com/how-docker-can-help-you-become-a-more-effective-data-scientist-7fc048ef91d5

Docker and containers may be overkill for testing and research but can help when deploying stuff. Lots of great tooling is available that helps reduce the learning curve, e.g. https://multithreaded.stitchfix.com/blog/2018/02/22/flotilla/

Hope that is useful, and sorry for the link deluge.

DavideBoschetto · April 3, 2018, 5:19am

In the end, a notebook is just a JSON file, so it’s only a matter of writing a parser for it! You could mark all the cells you want to keep with an arbitrary string at the beginning, check whether the string is there for each cell, and if so store the rest of the cell in a file!
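A minimal sketch of that idea, assuming the notebook follows the nbformat 4 layout (a top-level "cells" list whose entries carry "cell_type" and "source"). The marker string "# keep" and the function names are made up for illustration; this is one possible way to do it, not an existing tool.

```python
import json

MARKER = "# keep"  # hypothetical marker; any string agreed on in advance works


def extract_marked_cells(notebook, marker=MARKER):
    """Return the source of every code cell whose first line is the marker."""
    kept = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # nbformat stores source either as one string or as a list of lines
        if isinstance(source, list):
            source = "".join(source)
        lines = source.splitlines()
        if lines and lines[0].strip() == marker:
            # store everything after the marker line
            kept.append("\n".join(lines[1:]))
    return kept


def notebook_to_script(ipynb_path, script_path, marker=MARKER):
    """Write the marked cells of a .ipynb file to a plain .py file."""
    with open(ipynb_path, encoding="utf-8") as f:
        notebook = json.load(f)
    cells = extract_marked_cells(notebook, marker)
    with open(script_path, "w", encoding="utf-8") as f:
        f.write("\n\n".join(cells) + "\n")
```

Calling notebook_to_script("lesson.ipynb", "lesson.py") (both file names are placeholders) would then write only the marked cells, in notebook order.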

janimo · April 3, 2018, 7:00am

Maybe this helps?

https://nbconvert.readthedocs.io/en/latest/customizing.html#Converting-a-notebook-to-an-(I)Python-script-and-printing-to-stdout

suvash · April 3, 2018, 10:03am

There might be several ways to do this, as mentioned above. But given the ‘exploratory’ nature of notebooks, it’s always going to be a hack. And hacks are totally fine until you have to start maintaining them.

In the usual production cases, as you keep working on the fix/hack, it might end up being hacks on top of hacks. I’m not trying to discourage the idea here, but this is usually what happens in production when you bend a tool to do something else.

Personally, I’d use notebooks/REPL as a space to do exploratory programming, sort of like a workbench where you try things out until they work.

Now, if you want to take this to production, it’ll save you a lot of headaches if you manually extract the important bits into a separate, well-packaged script. The extraction is just a one-time thing, and also a good time to do housekeeping (code cleanup, documentation, naming, writing tests, etc.). This has the benefit of running exactly the same thing every single time, since you don’t need to explore anymore.

I think it really helps to consider the exploratory vs non-exploratory spaces of running code differently. Lots of tools have been built to bridge this gap with automation, but in the end they all suffer in some ways.

And, once again, these are just my opinions, so take them with a grain of salt.

jeremy · April 3, 2018, 2:35pm

FYI, in the File menu of Jupyter Notebook there’s an option to download the current notebook as a Python script.

snagpaul · April 3, 2018, 5:04pm

So, I’ve been trying to go over some of the code that becomes available with published papers, and I’m yet to come across a Jupyter notebook (maybe I need to read more). The code doesn’t look like a script generated from a notebook either. There seems to be a design choice I might be missing, perhaps a good way to give command-line arguments to the scripts, or the way the results are logged. Are they doing it wrong? Is it too academic?

So far your process seems like a pro version of skim-implement-observe. I do feel that it might be helpful if, along with the mathematical explanation, you could show us how to navigate a repository that was released along with a paper. Or just point me in the right direction if that ask is too big.

snagpaul · April 3, 2018, 5:07pm

I do agree with the manual extraction. I wonder if there is a way to structure that somehow. Maybe it’s a matter of taste? Or it’s a standard?

suvash · April 3, 2018, 5:44pm

Definitely not a standard. “Whatever works well enough for you, at that point in time” is more like it.

jeremy · April 3, 2018, 6:40pm

Very few academics are taught to use Jupyter notebooks; they are only starting to see gradual adoption. Also, papers often need to try lots of different settings to show what works and what doesn’t (ablation studies), for which scripts tend to work better than notebooks. See the imdb_scripts folder we discussed yesterday for example scripts (they were originally based on notebooks, but over time Sebastian and I refactored them).
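To illustrate why scripts suit this kind of sweep, here is a minimal sketch of a command-line script that runs every combination of a few settings and logs one line per run. The flags --lr, --dropout and --log-file and the run_experiment function are placeholders invented for this example; they are not taken from imdb_scripts or from fastai.

```python
import argparse
import itertools
import logging


def run_experiment(lr, dropout):
    """Placeholder for real training code; returns a dummy number, not a real metric."""
    return lr * (1.0 - dropout)


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Run a small settings sweep.")
    # nargs="+" lets each flag take several values, e.g. --lr 0.1 0.01
    parser.add_argument("--lr", type=float, nargs="+", default=[0.01])
    parser.add_argument("--dropout", type=float, nargs="+", default=[0.0])
    parser.add_argument("--log-file", default=None,
                        help="log to this file instead of stderr")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    logging.basicConfig(filename=args.log_file, level=logging.INFO,
                        format="%(asctime)s %(message)s")
    results = {}
    # Try every combination of the requested settings
    for lr, dropout in itertools.product(args.lr, args.dropout):
        score = run_experiment(lr, dropout)
        logging.info("lr=%s dropout=%s score=%s", lr, dropout, score)
        results[(lr, dropout)] = score
    return results


if __name__ == "__main__":
    main()
```

Saved as, say, sweep.py (a hypothetical name), it could be run as: python sweep.py --lr 0.1 0.01 --dropout 0.0 0.5 --log-file sweep.log. Each combination then leaves a timestamped line in the log, which survives a dropped connection in a way notebook output does not.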

snagpaul · April 3, 2018, 7:39pm

Thanks!

digitalspecialists · April 6, 2018, 4:58am

So here is an interesting related article: https://www.theatlantic.com/science/archive/2018/04/the-scientific-paper-is-obsolete/556676/?single_page=true

And here is a linked notebook explaining the detection of gravitational waves: https://hub.mybinder.org/user/losc-tutorial-l-_event_tutorial-vruzbvh0/tree

snagpaul · April 7, 2018, 5:34am

Thank you for sharing the link! I have always been a fan of sensible UI design and definitely a big fan of Bret Victor. I have to agree that research papers and single-letter variable names are terrible interfaces for conveying ideas, but that’s where the ideas currently are. That being said, I have to commend Jeremy and Rachel on their approach, which makes this (sometimes) intentionally esoteric medium more comprehensible.

Also, the second link won’t open for some reason. Could be something at my end.

davidc · September 21, 2018, 1:07pm

You can convert the notebook to a script with:

jupyter nbconvert --to script lesson4-imdb.ipynb

This will create a file lesson4-imdb.py in the same directory.

It is possible to run that script as is with ipython as follows:

ipython --pylab auto lesson4-imdb.py

That did not work for me in a terminal, so I removed all lines containing get_ipython from the file and ran the script normally:

python lesson4-imdb.py
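The get_ipython removal can be automated rather than done by hand. This is a rough sketch, assuming the unwanted lines are the calls that nbconvert generates from notebook magics, such as get_ipython().run_line_magic(...); the function names are invented for this example.

```python
def strip_ipython_lines(source):
    """Drop every line that mentions get_ipython (converted magics and shell commands)."""
    kept = [line for line in source.splitlines() if "get_ipython" not in line]
    return "\n".join(kept) + "\n"


def clean_script(in_path, out_path):
    """Read a converted script and write a copy without get_ipython lines."""
    with open(in_path, encoding="utf-8") as f:
        cleaned = strip_ipython_lines(f.read())
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(cleaned)
```

One caveat: this filter is purely line-based, so if a removed line was the only statement inside an if or for block, the result will no longer be valid Python and needs a manual fix.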

This produces minimal output, so it is a good idea to add logging. I will report back if I get the