Following up on what Jeremy said about losing internet connection and thereby losing work/progress in the notebooks, I’ve been wondering if there is a way to do away with them after the experimentation and prototyping stage.
Is there a way, or a best-practice solution, for transferring code from Jupyter notebooks into scripts that can be run and log results periodically? Perhaps a way to “submit jobs” on servers?
If anyone has production experience with this stuff, I would also love to hear more about how the models get exposed/deployed.
In the end, a notebook is just a JSON file, so it comes down to writing a small parser! You could mark every cell you want to keep with an arbitrary string at the beginning, check each cell for that string, and if it’s there write the rest of the cell to a file.
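To make that concrete, here’s a minimal sketch of such a parser. A `.ipynb` file is just JSON with a `cells` list, so we can keep only the code cells whose first line starts with a marker. The marker string `# KEEP` and the function name are made up for illustration:

```python
import json

MARKER = "# KEEP"  # arbitrary marker string; anything unique works


def extract_marked_cells(notebook_path, script_path, marker=MARKER):
    """Copy code cells whose source starts with `marker` into a .py script."""
    with open(notebook_path) as f:
        nb = json.load(f)

    kept = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown/raw cells
        source = "".join(cell.get("source", []))
        if source.startswith(marker):
            # Drop the marker line itself, keep the rest of the cell
            kept.append(source.split("\n", 1)[1] if "\n" in source else "")

    with open(script_path, "w") as f:
        f.write("\n\n".join(kept))
```

Calling `extract_marked_cells("analysis.ipynb", "analysis.py")` would then give you a plain script containing only the marked cells.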
There might be several ways to do this, as mentioned above. But, given the ‘exploratory’ nature of notebooks, it’s always going to be a hack. And hacks are totally fine until you have to start maintaining them.
In typical production cases, as you keep working on the fix/hack, it can end up being hacks on top of hacks. I’m not trying to discourage the idea here, but this is usually what happens in production when you bend a tool to do something it wasn’t designed for.
Personally, I’d use notebooks/REPL as space to do exploratory programming, sort of like a workbench where you try out things until it works.
Now, if you want to take this to production, it’ll save you a lot of headache if you manually extract the important bits into a separate, well-packaged script. The extraction is a one-time thing, and also a good time to do housekeeping (code cleanup, documentation, naming, writing tests, etc.). This has the benefit of running exactly the same thing every single time, since you don’t need to explore anymore.
I think it really helps to consider the exploratory vs non-exploratory spaces of running code differently. Lots of tools have been built to bridge this gap with automation, but in the end they all suffer in some ways.
And, once again, these are just my opinions, so take them with a grain of salt.
So, I’ve been trying to go over some of the code released alongside published papers, and I’m yet to come across a Jupyter notebook (maybe I need to read more). It doesn’t look like the scripts were generated from notebooks either. There seems to be a design choice I might be missing: perhaps a good way to pass command-line arguments to the scripts, or the way the results are logged. Are they doing it wrong? Is it too academic?
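For what it’s worth, the pattern I usually see in paper repos looks roughly like this: hyperparameters come in as command-line flags via `argparse`, and metrics are written to a log file so a long run can be checked later. Everything here (the flag names, the fake training loop) is made up for illustration, not taken from any particular repo:

```python
import argparse
import logging


def train(lr, epochs):
    # Placeholder for a real training loop; logs one line per epoch.
    loss = None
    for epoch in range(epochs):
        loss = 1.0 / (epoch + 1)  # fake, monotonically decreasing "loss"
        logging.info("epoch %d: loss=%.4f (lr=%g)", epoch, loss, lr)
    return loss


def main(argv=None):
    parser = argparse.ArgumentParser(description="Toy training script")
    parser.add_argument("--lr", type=float, default=0.01)
    parser.add_argument("--epochs", type=int, default=3)
    parser.add_argument("--log-file", default="results.log")
    args = parser.parse_args(argv)
    # force=True resets any previously configured handlers (Python 3.8+)
    logging.basicConfig(filename=args.log_file, level=logging.INFO, force=True)
    return train(args.lr, args.epochs)


if __name__ == "__main__":
    main()
```

Run as e.g. `python train.py --lr 0.1 --epochs 10 --log-file run1.log`, which also makes it easy to submit as a batch job on a server and vary flags per run.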
So far your process seems like a pro version of skim-implement-observe. I do feel it might be helpful if, along with the mathematical explanation, you could show us how to navigate a repository released alongside a paper. Or just point me in the right direction if that ask is too big.
Very few academics are taught to use Jupyter notebooks - it’s only starting to see gradual adoption. Also, papers often need to try lots of different settings to show what works and what doesn’t (ablation studies), and scripts tend to work better than notebooks for that. See the imdb_scripts folder we discussed yesterday for example scripts (they were originally based on notebooks, but over time Sebastian and I refactored them).
Thank you for sharing the link! I’ve always been a fan of sensible UI design and am definitely a big fan of Bret Victor. I have to agree that research papers and single-letter variable names are terrible interfaces for conveying ideas, but that’s where the ideas currently live. That being said, I have to commend Jeremy and Rachel on their approach, which makes this (sometimes) intentionally esoteric medium more comprehensible.
Also, the second link won’t open for some reason. Could be something at my end.