[fastpages] GitHub Pages Blog Using Nbdev

psychemedia · February 28, 2020, 4:37pm

Picking up on a Twitter thread, some comments around the “fastpages supports really easy Jupyter blogging” effusiveness on Twitter.

(Note this isn’t meant to be hostile, it’s meant to be usefully critical.)

For any seasoned Github user and developer who’s also been responsible for maintaining documentation sites using Jekyll, fastpages “just” requires folk to use Github and Jekyll style publishing to publish a blog site from notebook files and markdown docs.

For anyone familiar with Github, git, and Jekyll publishing, the fastpages automation simplifies some of the faff required in getting that stuff working. (Other approaches, such as Jupyter Book, ipypublish and nbsphinx offer related publishing routes but less hype. A proper comparison of all the approaches might be useful…)

So if you’re familiar with Github and Jekyll, the benefits are quite possibly both clear and enticing. But if you aren’t a Github user or a Jekyll user, things are pretty much as opaque as ever they were.

The fastpages mechanic of generating a PR on the first commit generated when cloning the template repo is really neat, and an idea I’ll likely steal. But for a novice, without a mental model of how Github works, this doesn’t in and of itself make things that much easier. The naive user is faced with a complex UI, using complex jargon, and probably doesn’t know where to go looking for the PR, how to handle it, what it means when they do handle it, etc etc.

The file listing on the master home page you’re faced with when cloning the repo is also intimidating. There are a lot of files, there’s lots of directory names starting with scary underscores, lots of .whatever hidden files. That’s fine if you’re creating a workflow that’s “easy” for folk who are happy with all this stuff, but if the claim is that this is an “easy route into blogging with Jupyter” in general, it isn’t.

One of the attractive features of the Jupyter notebook UI and infrastructure is that someone with little technical knowledge of the command line can quickly start using magics and high level commands, a line at a time, to get stuff done. Just because someone can plot a chart from a pandas data frame populated from a loaded-in CSV file doesn’t necessarily mean they know how to set up the Jupyterhub server they’re actually a user of, nor even how to install pandas into the environment they’re using. As a user, why should they? The same goes for their familiarity, or otherwise, with Github and Jekyll. (By the by, it’s probably best to leave the “but they ought to…” arguments aside…)

I’m all for folk developing skills, but onboarding is really hard. And oftentimes, when trying to persuade people to adopt new tech in conservative institutions, you only get infrequent opportunities to entice them in. If you claim something is easy, that you “just” do this and that, and then watch their faces as confusion and terror reign, you’ve lost your conversion opportunity. They won’t try again.

To make things really easy means taking things much slower. Cloning the repo and showing a clean page with a very simple set of instructions, and all the scary stuff hidden in branches, provides an opportunity for generating an easy way in. The initial readme could provide a set of very clear instructions about setting up tokens etc, along with why they’re necessary (eg Stephen Downes had a go at simplifying them here).

Things would also be simpler if all the Jekyll scaffolding were hidden away and built on via branches, so the user could just slowly introduce things into the top level directory, the homepage for their blog source files.

This level of simplicity may or may not be desirable for a (semi-)professional, if ad hoc, tool, but if the desire is to find a way to make it easier for novices (to Github, to Jekyll) to publish in what is still quite a low level way, I think more scaffolding is required. (A limiting case of easy is probably to just click a button on your Jupyter notebook and have the file posted somewhere, from where it magically then appears on a public URL.)

Inspired by the initial commit handling Github Action, I started some baby steps explorations of a way of making “performative” Github commit actions (action-steps) that might (or might not!) make things simpler for a novice user (they also run the risk of them developing bad mental models, but I’m just exploring ideas).

For example, you might encourage someone via the readme to create a new file from the Github web UI with a particular filename or particular commit message, and then handle that in a particular way, perhaps updating the README with the next step; this might include some description of how you could then compare the original readme with the updated one. (I did start wondering whether I could code Adventure to be played via commit messages! Has that been done before I wonder?)

You might have additional commit messages that introduce new files into the top level of the repo, a file at a time. (Where to put simple documentation describing the performative commit commands would be another issue!)

I appreciate this is probably not how Github is traditionally used, where a principle of least surprise about what appears in the repo compared to the files you actually commit is a sound one (that said, a lot of workflows do make use of commit hooks that do change files…) But I would argue that using Github for the primary purpose of making use of its Github Pages publishing mechanism is not using Github in a traditional version control application way either. Version control is NOT the aim. So what I’m thinking of here is where the user can instruct Git to add in very particular new files at particular times in response to particular commands issued via a particular commit message for a particular reason: to allow them to incrementally develop the complexity of their environment from within the environment as they grow familiar with it. Along the way, the mechanism could coach the user and introduce them to features of Github that may be useful in a blogging context, such as the ability to “track changes” and maintain different versions of a piece of content as you draft it etc. This would then introduce them to version control as a side effect of them developing particular blogging workflow practices in an environment that can coach them as they use it.
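To make the “performative commit” idea a bit more concrete, here’s a rough sketch of the sort of dispatch step an automation might run when a commit arrives. Everything in it is an assumption on my part for illustration: the `!step <name>` command syntax, the step names, and the files each step would introduce are all hypothetical, and nothing here is part of fastpages or Github itself. It only works out which files a commit message is asking for; actually writing them into the repo would be a separate job.

```python
import re
from typing import List, Optional

# Hypothetical command syntax: the first line of the commit message
# is "!step <name>". This is an illustrative convention, not a Github feature.
COMMAND_PATTERN = re.compile(r"^!step\s+([a-z0-9-]+)\s*$", re.IGNORECASE)

# Hypothetical mapping from a step name to the files that step would
# introduce into the top level of the repo, a little at a time.
STEP_FILES = {
    "first-post": ["_posts/README.md"],
    "notebooks": ["_notebooks/README.md"],
    "drafts": ["drafts/README.md"],
}


def parse_command(commit_message: str) -> Optional[str]:
    """Return the step name if the first line of the message is a command."""
    lines = commit_message.strip().splitlines()
    first_line = lines[0].strip() if lines else ""
    match = COMMAND_PATTERN.match(first_line)
    return match.group(1).lower() if match else None


def files_to_add(commit_message: str) -> List[str]:
    """List the files the commit message asks for (empty if none or unknown)."""
    step = parse_command(commit_message)
    if step is None:
        return []
    return list(STEP_FILES.get(step, []))


if __name__ == "__main__":
    for message in ["!step notebooks", "Fix a typo in my post"]:
        print(repr(message), "->", files_to_add(message))
```

An ordinary commit message falls through and nothing extra appears in the repo, which keeps surprises limited to the cases where the user has explicitly asked for something.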

This may all just be nonsense, of course!

For some definition of “just”…

–tony