Nbdev v2 codebase falls short of literate programming

I enjoyed perusing and learning from the Nbdev v1 source code notebooks. They told a story and explained things as they went, truly written as if “addressed to human beings rather than to a computer.”

Unfortunately, the same cannot be said of the Nbdev v2 source notebooks. (Don’t take my word for it, though. In the video tutorial, Jeremy acknowledges that the Nbdev v2 source code is not as good an example of its animating principles as one might hope.) The notebooks are no longer even numbered to help orient the reader as to where to look first.

I recently went to the source to try to understand the cause of an error I was running into. I had traced it to the creation of the README.md file, so I wanted to find how quarto was called to create that file. I found my way to the quarto.nbdev_readme function. With some difficulty, I determined to look next at serve.proc_nbs. At that point, I gave up, finding the code impenetrable.

I would like to try to help improve this situation, but it seems too difficult (for this intermediate programmer, anyway) to understand the current code well enough to improve its documentation. I wish I had something more constructive than this complaint to offer, but maybe provoking a discussion is the necessary first step.

I write this all as an admirer and someone with great gratitude and hope for Nbdev. Well, admittedly, my hope is waning: I’m starting to doubt this can be a viable project going forward if the source code persists in this state for long. But that’s something we can work together to fix.

@hugetim I can’t comment on nbdev v1 or the sizzly comparison as I haven’t dug into v1 before. But this does encourage me to check out the v1 notebooks next time I’m trying to understand nbdev’s approach a bit better.

I had a recent dive into nbdev_readme and then serve.proc_nbs that seems to parallel your exploration a bit:
Nbdev_readme vs. nbdev_test - nbdev - fast.ai Course Forums

(It looks like you’re doing something with Scala, though, so I bet there’s some differences as well.)

I’d be curious whether my issue is similar to yours, or whether there are any specific questions about nbdev_proc that I might also have wondered about.

I understand this topic is more of a jumping-off point for what you’re trying to say here, so it could almost be a separate thread… but chiming in anyway.

There’s always a tradeoff between compactness and explanatory detail. It could be that the second time through, the lead devs swung more toward compactness. But from your telling, maybe there’s some intention or desire to flesh out the explanations, and new users such as us could help with this.

I don’t like the fact that the files aren’t numbered either. I feel that it greatly hampers readability when looking through the notebooks. Just to give you more background: the numbers were previously used to order the files on the sidenav in the rendered docs. That order is now handled by the order field in the front matter of each notebook (Quarto stopped supporting file-name based ordering).

I’ve reached out to the Quarto folks to get their thoughts on if they can bring this feature back! In the meantime, I’ve opened this PR that manually adds the prefix to the api files to correspond with the order field. This is not an ideal long-term solution, but at least it will allow you to quickly sort the files and read them in the interim.

Thank you.

As I skim through the notebooks in order now, I see that most of them actually are well-explained, with tests that help the understanding. Maybe it’s just the notebooks I was looking at to try to understand my error (‘quarto’ and ‘serve’), as well as ‘qmd’, that are lacking sufficient explanation. I think I should apologize for jumping to a general conclusion based on that small sample size.

Those are very terse because I thought they were pretty tedious and basic, with little worthy of further discussion. However, the fact that you found the code impenetrable shows that my thinking was wrong. I’m happy to add some prose to describe what’s going on. It would help a lot if you could describe which parts you found impenetrable, or give any context around what kind of information might help you understand that code.

That’s generous of you. It may be that my expectations were off and I just needed to be more patient as I got my bearings. But for what it’s worth, here’s what the experience of trying to track down the cause of this error was like for me:

  1. It took me a bit to see that the first chunk of code was just getting paths and checking if the readme needed updating. (Also, my initial impression was that the chk_time parameter provided the answer to the question of whether the readme is out of date, but now I understand that it merely toggles whether to check if the readme is out of date.)
  2. I wondered whether my problem was that I just hadn’t built up sufficient background familiarity with fastcore and the other parts of the nbdev code. The three wildcard imports made the prospect of tracking down unfamiliar functions seem daunting. (By now, I see that none of that plays much role in nbdev_readme, aside from the call_parse decorator, which is straightforward enough. I also now realize I could have used VSCode to navigate to any unfamiliar functions using the .py files.)
  3. I eventually found the proc_nbs function and guessed that was probably where the action was (though I now think it’s more in the following line that subprocess-runs quarto). I went to look at it in the serve module but couldn’t (within a few minutes) understand much of what was going on there.
  4. I can now see that nbdev_readme is mostly boilerplate with a couple lines that actually call quarto. But I still don’t really understand where to go next to better understand precisely what quarto does (in terms of the source of my unicode error).
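To make the chk_time point in item 1 concrete, here is a minimal sketch of the general pattern. It is not nbdev’s actual code: `needs_update` and `build_readme` are hypothetical names, and the file copy is only a stand-in for the real rendering step. The flag does not report whether the output is stale; it only decides whether the staleness check runs at all.

```python
from pathlib import Path


def needs_update(src: Path, dest: Path) -> bool:
    """Hypothetical helper: True if dest is missing or older than src."""
    if not dest.exists():
        return True
    return src.stat().st_mtime > dest.stat().st_mtime


def build_readme(src: Path, dest: Path, chk_time: bool = False) -> bool:
    """Hypothetical stand-in for a readme build; returns True if it rebuilt.

    chk_time only toggles the timestamp comparison. With chk_time=False
    the rebuild always happens, whether or not dest is out of date.
    """
    if chk_time and not needs_update(src, dest):
        return False  # check requested and dest is already current
    dest.write_text(src.read_text())  # placeholder for the real rendering
    return True
```

Calling `build_readme(src, dest, chk_time=True)` twice in a row without touching `src` would rebuild at most once, while `chk_time=False` rebuilds on every call.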

I refactored nbdev_readme a bit in a pull request, adding a bit of documentation that would have been helpful to me as well. It took me a couple of hours to do that, though, which may not be worth the time. Still, with this being the flagship nbdev project, it does seem worth taking the time to make it as clear and approachable as we can.

Yes, this is the critical insight. When you’re trying to understand enough of a new codebase to make a specific PR, it’s best to use an editor that lets you jump to definitions to understand how the code works. Reading and experimenting in the notebooks is the best way to understand the bigger picture more thoroughly.

Note that in Jupyter, wildcard imports should not be a problem: just type the name of a symbol in a cell and hit ctrl-enter to find out where the symbol comes from. Add a ? to get brief docs, and ?? to get its source code. Type doc(symbol) to get a link to its online documentation (if it’s a public symbol).
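The ? and ?? suffixes are IPython syntax, so they only work in a notebook or IPython shell. As a rough plain-Python equivalent, the standard `inspect` module can answer the same two questions. This is a sketch for illustration only: `where_from` is a hypothetical helper name, and `textwrap.dedent` stands in for any symbol that might have arrived through a wildcard import.

```python
import inspect
from textwrap import dedent  # imagine this arrived via "from textwrap import *"


def where_from(obj) -> str:
    """Return the name of the module in which obj was defined."""
    mod = inspect.getmodule(obj)
    return mod.__name__ if mod is not None else "<unknown>"


# Where does the symbol come from? (roughly what typing its name shows)
print(where_from(dedent))

# What does its source look like? (roughly what ?? shows)
print(inspect.getsource(dedent)[:300])
```

`inspect.getsource` only works for objects defined in Python source files, so it will raise an error for builtins and C extensions.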

Yeah, at first I was only looking at the notebooks on GitHub rather than within a Jupyter notebook. :man_facepalming: But I’m also not in the habit of looking things up in notebooks that way, so that’s going to be really helpful. Thank you.
