NBDEV for course content: questions around dealing with time-consuming tests and cleaning notebooks

johnowhitaker · September 13, 2022, 5:59am

I’m using nbdev to share a course I’m creating, and I have one or two questions I’d like to pick the community’s brain on:

1. A number of lessons require downloading large-ish datasets or models and working with them throughout the lesson, including some long-running training cells. This is a bit of a pain when it comes to CI, since testing by running all cells is going to take a while. Any suggestions for dealing with this? I don’t want to just skip testing. Maybe detect when we’re in CI and use small dummy datasets during testing? I’d also like to minimize the extra code learners will have to scroll past for this, so hundreds of `if testing: n_iter = 5` statements wouldn’t be ideal. Would appreciate any ideas if you’ve dealt with something like this.
2. ‘Open in Colab’ would be a great option, but just opening the source notebook means lots of nbdev/quarto directives and so on (along with possibly all the extra code required for speeding up testing referenced above). I’d love to create some sort of pipeline that can pre-process the notebooks to:
   - remove directives
   - add cells to install requirements
   - strip out some other specific cells that are maybe focused on testing/CI/nbdev stuff

   Any suggestions for how to start on this would be great. @hamelsmu I know we chatted about this briefly when we last met - I wonder if you’ve had any brain waves about the best way to start implementing something like this?
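
On the first question, one way to avoid scattering `if testing:` blocks through a lesson is a single helper that picks between a full and a fast value. The following is only a rough sketch: the names `in_ci` and `pick` are made up for illustration, and it assumes the CI system sets a `CI` environment variable (GitHub Actions sets `CI=true`; other systems may need a different variable).

```python
import os

def in_ci() -> bool:
    """Return True when the CI environment variable holds a truthy value.

    Assumption: the CI runner exports CI=true (or 1/yes).
    """
    return os.environ.get("CI", "").lower() in {"1", "true", "yes"}

def pick(full, fast):
    """Return `fast` when running under CI and `full` otherwise."""
    return fast if in_ci() else full

# In a lesson cell, one short line stands in for an if/else block.
n_iter = pick(full=1000, fast=5)
```

The same helper could choose a dataset name or a subset size, for example `pick(full=None, fast=100)` as a row limit, so learners see one short call per cell instead of branching code.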

tylere · September 13, 2022, 6:58pm

I’m also interested in these items, particularly #2, for teaching workshops where the only requirement would be to show up with a laptop and web browser.

> remove directives

This may be possible by using Quarto to render a Jupyter notebook:

quarto render --to ipynb

> add cells to install requirements
>
> strip out some other specific cells that are maybe focused on testing/CI/nbdev stuff

Yes, it would be useful if there was a cell directive that could be used to indicate whether it should be included (or stripped) in the output for a specific format.
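
Until something like that exists, the three steps from the original post could be approximated with a small script that works directly on the notebook JSON. This is a minimal sketch, not an nbdev or Quarto feature: the `#| ci_only` marker is a hypothetical directive invented here, the function names are illustrative, and it treats any code line starting with `#|` as a directive to remove.

```python
import json

DROP_MARKER = "#| ci_only"  # hypothetical directive, not part of nbdev or Quarto

def clean_notebook(nb, requirements):
    """Return a copy of notebook dict `nb` prepared for learners."""
    cells = []
    for cell in nb.get("cells", []):
        src = cell.get("source", [])
        # The notebook format allows source as one string or a list of lines.
        lines = src.splitlines(keepends=True) if isinstance(src, str) else list(src)
        if cell.get("cell_type") == "code":
            # Drop whole cells carrying the (hypothetical) CI-only marker.
            if any(line.strip() == DROP_MARKER for line in lines):
                continue
            # Remove directive lines, which start with "#|".
            kept = [line for line in lines if not line.lstrip().startswith("#|")]
            # Skip code cells that are empty once directives are removed.
            if not "".join(kept).strip():
                continue
            cell = {**cell, "source": kept}
        cells.append(cell)

    if requirements:
        install = {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["%pip install -q " + " ".join(requirements)],
        }
        # nbformat 4.5+ expects each cell to carry an id.
        if nb.get("nbformat_minor", 0) >= 5:
            install["id"] = "install-requirements"
        cells = [install] + cells
    return {**nb, "cells": cells}

def clean_file(src_path, dst_path, requirements):
    """Read a notebook from src_path and write the cleaned copy to dst_path."""
    with open(src_path, encoding="utf-8") as f:
        nb = json.load(f)
    with open(dst_path, "w", encoding="utf-8") as f:
        json.dump(clean_notebook(nb, requirements), f, indent=1)
```

Calling `clean_file("lesson.ipynb", "colab/lesson.ipynb", ["torch"])` (paths and package name are placeholders) would write a copy that an ‘Open in Colab’ link could point at, leaving the source notebook untouched for nbdev.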

hamelsmu · September 14, 2022, 4:13am

I’m tracking this on this issue. I’ll try to look into this sometime soonish; @deven367 may already be looking into it.

johnowhitaker · September 14, 2022, 5:05am

Great - will move future discussion to that issue. Thanks