Nbdev code coverage / n-Tests

Hi everyone,

I have been shifting my full-time work to use nbdev. I have been using it for developing some of our microservices and tooling. Because I have been using nbdev, I have had the most detailed documentation for my code out of my whole team. It has been awesome and I swear by nbdev now for both work and side projects.

With this said, using nbdev for production development highlighted some areas in testing that seem to be a tad limited. Before nbdev, we used pytest for developing and testing our APIs. One of the things that makes the rest of my team apprehensive about using nbdev is the inability to truly understand “how tested” my code is.

There are 2 features in pytest that would be cool to have in nbdev:

  • output the number of tests that have been run
  • report code coverage, i.e. some estimate of how much of the code the tests actually exercise

Outputting the number of tests run would be doable. Code coverage I can see being more complex.


To explore a few ideas before suggesting any changes to nbdev, I’m thinking I could write a script to create test_[notebook name].py files from notebooks - like nbdev_build_lib but containing just the test cells.

This would make it possible to use coverage with something like:
coverage run test_00_core.py
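A rough sketch of how that extraction could look, reading the notebook JSON directly (the export-flag check here is deliberately simplified; real nbdev flag parsing is smarter, and the function name is made up):

```python
import json

def extract_test_cells(nb_path, out_path):
    """Write the code cells that are NOT flagged for export (i.e. the
    test cells) out to a plain python script that coverage can run."""
    nb = json.load(open(nb_path))
    chunks = []
    for cell in nb['cells']:
        if cell['cell_type'] != 'code':
            continue
        src = ''.join(cell['source'])
        first = src.splitlines()[0] if src else ''
        # very simplified flag check - real nbdev flag parsing is smarter
        if first.replace(' ', '').startswith(('#export', '#exporti', '#default_exp')):
            continue  # skip exported (library) cells
        chunks.append(src)
    with open(out_path, 'w') as f:
        f.write('\n\n'.join(chunks))

# e.g. extract_test_cells('00_core.ipynb', 'test_00_core.py')
```

One open question with this approach: the non-exported cells often rely on names defined in the exported cells, so the generated script would also need to import the built module at the top.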

If you’d be interested in trying this out on your projects, I’ll try to find time (o:


I would still be interested. I should be able to do some work/coding on my side too, possibly after the first 2 weeks of July.

It seems that nbdev doesn’t generate test files itself. If I remember correctly, it runs the notebook and extracts the failure outputs. Technically there’s no such thing as a test cell; it’s just a cell that doesn’t get exported to the python files. There are also cases where I use the hide tag for test-related code I don’t want exported to html. And note that “non-test” code cells are usually still needed to run the tests.

I’m actually starting to think that coverage might not be too bad to get working. Would coverage allow executing other arbitrary scripts? We could export ALL code in a given notebook to a python script and just run that with coverage.
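To make the export-everything idea concrete: coverage can run any script with `coverage run script.py`, so a minimal sketch might look like this (function name and paths are made up):

```python
import json

def notebook_to_script(nb_path, out_path):
    """Dump every code cell of a notebook into one python script,
    in order, so a tool like coverage can execute it end to end."""
    nb = json.load(open(nb_path))
    code = '\n\n'.join(''.join(c['source'])
                       for c in nb['cells'] if c['cell_type'] == 'code')
    with open(out_path, 'w') as f:
        f.write(code)

# notebook_to_script('00_core.ipynb', 'run_00_core.py')
# then on the command line: coverage run run_00_core.py
```

The catch, as discussed below, is that this measures coverage of the converted script rather than of the modules nbdev built.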

I’ll be short on free time probably until the middle of this month. I like the idea of posting MVP stuff here, so if I can, I’ll offer code ideas/snippets too after mid July.

If all of your tests run in plain python, this should get you going https://github.com/pete88b/decision_tree/blob/master/80_test_coverage.ipynb

If you’re able to share any of your project code, I’d be really interested to see how other people are using nbdev - and it’d be good for me to test out the new migrate-to-magic features as well as run a few coverage reports.

I’ve also tried pytest-cov by creating test_decision_tree.py

import nbdev.test
def test_run():
    nbdev.test.test_nb('00_core.ipynb')
then run with

pytest --cov=decision_tree

but that comes back with 0% for decision_tree/core.py

----------- coverage: platform linux, python 3.7.7-final-0 -----------
Name                                        Stmts   Miss Branch BrPart  Cover
decision_tree/__init__.py                       1      0      0      0   100%
decision_tree/_nbdev.py                         6      0      2      0   100%
decision_tree/core.py                          34     34      8      0     0%
decision_tree/data.py                          52     52     14      0     0%
decision_tree/exports_to_target_module.py       4      4      0      0     0%
decision_tree/imports.py                        7      0      0      0   100%

If anyone knows how to make this way work, I’d really appreciate the help (o: maybe it’s not possible due to pytest-cov limitations?

Now I feel silly - that comes back with 0% because it’s not testing decision_tree/core.py (o: it runs 00_core.ipynb from top to bottom.

To get good coverage measures, we need to use the modules that nbdev built and run just the non-exported cells as tests.

If we added a little callback to nbdev.test.test_nb we could easily implement callback handlers to run tests in this way: https://github.com/pete88b/decision_tree/blob/master/test_nbs.py - while making it easy for nbdev to keep current behavior.
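To make the callback idea concrete, here’s a hypothetical sketch (none of these names are nbdev’s actual API): a runner that executes cell sources in a shared namespace and lets a handler rewrite the cell list first, e.g. to swap #export cells for imports from the built module.

```python
def run_notebook_cells(cells, before_run=None):
    """Execute code-cell sources in one shared namespace.
    `before_run`, if given, may rewrite the cell list first - e.g. a
    handler could replace exported cells with imports from the built
    module, so coverage is measured against the module itself."""
    if before_run is not None:
        cells = before_run(cells)
    ns = {}
    for src in cells:
        exec(src, ns)  # an assertion failure here is a test failure
    return ns
```

The point of the hook is that nbdev’s default behavior (run everything, no handler) stays unchanged, while a coverage-oriented handler can be plugged in from outside.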

Now when I run

pytest --cov=decision_tree

I see

collected 1 item                                                                                                                       
test_nbs.py .                                                                                                                    
----------- coverage: platform linux, python 3.7.7-final-0 -----------
Name                                        Stmts   Miss  Cover
decision_tree/__init__.py                       1      0   100%
decision_tree/_nbdev.py                         6      0   100%
decision_tree/core.py                          34      3    91%
decision_tree/data.py                          52     52     0%

This is very cool, and it almost works for me, so I went spelunking to figure out what was wrong with the last bit. I noticed a few things. Sorry this brain dump is kind of raw; I’m trying to put it out there before I knock off for the day.

Two minor ones first:

  1. When you collect the imports from #exported cells, the variable imports is always the empty string, because split_flags_and_code returns a tuple, so line is actually taking on values from that tuple, each of which is a list of either flags or lines. So when you’re on the list of lines and you do if 'import' in line, you’re actually checking whether the string ‘import’ is exactly equal to one of the lines, which it never is. This might not be a big deal, because anything you would have imported this way comes in anyway when you import every member of the entire module.
  2. Because you’re inserting into position 0 of nb['cells'] while iterating in order through the exports, you end up reversing them in the resulting notebook. This didn’t turn out to be my problem, but it might cause trouble for other people if the order matters.

Then I found the actual cause of my problem:


I do a bunch of monkey patching in my tests to mock out the things each function under test calls. In this case, the notebook has an exported version of an object (configured for production) assigned to a variable, and another version of that object, configured for test and not exported, that overwrites the same variable right afterwards. An exported function then closes over a method call on this object.


So when I run your script, it imports the production version of the object, along with the version of the function pointed at that version of the variable. My non-exported test code overwrites the variable with the test version, but in this test run only, the method is still bound to the production version of the variable. So when I monkeypatch stuff in the variable as always, it doesn’t affect the method call, and code I don’t want to run goes ahead and runs.

One-off fix

I was able to resolve the problem in a one-off way while I was messing around with your code in a notebook by explicitly setting themodule.variable = variable after I had instantiated the test version of variable in my tests.

This suggests to me that we might be able to do something clever like checking whether any left-hand-side of an equals sign in the user’s unexported cells shadows something we import from a module, and then explicitly add a line after that shadowing to propagate that shadowing back into the module as well… That seems like it would work, but maybe there’s an easier way I’m not thinking of right now.
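A sketch of what that detection could look like, using the stdlib ast module (function names are made up, and only simple top-level `name = ...` assignments are handled):

```python
import ast

def assigned_names(cell_src):
    """Names bound by simple top-level assignments in a cell's source."""
    names = set()
    for node in ast.parse(cell_src).body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    names.add(target.id)
    return names

def propagate_shadowing(cell_src, module_name, exported_names):
    """Append `module.name = name` lines for any exported name the test
    cell shadows, so closures inside the module see the test value."""
    shadowed = sorted(assigned_names(cell_src) & set(exported_names))
    extra = [f'import {module_name}; {module_name}.{n} = {n}'
             for n in shadowed]
    return cell_src + '\n' + '\n'.join(extra)
```

This only handles plain assignments; augmented assignments, tuple unpacking, and names rebound inside functions would need more work, which may be why rewriting the module to be more testable (as suggested below) could be the simpler route.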

Good catch - I’ve been meaning to re-write and simplify for a while, so I’ll include these 2 fixes.

Do you expose things like themodule.variable just to make it possible to monkey patch in test code? Or might you change things like this in prod code too?

What I’m thinking is,

  • do we need this script to understand shadowing (like you suggest)
  • do we need a way to replace something (variable, function def etc) in a module at test time (that also works when we run the notebook) or
  • might it make sense to re-write the module to make it more testable?

Sorry to hijack this thread, but I have a related question…

I’m new to nbdev and am trying to understand the best practices around testing. I agree 100% that coverage is an important element. It sounds like folks in this thread have made some progress on that…

I’m used to using pytest and writing fixtures / parameterizations. Has anyone figured out how to use @pytest.mark.parametrize or @pytest.fixture decorators in an nbdev notebook?

When you create an object in a notebook cell, it is usable in any following notebook cell. Isn’t that covering the use case of at least the module-based fixtures?
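For simple cases, a plain loop over (input, expected) pairs in a notebook cell covers much of what @pytest.mark.parametrize does; a sketch with a made-up function under test:

```python
def normalize(s):
    """Toy function under test - made up for this example."""
    return s.strip().lower()

# the notebook-cell equivalent of @pytest.mark.parametrize:
# list the cases as data, then assert in a loop
cases = [('  Hello ', 'hello'),
         ('WORLD', 'world'),
         ('', '')]
for raw, expected in cases:
    assert normalize(raw) == expected, (raw, expected)
```

A nice side effect is that the case table doubles as documentation in the rendered docs, which fits nbdev’s tests-as-docs philosophy.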


Yes @michaelaye, I see what you mean, and thanks for the suggestion. For sure that allows me to reuse objects as if they were fixtures. However, this is not nearly as powerful and flexible as pytest’s fixture machinery. The feature I am missing the most is parameterization of tests and fixtures.

@rabernat I think it would be helpful if you asked about a concrete example, “What’s the best way to test this in nbdev?”, rather than the abstract “flexible and powerful”, because it is hard to help / comment on something like that!

I also think the point of nbdev is that your tests are part of your docs, so you don’t want to use fixtures if possible (they tend not to be as readable); instead, demonstrate real, minimal use cases you can test against. But again, it’s easier to discuss a concrete example than an abstract notion of something being better or worse, just so we don’t waste time.

Hi there. I dug a little bit into pytest and tried to make a plugin which lets you run pytest test cases in a Jupyter Notebook.


Have you continued on this? I am interested in helping out. We really need coverage on nbdev-created modules.


Currently I use the exporti tag to move cells into a unit test file.

so something like

– Cell
# default_exp core
– Cell
#exporti tests.test_core
– Cell
One problem is that the exporti tag works by appending, so when running nbdev_build_lib, changes to the unit test file get appended to the end of the existing unit_test.py file. You get something like this after calling nbdev_build_lib:

-- tests.test_core.py
class TestCore:
    def test_one(self):
        ...

class TestCore:
    def test_one(self):
         <some new line>

To fix this I just run an rm command before nbdev_build_lib:
rm library/tests/*.py

Finally I have all the steps in a bash script…
rm library/tests/*.py

After this I can use pytest to easily check code coverage

Was wondering if it would be a good idea to have something like a nbdev_build_tests. The idea is to extract tests into a separate unit testing module, so that nbdev-written code can play nice with unit testing suites.

nbdev has its own unit testing system where you write your tests in-situ with the source code, not in separate unit test files. You probably already know this; I just want to point it out.

You can certainly extend the library on your end to do whatever you wish. I am not sure we are keen to support this kind of separate testing system at our level just yet (that’s just a guess though based upon my experience).

Makes sense, it’s simple enough to do if required (thank god the library is so hackable). It might be useful for folks, but maybe niche enough that it’s not worth spinning out a standalone library.
The main reason on my end is integration with some CI tools at work that depend on pytest.

While nbdev certainly doesn’t need another unit testing system, coverage display could still be a useful feature, right?
I’m currently playing with applying the nbval plugin to the pytest call like so (planetarypy is a package created by nbdev):

pytest --cov=planetarypy --current-env --nbval notebooks/

and it seems to produce something useful at first glance:

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name                           Stmts   Miss  Cover
planetarypy/__init__.py            1      0   100%
planetarypy/_nbdev.py              6      6     0%
planetarypy/config.py             55      9    84%
planetarypy/ctx.py                75     75     0%
planetarypy/data/__init__.py       0      0   100%
planetarypy/pds/__init__.py        0      0   100%
planetarypy/pds/apps.py           13      3    77%
planetarypy/pds/ctx_index.py      41      0   100%
planetarypy/pds/indexes.py       255    132    48%
planetarypy/utils.py              93     44    53%
TOTAL                            539    269    50%

================================================================= short test summary info ==================================================================
FAILED notebooks/00_config.ipynb::Cell 7
FAILED notebooks/00_config.ipynb::Cell 10
FAILED notebooks/00_config.ipynb::Cell 13
FAILED notebooks/00_config.ipynb::Cell 14
FAILED notebooks/00_config.ipynb::Cell 16
FAILED notebooks/02a_pds.indexes.ipynb::Cell 22
FAILED notebooks/02a_pds.indexes.ipynb::Cell 25
FAILED notebooks/02a_pds.indexes.ipynb::Cell 26
FAILED notebooks/02a_pds.indexes.ipynb::Cell 27
FAILED notebooks/02a_pds.indexes.ipynb::Cell 28
FAILED notebooks/02a_pds.indexes.ipynb::Cell 29
FAILED notebooks/02a_pds.indexes.ipynb::Cell 31
FAILED notebooks/02b_pds.ctx_index.ipynb::Cell 5
FAILED notebooks/02b_pds.ctx_index.ipynb::Cell 14
FAILED notebooks/03_ctx.ipynb::Cell 7
FAILED notebooks/03_ctx.ipynb::Cell 16
FAILED notebooks/03_ctx.ipynb::Cell 17
FAILED notebooks/03_ctx.ipynb::Cell 19
FAILED notebooks/03_ctx.ipynb::Cell 21
FAILED notebooks/03_ctx.ipynb::Cell 22
FAILED notebooks/apps_demo.ipynb::Cell 1
FAILED notebooks/apps_demo.ipynb::Cell 3
FAILED notebooks/apps_demo.ipynb::Cell 4
FAILED notebooks/apps_demo.ipynb::Cell 8
FAILED notebooks/apps_demo.ipynb::Cell 10
================================================== 25 failed, 119 passed, 4 xfailed in 259.62s (0:04:19) ===================================================

Maybe we could have a look at their code and borrow/adapt if at all necessary?

nbval’s main idea is to write tests for an installed package exclusively in notebooks.

I’ve found Pete’s solution to be efficient enough for me, and I’ve been using it for most of my packages.
Specifically here: Nbdev code coverage / n-Tests - #6 by pete88b

I see it more as a cool add-on than a feature I’d want explicitly in nbdev. Even if you are measuring code coverage, meaningful coverage and raw coverage are two different things; IMO this is just a way to make sure the tests you’re writing as you go stay aligned with the code.

For example, I brought this into adaptnlp maybe 3 months ago, around the release of lib2nbdev. I’ve checked code coverage maybe 3 times, just to make sure I’m on the right pace and checking what I need to.

In my head I see a future of folks adding extensions to raw nbdev that we can pip install along with it (I’m working on my own right now, for instance).

I was a bit scared by Pete’s script extracting everything into new files; I shall give it a try and see how it compares to nbval’s numbers.

I agree that coverage isn’t something the dev should rely on, however I feel like users are deciding sometimes to make use of a tool based on existing test coverage, for whatever that’s worth.