Nbdev code coverage / n-Tests

Hi everyone,

I have been shifting my full-time work to use nbdev. I have been using it for developing some of our microservices and tooling. Because I have been using nbdev, I have had the most detailed documentation for my code out of my whole team. It has been awesome and I swear by nbdev now for both work and side projects.

That said, using nbdev for production development has highlighted some areas of testing that seem a tad limited. Before nbdev, we used pytest for developing and testing our APIs. One of the things that makes the rest of my team apprehensive about using nbdev is the inability to truly understand “how tested” my code is.

There are 2 features in pytest that would be cool to have in nbdev:

  • output the number of tests that have been run
  • provide an estimate / function for failing and passing based on code coverage

Outputting the number of tests run would be doable. Code coverage I can see being more complex.
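As a rough illustration of the "number of tests" idea, here is a minimal sketch that counts checks statically in a notebook's code cells. It assumes a "test" means either an `assert` statement or a call to a helper whose name starts with `test_` (the naming used by helpers such as fastcore's `test_eq`). The function names `count_checks` and `count_notebook_checks` are hypothetical, not part of nbdev, and cells containing IPython magics would need to be filtered out first because `ast.parse` only accepts plain Python.

```python
import ast


def count_checks(source: str) -> int:
    """Count assert statements and calls to test_* helpers in a code string."""
    count = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assert):
            count += 1
        elif isinstance(node, ast.Call):
            func = node.func
            # Plain call (test_eq(...)) or attribute call (mod.test_eq(...)).
            name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", "")
            if name.startswith("test_"):
                count += 1
    return count


def count_notebook_checks(nb: dict) -> int:
    """Sum the checks over every code cell of a parsed .ipynb (a dict)."""
    total = 0
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        src = cell.get("source", "")
        total += count_checks("".join(src) if isinstance(src, list) else src)
    return total
```

This counts checks that appear in the source, not checks that actually executed, so it is only an approximation of what pytest reports.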

To explore a few ideas before suggesting any changes to nbdev, I’m thinking I could write a script to create test_[notebook name].py files from notebooks - like nbdev_build_lib but containing just the test cells.

This would make it possible to use coverage with something like:
coverage run test_00_core.py
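A minimal sketch of such a script might look like the following. It is only a sketch: it assumes exported cells start with a flag comment such as `#export` or `# export`, that every other code cell is plain Python (no magics), and that the built module can be star-imported. The helper names (`is_exported`, `extract_test_code`, `write_test_script`) and the module path passed in are hypothetical.

```python
import json
import re
from pathlib import Path

# Assumption: an exported cell begins with a flag such as "#export" or "# export".
EXPORT_FLAG = re.compile(r"^\s*#\s*\|?\s*export", re.IGNORECASE)


def cell_source(cell: dict) -> str:
    """Return a cell's source as one string (ipynb stores it as a list of lines)."""
    src = cell.get("source", "")
    return "".join(src) if isinstance(src, list) else src


def is_exported(source: str) -> bool:
    """True if the first non-blank line of the cell is an export flag."""
    for line in source.splitlines():
        if line.strip():
            return bool(EXPORT_FLAG.match(line))
    return False


def extract_test_code(nb: dict, module: str) -> str:
    """Build a script that imports the built module and runs the non-exported cells."""
    parts = [f"from {module} import *"]
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        src = cell_source(cell)
        if src.strip() and not is_exported(src):
            parts.append(src)
    return "\n\n".join(parts) + "\n"


def write_test_script(nb_path, module, out_dir="."):
    """Write test_<notebook name>.py next to (or under) out_dir and return its path."""
    nb = json.loads(Path(nb_path).read_text(encoding="utf-8"))
    out = Path(out_dir) / f"test_{Path(nb_path).stem}.py"
    out.write_text(extract_test_code(nb, module), encoding="utf-8")
    return out
```

Calling `write_test_script("00_core.ipynb", "decision_tree.core")` would produce `test_00_core.py`, which could then be run under coverage.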

If you’d be interested in trying this out on your projects, I’ll try to find time (o:

I would still be interested. I will likely be able to do work/coding on my side also possibly after the first 2 weeks of July.

It seems that nbdev doesn’t generate test files itself. If I remember correctly, it runs the notebook and extracts the failure outputs(?). Technically there’s no such thing as a test cell; it’s just a cell that doesn’t get exported to Python files. There are also cases where I use the hide tag for test-related code that I don’t want exported to HTML. Also note that “non-test code” cells are usually still needed to run the tests.

I’m actually starting to think that doing coverage might not actually be too bad to try to get working. Would coverage allow executing other arbitrary scripts? We could export ALL code in a given notebook to a python script and just run with coverage.

I’ll be pessimistic about my free time, probably until the middle of this month. I like the idea of posting MVP stuff here, so if I can I’ll also offer code ideas/snippets after mid-July.

If all of your tests run in plain python, this should get you going https://github.com/pete88b/decision_tree/blob/master/80_test_coverage.ipynb

if you’re able to share any of your project code, i’d be really interested to see how other people are using nbdev - and it’d be good for me to test out the new migrate to magic features as well as running a few coverage reports

I’ve also tried pytest-cov by creating test_decision_tree.py

import nbdev.test

def test_run():
    # run the notebook from top to bottom
    nbdev.test.test_nb('00_core.ipynb')

then run with

pytest --cov=decision_tree

but that comes back with 0% for decision_tree/core.py

----------- coverage: platform linux, python 3.7.7-final-0 -----------
Name                                        Stmts   Miss Branch BrPart  Cover
decision_tree/__init__.py                       1      0      0      0   100%
decision_tree/_nbdev.py                         6      0      2      0   100%
decision_tree/core.py                          34     34      8      0     0%
decision_tree/data.py                          52     52     14      0     0%
decision_tree/exports_to_target_module.py       4      4      0      0     0%
decision_tree/imports.py                        7      0      0      0   100%

if anyone knows how to make this way work, I’d really appreciate the help (o: maybe it’s not possible due to pytest-cov limitations?

now i feel silly - that comes back with 0% because it’s not testing decision_tree/core.py (o: it runs 00_core.ipynb from top to bottom.

To get good coverage measures, we need to use the modules that nbdev built and run just the non-exported cells as tests.

If we added a little callback to nbdev.test.test_nb we could easily implement callback handlers to run tests in this way: https://github.com/pete88b/decision_tree/blob/master/test_nbs.py - while making it easy for nbdev to keep current behavior.

Now when I run

pytest --cov=decision_tree

I see

collected 1 item                                                                                                                       
test_nbs.py .                                                                                                                    
----------- coverage: platform linux, python 3.7.7-final-0 -----------
Name                                        Stmts   Miss  Cover
decision_tree/__init__.py                       1      0   100%
decision_tree/_nbdev.py                         6      0   100%
decision_tree/core.py                          34      3    91%
decision_tree/data.py                          52     52     0%

This is very cool, and it almost works for me, so I went spelunking to figure out what was wrong with the last bit. I notice a few things. Sorry this brain dump is kind of raw - I’m trying to put it out there before I knock off for the day.

Two minor ones first:

  1. When you collect the imports from #exported cells, that variable imports is always the empty string, because split_flags_and_code returns a tuple, so line is actually taking on values from that tuple, each of which becomes a list of either tags or lines. So when you’re on the list of lines and you do if 'import' in line, you’re actually checking if the string ‘import’ is exactly equal to the entirety of any of the lines, which it never is. Might not be that big of a deal because anything you would have imported this way comes in when you import every member of the entire module.
  2. Because you’re inserting into position 0 of nb['cells'] as you iterate in order through the exports, you’re actually reversing them in the resulting notebook. Didn’t end up being my problem, but it might cause trouble for other people if the order matters.
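Both points can be reproduced in a few lines of plain Python. This is only an illustration with stand-in data: `flags_and_code` mimics the tuple shape described above and is not the real return value of `split_flags_and_code`.

```python
# Stand-in for the tuple described above: (flags, code lines).
flags_and_code = (["export"], ["import os", "x = 1"])

# Point 1: iterating over the tuple yields whole lists, so `in` tests
# list membership (element equality), not substring containment.
buggy_imports = [item for item in flags_and_code if "import" in item]

# One fix: unpack the tuple, then do the substring check on each line.
_flags, code_lines = flags_and_code
fixed_imports = [line for line in code_lines if "import" in line]

# Point 2: inserting at position 0 while iterating reverses the order.
exports = ["cell_a", "cell_b", "cell_c"]
reversed_cells = ["test_cell"]
for src in exports:
    reversed_cells.insert(0, src)

# One fix: insert at an increasing index so the original order is kept.
ordered_cells = ["test_cell"]
for i, src in enumerate(exports):
    ordered_cells.insert(i, src)
```

Here `buggy_imports` ends up empty while `fixed_imports` contains `"import os"`, and `reversed_cells` starts with `"cell_c"` while `ordered_cells` starts with `"cell_a"`.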

Then I found the actual cause of my problem:


I do a bunch of monkey patching in my tests, to mock out stuff each function under test calls, and in this case in the notebook I have an exported version of an object for production set to some variable, and another version of that object that overwrites it (set to the same variable) right afterwards that’s configured differently for test, and it’s not exported. Then, an exported function closes over a method call on this object.


So then when I’m running your script, it imports the production version of the object and the version of the function pointed at that version of the variable. I overwrite that variable with the test version in my non-exported test code, but in this test run only the method is still pointed at the production version of the variable, so when I monkeypatch stuff in the variable as always, it doesn’t affect the method call, and it just goes ahead and runs code I don’t want it to.

One-off fix

I was able to resolve the problem in a one-off way while I was messing around with your code in a notebook by explicitly setting themodule.variable = variable after I had instantiated the test version of variable in my tests.
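For anyone trying to picture why that assignment is needed, here is a small self-contained sketch. The module is built in memory with `types.ModuleType` purely for illustration; `themodule`, `variable`, and `use_variable` are placeholder names, not code from my project.

```python
import types

# Build a stand-in for the exported module.
themodule = types.ModuleType("themodule")
exec(
    "variable = 'production'\n"
    "def use_variable():\n"
    "    return variable\n",
    themodule.__dict__,
)

# Equivalent of `from themodule import *` in the test run:
# the names are copied into the test namespace.
variable = themodule.variable
use_variable = themodule.use_variable

# Test code rebinds the name locally. The function still looks the name
# up in the module's own globals, so it keeps seeing the production value.
variable = "test"
before = use_variable()

# The one-off fix: push the test object back into the module.
themodule.variable = variable
after = use_variable()
```

`before` is `'production'` and `after` is `'test'`, because a function resolves global names in the module where it was defined, not in the namespace that imported it.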

This suggests to me that we might be able to do something clever like checking whether any left-hand-side of an equals sign in the user’s unexported cells shadows something we import from a module, and then explicitly add a line after that shadowing to propagate that shadowing back into the module as well… That seems like it would work, but maybe there’s an easier way I’m not thinking of right now.
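A rough sketch of that idea using the standard `ast` module is below. It makes several simplifying assumptions: only top-level assignments are considered (not `def`, `class`, or `import` statements that shadow a name), the propagation lines are appended at the end of the cell rather than directly after each assignment, and the module is reachable under the alias `themodule`. The function names are hypothetical.

```python
import ast


def shadowed_names(source: str, module_names: set) -> list:
    """Names rebound by top-level assignments in `source` that also exist in the module."""
    found = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            targets = node.targets
        elif isinstance(node, (ast.AnnAssign, ast.AugAssign)):
            targets = [node.target]
        else:
            continue
        for target in targets:
            for sub in ast.walk(target):
                # Only names being stored to count; `obj.attr = 1` does not rebind `obj`.
                if (
                    isinstance(sub, ast.Name)
                    and isinstance(sub.ctx, ast.Store)
                    and sub.id in module_names
                    and sub.id not in found
                ):
                    found.append(sub.id)
    return found


def propagate_shadowing(source: str, module_names: set, module_alias: str = "themodule") -> str:
    """Append `<alias>.<name> = <name>` for every shadowed name in the cell."""
    extra = [f"{module_alias}.{name} = {name}" for name in shadowed_names(source, module_names)]
    return "\n".join([source.rstrip("\n")] + extra) + "\n"
```

The set of module names could come from something like `set(vars(themodule))`, and the rewritten cell source would then replace the original before the test run.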

good catch - I’ve been meaning to re-write and simplify for a while - so I’ll include these 2 fixes

Do you expose things like themodule.variable just to make it possible to monkey patch in test code? or might you change things like this in prod code too?

What I’m thinking is,

  • do we need this script to understand shadowing (like you suggest)
  • do we need a way to replace something (variable, function def etc) in a module at test time (that also works when we run the notebook) or
  • might it make sense to re-write the module to make it more testable?

Sorry to hijack this thread–I have a related question…

I’m new to nbdev and am trying to understand the best practices around testing. I agree 100% that coverage is an important element. It sounds like folks in this thread have made some progress on that…

I’m used to using pytest and writing fixtures / parameterizations. Has anyone figured out how to use @pytest.mark.parametrize or @pytest.fixture decorators in an nbdev notebook?

When you create an object in a notebook cell, it is usable in any following notebook cell. Isn’t that covering the use case of at least the module-based fixtures?

Yes @michaelaye, I see what you mean, and thanks for the suggestion. For sure that allows me to reuse objects as if they were fixtures. However, this is not nearly as powerful and flexible as pytest’s fixture machinery. The feature I am missing the most is parameterization of tests and fixtures.
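For context, the closest plain-notebook approximation of parameterization I can think of is a table-driven loop in a single cell, sketched below. It is not equivalent to `@pytest.mark.parametrize` (no per-case test IDs, no separate pass/fail reporting), and `slugify` is a made-up function used only to have something to test.

```python
def slugify(title: str) -> str:
    """Hypothetical function under test: lower-case and join words with hyphens."""
    return "-".join(title.lower().split())


# Each tuple plays the role of one parametrized case: (input, expected output).
cases = [
    ("Hello World", "hello-world"),
    ("  nbdev   rocks ", "nbdev-rocks"),
    ("single", "single"),
]

# Collect every failing case instead of stopping at the first one,
# so a single cell run reports all of them at once.
failures = []
for given, expected in cases:
    actual = slugify(given)
    if actual != expected:
        failures.append((given, expected, actual))

assert not failures, f"failing cases: {failures}"
```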

@rabernat I think it would be helpful if you asked about a concrete example (“What’s the best way to test this in nbdev?”) rather than the abstract “flexible and powerful”, because it is hard to help with or comment on something like that!

I also think the point of nbdev is that your tests are part of your docs, so where possible you want to avoid fixtures, which tend not to be as readable, and instead demonstrate real, minimal use cases you can test against. But again, it’s easier to discuss a concrete example than an abstract notion of something being better or worse, just so we don’t waste time.