I have been shifting my full-time work over to nbdev, using it to develop some of our microservices and tooling. As a result, I have the most detailed documentation for my code on my whole team. It has been awesome, and I now swear by nbdev for both work and side projects.
With this said, using nbdev for production development has highlighted some areas of testing that seem a tad limited. Before nbdev, we used pytest for developing and testing our APIs. One of the things that makes the rest of my team apprehensive about using nbdev is the inability to truly understand "how tested" my code is.
There are 2 features in pytest that would be cool to have in nbdev:
output the number of tests that have been run
provide an estimate of code coverage, and/or a way to pass or fail based on it
Outputting the number of tests run would be doable. Code coverage I can see being more complex.
To explore a few ideas before suggesting any changes to nbdev, I'm thinking I could write a script to create test_[notebook name].py files from notebooks - like nbdev_build_lib but containing just the test cells.
This would make it possible to use coverage with something like: coverage run test_00_core.py
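Roughly what I have in mind, as a sketch (notebook_to_test_script is a made-up name and the flag handling here is deliberately naive):

import json, pathlib

def notebook_to_test_script(nb_path, module=None, out_dir='.'):
    "Copy code cells that are *not* exported to the library into a test_<name>.py file."
    nb = json.loads(pathlib.Path(nb_path).read_text(encoding='utf-8'))
    chunks = [] if module is None else [f'from {module} import *']
    for cell in nb['cells']:
        if cell['cell_type'] != 'code':
            continue
        src = ''.join(cell['source'])
        # skip exported/library cells - the test script imports them instead of copying them
        if src.lstrip().startswith(('#export', '# export', '#default_exp', '# default_exp')):
            continue
        chunks.append(src)
    out = pathlib.Path(out_dir) / f'test_{pathlib.Path(nb_path).stem}.py'
    out.write_text('\n\n'.join(chunks), encoding='utf-8')
    return out

# e.g. notebook_to_test_script('00_core.ipynb', module='mylib.core')
# then: coverage run test_00_core.py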
If you'd be interested in trying this out on your projects, I'll try to find time (o:
I would still be interested. I will likely be able to do some coding on my side as well, probably after the first two weeks of July.
It seems that nbdev doesn't generate test files itself. If I remember correctly, it runs the notebook and extracts the failure outputs(?). There's technically no such thing as a test cell; it's just a cell that doesn't get exported to the Python files. There's also the issue that I use the hide tag for test-related code I don't want exported to HTML. And note that "non-test code" cells are usually still needed to run the tests.
I'm actually starting to think that getting coverage working might not be too bad. Would coverage allow executing other arbitrary scripts? We could export ALL the code in a given notebook to a Python script and just run it with coverage.
I'll be pessimistic about my free time until roughly the middle of this month. I like the idea of posting MVP stuff here, so if I can I'll offer code ideas/snippets too after mid July.
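If it does, I imagine the workflow would be something like this (the package and file names are placeholders):

coverage run --source=your_lib test_00_core.py   # run the exported script under coverage
coverage report -m                               # per-file report including missing lines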
If you're able to share any of your project code, I'd be really interested to see how other people are using nbdev - and it'd be good for me to test out the new migrate-to-magic features as well as run a few coverage reports.
This is very cool, and it almost works for me, so I went spelunking to figure out what was wrong with the last bit. I notice a few things. Sorry this brain dump is kind of raw - I'm trying to put it out there before I knock off for the day.
Two minor ones first:
When you collect the imports from #exported cells, the variable imports is always the empty string, because split_flags_and_code returns a tuple, so line actually takes on the values of that tuple, each of which is a list (of either flags or lines). So when you're on the list of lines and do 'import' in line, you're actually checking whether the string 'import' is exactly equal to one of the lines in its entirety, which it never is. It might not be a big deal, because anything you would have imported this way comes in when you import every member of the entire module. (See the tiny illustration after these two points.)
Because you're inserting into position 0 of nb['cells'] as you iterate in order through the exports, you're actually reversing them in the resulting notebook. That didn't end up being my problem, but it might cause trouble for other people if the order matters.
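To illustrate that first point with made-up values (this is just the shape of the bug, not nbdev's actual return value):

flags, code_lines = ["#export"], ["import os", "def f(): pass"]

for line in (flags, code_lines):   # `line` is a whole list here, not a single line
    print("import" in line)        # list membership test, not a substring check -> False, False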
Then I found the actual cause of my problem:
Setup:
I do a bunch of monkey patching in my tests to mock out the things each function under test calls. In this case, in the notebook I have an exported version of an object for production assigned to a variable, and right afterwards a second version of that object, configured for test, that overwrites it (assigned to the same variable) and is not exported. Then, an exported function closes over a method call on this object.
Test:
So when I'm running your script, it imports the production version of the object and the version of the function pointed at that version of the variable. I overwrite that variable with the test version in my non-exported test code, but in this test run only, the method is still pointed at the production version of the variable, so when I monkeypatch stuff on the variable as usual, it doesn't affect the method call, and it just goes ahead and runs code I don't want it to.
One-off fix:
I was able to resolve the problem in a one-off way, while I was messing around with your code in a notebook, by explicitly setting themodule.variable = variable after I had instantiated the test version of variable in my tests.
This suggests to me that we might be able to do something clever, like checking whether any left-hand side of an equals sign in the user's unexported cells shadows something we import from a module, and then explicitly adding a line after that shadowing to propagate it back into the module as well… That seems like it would work, but maybe there's an easier way I'm not thinking of right now.
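To make the failure mode concrete, here's a self-contained toy version (every name is made up; fakecore stands in for the module nbdev exported from the notebook):

import sys, types

fakecore = types.ModuleType("fakecore")
exec(
    "class ProdClient:\n"
    "    def get(self): return 'prod'\n"
    "client = ProdClient()\n"          # the exported, production-configured object
    "def fetch():\n"
    "    return client.get()\n",       # the exported function closes over the module's `client`
    fakecore.__dict__,
)
sys.modules["fakecore"] = fakecore

from fakecore import fetch, client     # roughly what the generated test script does

class TestClient:
    def get(self): return "test"

client = TestClient()                   # rebinds only the test script's local name
print(fetch())                          # -> 'prod': fetch still reads fakecore.client

fakecore.client = client                # the one-off fix: propagate the rebinding into the module
print(fetch())                          # -> 'test'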
Good catch - I've been meaning to re-write and simplify for a while, so I'll include these 2 fixes.
Do you expose things like themodule.variable just to make it possible to monkey patch in test code? Or might you change things like this in prod code too?
What I'm thinking is:
do we need this script to understand shadowing (like you suggest)?
do we need a way to replace something (a variable, function def, etc.) in a module at test time (that also works when we run the notebook)? or
might it make sense to re-write the module to make it more testable? (a rough sketch of what I mean is below)
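For that last option, the kind of re-write I mean is passing the collaborator in rather than closing over a module-level object - a rough sketch with made-up names:

# exported code: take the dependency as a parameter with a sensible default
class ProdClient:
    def get(self): return "prod"

default_client = ProdClient()

def fetch(client=None):
    client = default_client if client is None else client
    return client.get()

# test code (not exported): no monkey patching of the module needed
class TestClient:
    def get(self): return "test"

assert fetch(TestClient()) == "test"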
Sorry to hijack this thread - I have a related question…
I'm new to nbdev and am trying to understand the best practices around testing. I agree that coverage is an important element. It sounds like folks in this thread have made some progress on that…
I'm used to using pytest and writing fixtures / parameterizations. Has anyone figured out how to use @pytest.mark.parametrize or @pytest.fixture decorators in an nbdev notebook?
When you create an object in a notebook cell, it is usable in any following notebook cell. Doesn't that cover the use case of at least the module-based fixtures?
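For example (toy values, not from any real project):

# cell 1: build the shared object once
records = [1, 2, 3]

# any later cell can use it, much like a module-scoped fixture
assert sum(records) == 6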
Yes @michaelaye, I see what you mean, and thanks for the suggestion. For sure that allows me to reuse objects as if they were fixtures. However, this is not nearly as powerful and flexible as pytest's fixture machinery. The feature I miss the most is parameterization of tests and fixtures.
@rabernat I think it would be helpful if you asked about a concrete example - "What's the best way to test this in nbdev?" - rather than the abstract "flexible and powerful", because it is hard to help / comment on something like that!
I also think the point of nbdev is that your tests are part of your docs, so you don't want to use fixtures if possible (they tend not to be as readable); rather, demonstrate real, minimal use cases you can test against. But again, it's easier to discuss a concrete example than to talk about an abstract notion of something being better or worse, just so we don't waste time.
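For example, parameterisation usually reads fine as a plain loop in a cell. A minimal sketch using fastcore's test_eq (the function under test here is made up):

from fastcore.test import test_eq

def double(x): return 2 * x                 # stand-in for the real exported function

# plays the role of @pytest.mark.parametrize but stays readable in the rendered docs
for x, expected in [(0, 0), (1, 2), (-3, -6)]:
    test_eq(double(x), expected)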
Currently I use the exporti tag to move test cells to a unit test file.
So something like:
-- Cell
# default_exp core

-- Cell
#export

-- Cell
#exporti tests.test_core
One problem is that the exporti tag seems to work by appending, so when using nbdev_build_lib, changes to the unit tests get appended to the end of the existing test file. You get something like this after calling nbdev_build_lib:
-- tests.test_core.py

class TestCore:
    def test_one(self):
        pass

class TestCore:
    def test_one(self):
        <some new line>
        pass
To fix this I just run an rm command before nbdev_build_lib:
rm library/tests/*.py
Finally, I have all the steps in a bash script:
rm library/tests/*.py
nbdev_build_lib
nbdev_build_docs
After this I can use pytest to easily check code coverage.
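With the pytest-cov plugin installed, that looks something like this (adjust the package path to your own layout):

pytest --cov=library --cov-report=term-missing library/tests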
I was wondering if it would be a good idea to have something like a nbdev_build_tests. The idea is to extract tests into a separate unit testing module, so that nbdev-written code can play nicely with unit testing suites.
nbdev has its own unit testing system where you write your tests in situ with the source code, not in separate unit test files. You probably already know this; I just want to point it out.
You can certainly extend the library on your end to do whatever you wish. I am not sure we are keen to support this kind of separate testing system at our level just yet (that's just a guess, though, based on my experience).
Makes sense - it's simple enough to do if required (thank god the library is so hackable). It might be useful for folks, but maybe niche enough that it's not worth spinning out a standalone library.
The main reason on my end is integration with some CI tools at work that depend on pytest.
While nbdev certainly doesn't need another unit testing system, coverage display could still be a useful feature, right?
I'm currently playing with applying the nbval plugin to the pytest call like so (planetarypy is a package created by nbdev):
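(The exact call didn't survive in this quote; it's roughly of this shape, with the notebook path being my guess:)

pytest --nbval notebooks/*.ipynb   # re-run the notebooks and compare every cell's output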
I've found pete's solution to be quite efficient for me, and I've been using it for most of my packages.
Specifically here: Nbdev code coverage / n-Tests - #6 by pete88b
I see it more as a cool add-on rather than a feature I'd want explicitly in nbdev. Even if you are measuring code coverage, meaningful coverage and raw coverage are two different things; IMO this is just a way to make sure the tests you're writing as you go stay aligned with what you need.
For example, I brought this into adaptnlp maybe 3 months ago, around the release of lib2nbdev. I've checked code coverage maybe 3 times? Just to make sure I'm on the right pace / checking what I need to.
In my head I see a future of folks adding extensions on top of raw nbdev that we can pip install along with it (I'm working on my own right now, for instance).
I was a bit scared by Pete's script extracting everything into new files; I shall give it a try and see how it compares to nbval's numbers.
I agree that coverage isn't something the dev should rely on; however, I feel like users sometimes decide whether to use a tool based on its existing test coverage, for whatever that's worth.