I have been shifting my full-time work to nbdev, using it to develop some of our microservices and tooling. As a result, I have the most detailed documentation for my code on my whole team. It has been awesome, and I now swear by nbdev for both work and side projects.
With that said, using nbdev for production development has highlighted some areas where testing seems a tad limited. Before nbdev, we used pytest to develop and test our APIs. One of the things that makes the rest of my team apprehensive about using nbdev is the inability to truly understand "how tested" my code is.
There are two features from pytest that would be cool to have in nbdev:
output the number of tests that have been run
provide an estimate / function for failing and passing based on code coverage
Outputting the number of tests run would be doable. Code coverage I can see being more complex.
To explore a few ideas before suggesting any changes to nbdev, I'm thinking I could write a script that creates test_[notebook name].py files from notebooks - like nbdev_build_lib, but containing just the test cells.
This would make it possible to use coverage with something like: coverage run test_00_core.py
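To make the idea concrete, here is a minimal sketch of such a script. It treats a notebook as plain JSON (so it only needs the standard library) and copies every code cell that does not carry an export flag into a test_<name>.py file. The flag list and function name are my own assumptions for illustration, not nbdev's API:

```python
import json
from pathlib import Path

# Assumed set of flags that mark a cell as "library code" rather than test code.
EXPORT_FLAGS = ("#export", "# export", "#exporti", "# exporti")

def extract_test_script(nb_path):
    """Write the non-exported code cells of a notebook to test_<name>.py."""
    nb = json.loads(Path(nb_path).read_text())
    test_lines = []
    for cell in nb["cells"]:
        if cell["cell_type"] != "code":
            continue
        src = "".join(cell["source"])
        first = src.lstrip().split("\n", 1)[0]
        # Cells whose first line is an export flag go to the library;
        # everything else is treated as test code here.
        if first.startswith(EXPORT_FLAGS):
            continue
        test_lines.append(src)
    out = Path("test_" + Path(nb_path).stem + ".py")
    out.write_text("\n\n".join(test_lines))
    return out
```

The resulting file could then be run with `coverage run test_00_core.py`. A real version would also need to pull in the exported definitions (e.g. by importing the built module at the top of the generated file).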
If you’d be interested in trying this out on your projects, I’ll try to find time (o:
I would still be interested. I will likely be able to do work/coding on my side as well, possibly after the first two weeks of July.
It seems that nbdev doesn't generate test files at all. If I remember correctly, it runs the notebook and extracts the failure outputs(?). Technically there's no such thing as a test cell - it's just a cell that doesn't get exported to the Python files. There are also issues where I use the hide tag for test-related code I don't want exported to HTML. Also note that "non-test" code cells are usually still needed to run the tests.
I'm actually starting to think that getting coverage working might not be too bad to try. Does coverage allow executing arbitrary scripts? We could export ALL of the code in a given notebook to a Python script and just run that with coverage.
I'll be pessimistic about my free time, probably until the middle of this month. I like the idea of posting MVP stuff here, so if I can, I'll offer code ideas/snippets after mid-July.
If you're able to share any of your project code, I'd be really interested to see how other people are using nbdev - it'd also be good for me to test out the new migrate-to-magic features, as well as run a few coverage reports.
This is very cool, and it almost works for me, so I went spelunking to figure out what was wrong with the last bit. I noticed a few things. Sorry this brain dump is kind of raw - I'm trying to get it out there before I knock off for the day.
Two minor ones first:
When you collect the imports from #exported cells, the variable imports always ends up as the empty string. split_flags_and_code returns a tuple, so line actually takes on the values of that tuple, each of which is a list (of either flags or lines). So when you're on the list of lines and do if 'import' in line, you're actually checking whether the string 'import' is exactly equal to one of the lines in its entirety, which it never is. This might not be a big deal, because anything you would have imported this way comes in anyway when you import every member of the entire module.
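A minimal reproduction of that pitfall, with a stand-in that only mimics the return shape of nbdev's split_flags_and_code (a tuple of flag lines and code lines), not its real implementation:

```python
# Stand-in with the same (flags, code) tuple shape as the real function.
def split_flags_and_code(cell_source):
    lines = cell_source.splitlines()
    flags = [l for l in lines if l.startswith("#")]
    code = [l for l in lines if not l.startswith("#")]
    return flags, code

cell = "#export\nimport os\nx = 1"

# Buggy: iterating the tuple yields the two *lists*, so
# 'import' in line asks whether any whole line equals "import".
buggy = [line for line in split_flags_and_code(cell) if "import" in line]
assert buggy == []  # never matches anything

# Intended: unpack the tuple and test substring membership per line.
flags, code = split_flags_and_code(cell)
imports = [l for l in code if "import" in l]
assert imports == ["import os"]
```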
Because you insert into position 0 of nb['cells'] as you iterate through the exports in order, you actually reverse them in the resulting notebook. That didn't end up being my problem, but it might cause trouble for other people if the order matters.
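The reversal is easy to see with plain lists, and one possible fix is to insert at a running index instead of always at 0 (this is a generic illustration, not the script's actual code):

```python
cells = ["first", "second", "third"]   # original export order

# Prepending each cell, as the script does, reverses the order:
nb_cells = []
for c in cells:
    nb_cells.insert(0, c)
assert nb_cells == ["third", "second", "first"]

# One fix: insert at a running index (or simply append).
nb_cells = []
for i, c in enumerate(cells):
    nb_cells.insert(i, c)
assert nb_cells == ["first", "second", "third"]
```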
Then I found the actual cause of my problem:
Setup:
I do a bunch of monkey-patching in my tests to mock out the things each function under test calls. In this case, the notebook has an exported version of an object configured for production, assigned to a variable, followed immediately by a non-exported version configured for test that overwrites it (assigned to the same variable). An exported function then closes over a method call on this object.
Test:
So when I run your script, it imports the production version of the object, and the version of the function that points at that version of the variable. I overwrite the variable with the test version in my non-exported test code, but in this test run the method is still pointed at the production version of the variable. So when I monkey-patch things on the variable as usual, it doesn't affect the method call, and it just goes ahead and runs code I don't want it to.
One-off fix:
I was able to resolve the problem in a one-off way, while messing around with your code in a notebook, by explicitly setting themodule.variable = variable after instantiating the test version of the variable in my tests.
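Here is a self-contained sketch of both the failure mode and that fix, using a throwaway module in place of the exported one (themodule, variable, and method are placeholder names):

```python
import types

# Build a fake "exported module" whose function closes over a module global.
themodule = types.ModuleType("themodule")
exec(
    "variable = 'production'\n"
    "def method():\n"
    "    return variable\n",   # looks up `variable` in the *module's* globals
    themodule.__dict__,
)

from_module = themodule.method
variable = "test"                       # rebinds only the notebook's name
assert from_module() == "production"    # the method still sees prod!

themodule.variable = variable           # the one-off fix: push it back in
assert from_module() == "test"
```

Rebinding a name in the notebook's namespace never touches the module's own globals, which is why the explicit attribute assignment is needed.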
This suggests to me that we might be able to do something clever: check whether any left-hand side of an assignment in the user's unexported cells shadows something we import from a module, and then add an explicit line after that assignment to propagate the shadowing back into the module as well. That seems like it would work, but maybe there's an easier way I'm not thinking of right now.
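A hedged sketch of the detection half of that idea: use the standard-library ast module to collect top-level assigned names in a cell, which a generated test script could then intersect with the module's exports and follow with themodule.<name> = <name> lines (names here are illustrative):

```python
import ast

def assigned_names(cell_source):
    """Return the simple names assigned anywhere in a cell's source."""
    names = set()
    for node in ast.walk(ast.parse(cell_source)):
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    names.add(target.id)
    return names

cell = "variable = FakeThing()\nother = 1"
assert assigned_names(cell) == {"variable", "other"}
```

A fuller version would also need to handle augmented assignments, tuple unpacking, and def/class statements that shadow imported names.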
Good catch - I've been meaning to rewrite and simplify for a while, so I'll include these two fixes.
Do you expose things like themodule.variable just to make it possible to monkey-patch in test code, or might you change things like this in prod code too?
What I’m thinking is,
do we need this script to understand shadowing (like you suggest),
do we need a way to replace something (a variable, function def, etc.) in a module at test time that also works when we run the notebook, or
might it make sense to rewrite the module to make it more testable?
Sorry to hijack this thread–I have a related question…
I’m new to nbdev and am trying to understand the best practices around testing. I agree that coverage is an important element. It sounds like folks in this thread have made some progress on that…
I’m used to using pytest and writing fixtures / parameterizations. Has anyone figured out how to use @pytest.mark.parametrize or @pytest.fixture decorators in an nbdev notebook?
When you create an object in a notebook cell, it is usable in any following notebook cell. Doesn't that cover the use case of at least module-scoped fixtures?
Yes @michaelaye, I see what you mean, and thanks for the suggestion. For sure that allows me to reuse objects as if they were fixtures. However, this is not nearly as powerful and flexible as pytest’s fixture machinery. The feature I am missing the most is parameterization of tests and fixtures.
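One common notebook-friendly stand-in for simple parameterization is a plain loop over (input, expected) cases inside a test cell. This is a general pattern sketch, not an nbdev feature, and double here is just a placeholder function:

```python
def double(x):
    return 2 * x

# Each tuple plays the role of one @pytest.mark.parametrize case.
cases = [(0, 0), (1, 2), (-3, -6)]
for x, expected in cases:
    assert double(x) == expected, f"double({x}) != {expected}"
```

It doesn't give you pytest's per-case reporting or fixture injection, but it keeps the cases visible in the docs, which fits nbdev's tests-as-documentation philosophy.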
@rabernat I think it would be helpful if you asked about a concrete example - "What's the best way to test this in nbdev?" - rather than the abstract "flexible and powerful", because it is hard to help with or comment on something like that!
I also think the point of nbdev is that your tests are part of your docs, so you want to avoid fixtures where possible - they tend not to be as readable - and instead demonstrate real, minimal use cases you can test against. But again, it's easier to discuss a concrete example than an abstract notion of something being better or worse, so we don't waste time.
Currently I use the exporti tag to move cells into a unit test file.
so something like
-- Cell
# default_exp core
-- Cell
#export
-- Cell
#exporti tests.test_core
One problem is that the exporti tag seems to work by appending, so when using build_lib, changes to the unit test file get appended to the end of the existing unit test file. You get something like this after calling nbdev_build_lib:
-- tests.test_core.py
class TestCore:
    def test_one():
        pass
class TestCore:
    def test_one():
        <some new line>
        pass
To fix this, I just run an rm command before nbdev_build_lib:
rm library/tests/*.py
Finally, I have all the steps in a bash script:
rm library/tests/*.py
nbdev_build_lib
nbdev_build_docs
After this, I can use pytest to easily check code coverage.
I was wondering if it would be a good idea to have something like an nbdev_build_tests. The idea is to extract tests into a separate unit-testing module, so that nbdev-written code can play nicely with unit testing suites.
nbdev has its own unit testing system where you write your tests in situ with the source code, not in separate test files. You probably already know this, but I just want to point it out.
You can certainly extend the library on your end to do whatever you wish. I am not sure we are keen to support this kind of separate testing system at our level just yet (that’s just a guess though based upon my experience).
Makes sense - it's simple enough to do if required (thank god the library is so hackable). It might be useful for folks, but maybe niche enough that it's not worth spinning out a standalone library.
The main reason on my end is integration with some at-work CI tools that depend on pytest.
While nbdev certainly doesn’t need another unit testing system, coverage display could still be a useful feature, right?
I’m currently playing with applying the nbval plugin to the pytest call like so (planetarypy is a package created by nbdev):
I see it more as a cool add-on than a feature I'd want explicitly in nbdev. Even if you are measuring code coverage, meaningful coverage and raw coverage are two different things; IMO it's just a way to make sure you're keeping the tests you write as you go aligned with what you need.
E.g. I brought this into adaptnlp maybe 3 months ago, around the release of lib2nbdev. I've checked code coverage maybe 3 times? Just to make sure I'm on the right pace and checking what I need to.
In my head I see a future where folks add extensions on top of raw nbdev that we can pip install alongside it (I'm working on my own right now, for instance).
I was a bit scared by Pete's script extracting everything into new files; I shall give it a try and see how it compares to nbval's numbers.
I agree that coverage isn't something the dev should rely on; however, I feel like users sometimes decide whether to adopt a tool based on its existing test coverage, for whatever that's worth.