Test Registry project

OK, I started working on the function this_tests to register tests with what they test in a JSON file. It will take me a moment.

Then I would look at doc_test (unless someone else has already picked it up - whatever makes sense). Thanks a lot @ashaw!


@stas, I submitted a pseudo PR (because it shouldn’t get merged) to discuss the this_tests design here: https://github.com/fastai/fastai/pull/1604.

It seems easier to discuss code on GitHub rather than here.

If you monitor git anyway, including closed PRs, and additional notifications here are too noisy, let me know :wink:

Also looping in @ashaw.


I think we can discuss it here just fine - perhaps in another wiki post, where any person can pose a question and another can answer.

The easiest would have been to work together on the same branch, but I’m not sure how to arrange that since we don’t all have the same permissions.

On your initial prototype and the questions you asked in the PR:

  1. There should be no setup/teardown in each test module - this should happen only once, at the pytest level - so we need to figure out the pytest starting/finishing hooks/callbacks. The only thing that goes into test modules is a single this_tests() call per test. It’s possible that we will need to write a small pytest extension to make this happen, but perhaps there is a direct way to hook in via tests/conftest.py. Surely the loading can happen there; it’s the saving of the mapping file once the test suite is 100% successful that needs to be sorted out.

    So if we create a custom extension, then a plain pytest ... run does nothing, but if we run pytest -doctest it saves the mappings, and since that flag would then be passed via make test, this will do the trick (see the sketch after this list).

    Do you understand why I suggested to only ever update the mapping file after all tests were run successfully? If we don’t do it that way, we need much more complex logic for adding/removing mapping file entries. If it’s done in one swoop, the whole mapping file is written at once and never partially modified. Of course, at the git level only some of it will still appear modified, but the implementation will be trivial: a. collect the entries from this_tests(); b. dump them in a consistent, sorted way into the mapping file - done! i.e. the mapping file is never updated, just completely re-written.

    Down the road perhaps we can make it more complex, but for now we don’t want to turn this into a project - it should be as simple as it can be.

  2. For now the activation should only happen during a full pytest/make test run, not when individual test modules are run. So this_tests() should do nothing if it’s not a global run, i.e. return without doing anything - though for the first iteration it can just always do the normal thing. See (1) for why.

    But thinking about it more, the other approach is to not use an extension and instead save each this_tests() call into the mapping file, coding all the complexity of updating that file. I guess that’s a valid way too. I just thought that doing it occasionally and in one swoop would make it easier on developers, who then don’t need to monitor yet another file to commit, especially since it won’t be located under tests/… Not sure.

  3. The code ideally should go with the docs generation, but for now it’s OK under tests/utils. You shouldn’t have a problem importing anything from the main code base, because tests/conftest.py re-arranges sys.path to include ../fastai second in the path. The reason it should go under the docs generation code is that that’s where the code using it will be - it should all be in one place. Also remember that the tests folder is not distributed to users, so if someone looks, it’d be good for them to see the whole picture together.

  4. The map file should go somewhere under fastai/ where the .py files are, so there is nothing to hardcode, as it’s always tests/…/fastai/somefile.json, which is the one added to MANIFEST.in.
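To make (1) concrete, here is a minimal sketch of the custom-flag plus one-swoop-dump idea - the --doctest-collect flag name, the collected dict, and the mapping-file path are all assumptions for illustration, not decisions:

import json

import pytest

collected = {}  # would be filled by this_tests() during the run

def pytest_addoption(parser):
    # hypothetical flag that make test would pass to enable collection
    parser.addoption("--doctest-collect", action="store_true",
                     help="save the test-to-API mapping file after a full run")

@pytest.fixture(scope="session", autouse=True)
def doctest_collector(request):
    yield  # everything below runs once the whole session finishes
    # only dump when the flag was passed and the suite was 100% successful
    if request.config.getoption("--doctest-collect") and request.session.testsfailed == 0:
        # one swoop: sorted, deterministic, fully re-written every time
        with open("../fastai/test_api_map.json", "w") as f:  # hypothetical path
            json.dump(collected, f, indent=2, sort_keys=True)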

Here is a quick prototype for you to work with that I built from a few SO posts:

add to tests/conftest.py:

import pytest

@pytest.fixture(scope="session", autouse=True)
def start_doctest_collector(request):
    # register the teardown that will run once the whole session is done
    request.addfinalizer(stop_doctest_collector)
    print("\n***Initialize docset collector***")

def stop_doctest_collector():
    print("\n*** Finalize docset collector***")
    print(Helpers.x)  # this is where the mapping file dump will happen

class Helpers:
    x = {}
    @staticmethod
    def this_tests(item):
        Helpers.x[item] = 1

@pytest.fixture
def helpers():
    return Helpers

and now write a test, say tests/test_x.py:

import pytest

def test_true(helpers):
    helpers.this_tests(True)
    assert True == True

def test_false(helpers):
    helpers.this_tests(False)
    assert False == False

Run:

=========================================================================== test session starts ============================================================================
platform linux -- Python 3.7.1, pytest-4.0.2, py-1.7.0, pluggy-0.8.0 -- /home/stas/anaconda3/envs/fastai/bin/python
cachedir: .pytest_cache
rootdir: /mnt/nvme1/fast.ai-1/br/fastai/master, inifile: setup.cfg
plugins: xdist-1.25.0, repeat-0.7.0, forked-0.2, cov-2.6.0, nbsmoke-0.2.7, ipynb-1.1.1.dev0
collected 2 items                                                                                                                                                          

tests/test_x.py::test_true 
***Initialize docset collector***
PASSED
tests/test_x.py::test_false PASSED
*** Finalize docset collector***
{True: 1, False: 1}

Note that pytest buffers that initialize message - even though it appears after the first test line in the output, it is actually called before all tests.

So now you have the pytest setup/teardown with a class attribute dict that collects the data, which you would save at the finalization stage.

So next, get rid of the helpers fixture, which otherwise needs to be added to each test function (unwanted noise), so that a test looks something like:

import pytest
from fastai.gen_doc.doctest import this_tests

def test_true():
    this_tests(True)
    assert True == True

This also looks potentially useful for tackling the task of avoiding a fixture argument in each test: https://stackoverflow.com/a/36383912/9201239
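For what it’s worth, here is a minimal sketch of one generic hack along those lines (not necessarily what the linked answer does): injecting the helper into builtins from conftest.py, so every test module can call it without importing anything:

import builtins

def pytest_configure(config):
    # make this_tests resolvable in every test module; Helpers is the
    # class from the conftest.py prototype above
    builtins.this_tests = Helpers.this_tests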

There is also the pytest cache, which is perhaps easier to use than the class attribute: https://stackoverflow.com/a/50611422/9201239
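A rough sketch of how the cache variant could look (assuming the standard request.config.cache API; the doctest/mapping key name is made up):

import pytest

@pytest.fixture(scope="session", autouse=True)
def doctest_cache(request):
    cache = request.config.cache
    cache.set("doctest/mapping", {})         # reset at session start
    yield
    print(cache.get("doctest/mapping", {}))  # the dump would happen here

The catch is that the cache lives on disk under .pytest_cache/, so everything stored in it must be JSON-serializable.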

And finally, code this_tests() to the spec, and we are pretty much done on the tests side.

The only other nuance to figure out is running the collector only when pytest was given no specific test module to run (i.e. all tests are run). OK, here is how you’d do it - basically peeking into argv to see whether specific test modules were passed (it will need tweaking if the order of args is different; currently it’s inflexible, but it’s good enough for a start):

import sys

import pytest

@pytest.fixture(scope="session", autouse=True)
def start_doctest_collector(request):
    print("\n***Initialize docset collector***")
    print(sys.argv)
    if 'test_' in sys.argv[-1]:
        print("Specific tests are running, not collecting doctests")
    else:
        print("All tests are running")
        request.addfinalizer(stop_doctest_collector)

Perhaps there is an easier way, but on a first search I couldn’t find one.
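One possibly easier alternative (an assumption I haven’t verified against all invocation styles): pytest exposes the positional arguments it resolved as request.config.args, which avoids poking at sys.argv and its argument order:

import pytest

@pytest.fixture(scope="session", autouse=True)
def start_doctest_collector(request):
    # config.args holds the positional args of this pytest run
    if not any('test_' in a for a in request.config.args):
        request.addfinalizer(stop_doctest_collector)  # as defined above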

If you’re stuck, first check out SO (Stack Overflow), since most questions have already been answered there - but use Google to search it, as that gives much better results. This is how I scraped the code above; I hadn’t worked on this kind of problem before, and it took just a few minutes to put a few answers together.


Yes, that’s fine. So I will definitely do the Google/Stack Overflow tour. I was just thinking it’s good to coordinate the PR in case you had a specific solution in mind, and to avoid the PR not going through.


@stas, I coded it up along the lines of your comments (I should have simply checked the pytest docs for sharing context between test classes!). I only wasn’t sure what you meant here:

do you mean doc_tests, or writing a separate test for the functionality I will add?

I could make a PR for you to check, but I guess it’s better if I first look at doc_test and see how this all works in combination.

Two more remarks:

Using the namespace helper

to avoid copying imports into all test files later on. So I suppose we will also need to trigger installing this package when submitting the PR.

The cache

looks good, but AFAICT we would need to pass the request object around in all test methods - lots of copy & paste - and a class variable should be just fine.

Thanks for sharing your knowledge again - a good learning curve here for me.

sorry, that was a typo - fixed now in the original: ‘this_tests’

to avoid copying imports into all test files later on. So I suppose we will also need to trigger installing this package when submitting the PR.

looks good, but AFAICT we would need to pass the request object around in all test methods - lots of copy & paste - and a class variable should be just fine.

I meant that we don’t need to edit any tests other than adding a single line to each test:

this_tests(...)

and most likely we will have to add at the top of each test module:

from foo import this_tests

I was trying to find a way to make this function globally available in all test modules without needing to import it everywhere, but I couldn’t find one. I’m new to Python, so perhaps there is some hack that can do that. It’d be nice not to need to import that function everywhere. Python seems to be very non-hackable (for good reasons most of the time, but not always).

I hope that clarifies things.


Yes, cool - thanks! And the tip you shared does work without imports:

Just this line in a test class:

pytest.helpers.this_tests(Learner.lr_find)

after declaring it in conftest.py:

@pytest.helpers.register
def this_tests(testedapi):
    RegisterTestsperAPI.this_tests(testedapi)

and after installing the package (pytest-helpers-namespace).

So that’s quite useful, as it avoids a lot of copied code!

Seems to me more like you are pretty much ahead of the game in Python :wink: I am rather new to Python but did my fair share of programming (though Python so far is the most fun programming experience!) :wink:

Yes, but as I said, we usually don’t want extra deps. I think the best approach is to port it so it’s self-contained in the test suite (conftest.py); otherwise it’s an extra dependency, which we typically try to avoid - dependencies often tend to break or disappear.

And I’d shorten it - pytest.g.this_tests if that’s sensible; maybe not, open to suggestions. g for global?
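A minimal sketch of what porting it into conftest.py could look like - this is a guess at the relevant part of what the package does, not a drop-in copy of it:

import pytest

class _Helpers:
    @staticmethod
    def this_tests(testedapi):
        RegisterTestsperAPI.this_tests(testedapi)

def pytest_configure(config):
    # expose the namespace on the pytest module itself, so tests can call
    # pytest.helpers.this_tests(...) without any plugin installed
    pytest.helpers = _Helpers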


It’s fine to do the copy & paste in all test scripts too:

from fastai.gen_doc.doctest import this_tests

because all scripts will need to be touched anyway.

I don’t know what the package does under the hood, indeed, so it might be risky to introduce the dependency.

Well, once you know programming in one language, switching to a new programming language is like learning to speak a new language, i.e. relatively straightforward, mostly requiring you to learn vocabulary/grammar. It’s figuring out the idioms that is hard and takes a long time, and usually you can’t learn them from books - you need to expose yourself to situations where they are used/needed.

Each programming language usually has straightforward easy things (loop/branch/etc.) and then hard things that require understanding the language internals (frames/gc/namespaces/etc.). Some languages make easy things easy and hard things possible; other languages make hard things hard or not always doable. But since I’m new to Python, I won’t pass any judgement yet, until I master it better.

That’s why I asked whether someone perhaps knows how to make a function behave like a built-in. So far I haven’t found a way. It seems that imports and fully qualified functions are the only way to go.


sure, let’s keep it simple then and just import that function.

Should we change our minds later, it should be an easy search & replace across all tests to switch to a different way.

Thanks for the feedback and the extra research.


@stas @Benudek sorry, I finally got to the doctest side of things today.
PR: https://github.com/fastai/fastai/pull/1620

Here’s a quick demo of what I have so far:

Playground notebook here: https://github.com/bearpelican/fastai/blob/doctest/Creating_doctest.ipynb

I added some fuzzy matching functionality to search through the tests folder for places where the function is called.
This is just in case nothing is returned from this_tests.

I’m also returning a skeleton test function. If you guys think it’s a good idea, we could expand on this and add asserts and fill in variables from the function parameters.
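Purely for illustration, a made-up guess at what such a generated skeleton might look like for Learner.lr_find (the names, import paths, and TODOs are all invented):

from fastai.basic_train import Learner
from fastai.gen_doc.doctest import this_tests

def test_lr_find():
    this_tests(Learner.lr_find)
    # TODO: set up a learner, call lr_find, assert on the recorded results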

Hope it’s along the lines of what you two were thinking. I can continue iterating.


That’s a brilliant idea; I didn’t think of it. I guess we just need to control the number of hits from auto-discovery, since some functions might be used in 50% of all tests.

I’m also returning a skeleton test function. If you guys think it’s a good idea, we could expand on this and add asserts and fill in variables from the function parameters.

That one I’m not sure about: if the skeleton contains just the name of the function, I’m not sure how much of a head start it provides. Perhaps for some people having a skeleton may trigger an incentive to fill the void? In which case I’m all for it. Nothing to lose.

Should we call the main function show_test, so that we have show_doc and show_test?

And besides you coming up with all kinds of cool ways to make this little project even cooler, we still need to know in what format you want this_tests() to feed show_test, to bridge the two sides. It can be a plain file, JSON, or any other format that you think will be easiest.


@ashaw @stas find here the JSON file I generated with the code snippets below. The idea is that per API we will have a list of tests, so several tests can be registered per API.
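For illustration, the shape is roughly this (the entries below are made up):

{
    "fastai.basic_train.Learner.lr_find": [
        {"file": "test_train.py", "test": "test_lr_find"}
    ]
}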

This is generated with json.dump; it looks a little odd when opening the file. So I wanted to write doc_test and see how this all works.

But let’s coordinate this here… whatever file format makes sense. We don’t necessarily need JSON, either.

Public GitHub repo with the relevant prototype files: https://github.com/Benudek/fastaidoctestexchange

File conftest.py: https://github.com/Benudek/fastaidoctestexchange/blob/master/conftest.py
The function is called after all tests have finished:

import json
import sys
from os.path import abspath, dirname, join

import pytest

@pytest.fixture(scope="session", autouse=True)
def start_doctest_collector(request):
    # only collect when no specific test module was requested
    matching = [s for s in set(sys.argv) if "test_" in s]
    if not matching:
        request.addfinalizer(stop_doctest_collector)

def stop_doctest_collector():
    # dump the whole map in one swoop, fully re-writing the file
    fastai_dir = abspath(join(dirname(__file__), '..', 'fastai'))
    with open(join(fastai_dir, 'TestAPIRegister.json'), 'w') as f:
        json.dump(obj=RegisterTestsperAPI.apiTestsMap, fp=f, indent=4, sort_keys=True)

File fastai.gen_doc.doctest: https://github.com/Benudek/fastaidoctestexchange/blob/master/doctest.py

import inspect

class RegisterTestsperAPI:
    apiTestsMap = dict()

    @staticmethod
    def this_tests(testedapi):
        # walk two frames up, past the pytest.helpers wrapper, to reach
        # the calling test function
        previous_frame = inspect.currentframe().f_back.f_back
        (filename, line_number, test_function_name, lines, index) = inspect.getframeinfo(previous_frame)
        list_test = [{'file': filename, 'test': test_function_name}]
        fq_apiname = full_name_with_qualname(testedapi)  # fully qualified API name
        if fq_apiname in RegisterTestsperAPI.apiTestsMap:
            RegisterTestsperAPI.apiTestsMap[fq_apiname] += list_test
        else:
            RegisterTestsperAPI.apiTestsMap[fq_apiname] = list_test

I think it’s a great idea: we link the Git area from the repo on how to contribute and give 1-2 sentences of guidelines on writing a test. But what do you mean by ‘asserts and fill in variables from the function parameters’? You mean we can generate skeletons even further? That would be cool.

We might want to add a Direct/Ancillary Tests and Skeleton parameter to the TestAPIRegister.json file I generate, if we let the discovery also come up with this. But then we also need it when registering with this_tests. So maybe we leave the categories to the discovery and, by default, only generate tests of category Direct in this_tests.

Good, except please drop “file: /Users/bherudek/Desktop/fastaigits/fastai-doctest/tests/” from that file. It has to be relative to tests/, without hardcoded paths, which will be different for each person.


wrt:

matching = [s for s in set(sys.argv) if "test_" in s]

this:

if any("test_" in s for s in sys.argv): ...

will do the same

But I’m concerned that someone may have ‘test_’ in their path prior to fastai/tests/test_, and also we may run it as:

cd tests
pytest test_...

so matching on tests/test_ won’t work either.

Therefore I think we need to use a regex with your original code instead, matching r'test_\w+\.py'.
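Something like this minimal sketch (the helper name is made up):

import re
import sys

def specific_tests_requested():
    # match the bare module filename anywhere in the args, so both
    # `pytest tests/test_x.py` and `cd tests; pytest test_x.py` work
    return any(re.search(r'test_\w+\.py', arg) for arg in sys.argv)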

And FYI, I will not be weighing in on any other suggestions to do more until this simple task is finished and integrated. But of course, feel free to discuss and brainstorm - just please don’t @ me. Thank you.

@Benudek can you also include the line number where this_tests happens?
Either the line number of the parent function or the line number of the this_tests call is fine.
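For reference, inspect.getframeinfo in the doctest.py snippet above already captures line_number for the call site, so it would just need to go into each entry, e.g.:

list_test = [{'file': filename, 'test': test_function_name, 'line': line_number}]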
