Test Registry project

Benudek · February 11, 2019, 7:04pm

Its fine to do the copy paste also in all test scripts:

from fastai.gen_doc.doctest import this_tests

Because all scrips will need to be touched anyways

I dont know what the package does under the hood indeed, so might be risky introducing the dependency.

stas · February 11, 2019, 7:04pm

well, once you know programming in one language, switching to a new programming language is like learning to speak a new language, i.e. relatively straightforward and mostly requiring to learn vocab/grammar. It’s figuring out the idioms that is hard and it takes a long time, and usually you can’t learn it from books - you need to expose yourself to situations where those are used/needed.

Each programming language usually has straightforward easy things loop/branch/etc., and then hard things that require understanding the language internals (frames/gc/namespaces/etc.). some languages make easy things easy and hard thing possible, other languages make hard things hard or not always doable. But since I’m new to python, I won’t pass any judgements yet, until I master it better.

That’s why I asked whether perhaps someone knows how to make a function behave like a built-in. So far I didn’t find a way. It seems that import and fully qualified functions are the only way to go.

stas · February 11, 2019, 7:06pm

sure, let’s keep it simple then and just import that function.

should we change our mind later it should be an easy search/replace across all tests to use a different way.

thank for the feedback and extra research.

ashaw · February 12, 2019, 1:38am

@stas @Benudek sorry I finally got to the doctest side of things today.
PR: https://github.com/fastai/fastai/pull/1620

Here’s a quick demo of what I have so far:

github.com

bearpelican/fastai/blob/doctest/DocTestDemo.ipynb

{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "%reload_ext autoreload\n",
    "%autoreload 2\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "from fastai.gen_doc.nbdoctest import show_doctest, doctest"

This file has been truncated. show original

Playground notebook here: https://github.com/bearpelican/fastai/blob/doctest/Creating_doctest.ipynb

I added some fuzzy matching functionality to search through the tests folder and places the function is called.
This is just in case nothing is returned from this_tests

Also returning a skeleton test function. If you guys think it’s a good idea, we could expand on this and add asserts and fill in variables from the function parameters.

Hope it’s along the lines of what you two were thinking. I can continue iterating

stas · February 12, 2019, 2:32am

That’s a brilliant idea. I didn’t think of it. I guess we just need to control the number of hits with auto-discovery, since some functions might be used in 50% of all tests.

Also returning a skeleton test function. If you guys think it’s a good idea, we could expand on this and add asserts and fill in variables from the function parameters.

That one I’m not sure what to say, if the skeleton contains just the name of the function, I’m not sure how much of a headstart that provides. Perhaps for some people having a skeleton may trigger an incentive to fill the void? In which case I’m all for it. Nothing to lose.

Should we call the main function show_test? so we will have show_doc and show_test?

And I think other than you coming up with all kinds of cool ways making this little project even cooler we still need to know in what format do you want test_this() to fuel show_test to bridge the two sides? it can be a plain file, json, or any other format that you think will be the easiest.

Benudek · February 12, 2019, 7:01am

@ashaw @stas find here the json file I generated with the code snippets, see below: Idea is per API we will have a list of tests, so several tests could be registered per API.

This is generated with json.dump, looks a little odd when opening the file. So wanted to write doc_test and see how this all works.

But lets coordinate this here … what ever file makes sense. We do not necessarily need json also.

public github doc with relevant prorotype files: https://github.com/Benudek/fastaidoctestexchange

File conftest.json: https://github.com/Benudek/fastaidoctestexchange/blob/master/conftest.py
function is called after all tests are finished:

@pytest.fixture(scope="session", autouse=True)
def start_doctest_collector(request):
    matching = [s for s in set(sys.argv) if "test_" in s]
    if not matching:
        request.addfinalizer(stop_doctest_collector)
   
def stop_doctest_collector():
    encoded_map = json.dumps(RegisterTestsperAPI.apiTestsMap, indent=2, default=set_default)
    fastai_dir = abspath(join(dirname( __file__ ), '..', 'fastai'))
    with open(fastai_dir + '/TestAPIRegister.json', 'w') as f:
        json.dump(obj=encoded_map,fp=f, indent = 4, sort_keys= True)

File fastai.gen_doc.doctest: https://github.com/Benudek/fastaidoctestexchange/blob/master/doctest.py

class RegisterTestsperAPI:
    apiTestsMap = dict()
    @staticmethod
    def this_tests(testedapi):
       
        previous_frame = inspect.currentframe().f_back.f_back ##.f_back 
        (filename, line_number, test_function_name, lines, index) = inspect.getframeinfo(previous_frame)
        list_test = [{'file: '+ filename, 'test: ' + test_function_name}]
               fq_apiname = full_name_with_qualname(testedapi)
        if(fq_apiname in RegisterTestsperAPI.apiTestsMap):
            RegisterTestsperAPI.apiTestsMap[fq_apiname] = RegisterTestsperAPI.apiTestsMap[fq_apiname]  + list_test
        else:
            RegisterTestsperAPI.apiTestsMap[fq_apiname] =  list_test

Benudek · February 12, 2019, 7:21am

think its a great idea, we link the git area from the repo how to contribute and give 1-2sentences on guidelines to write a test. But what do you mean with ‘asserts and fill in variables from the function parameters?’ You mean we can further kind of generate skeletons? That would be cool.

we might want to add a parameter Direct / Ancillary Tests and Skeleton to the file TestAPIRegister.json I generate, if we let the discovery also come up with this. But then we also need this when registering with this_tests. So maybe we leave the categories to the dicovery and per default in this_tests only generate tests of category Direct.

stas · February 12, 2019, 6:04pm

Good, except please drop “file: /Users/bherudek/Desktop/fastaigits/fastai-doctest/tests/” from that file. It has to be relative to tests/ and not hardcoding paths which will be different for each person.

wrt:

matching = [s for s in set(sys.argv) if "test_" in s]

this:
if "test_" in sys.argv: ...

will do the same

But I’m concerned that someone may have ‘test_’ in their path, prior to fastai/tests/test_ and also we may run it as:

cd tests
pytest test_...

so matching on tests/test_ won’t work either.

Therefore I think we need to use a regex with your original code instead and match for r'test_\w+\.py'.

and FYI, I will not be weighing in on any other suggestions to do more until this simple task is finished and integrated. But, of course, feel free to discuss and brainstorm, but please don’t @ me. Thank you.

ashaw · February 12, 2019, 8:39pm

@Benudek can you also include the line number on where this_test happens?
Either the line number of the parent function or the line number of this_test is fine.

Benudek · February 12, 2019, 8:40pm

you mean in the file? sure - lets be specific, just working on this. on what position e.g. do you want it?

ashaw · February 12, 2019, 8:42pm

Yup the line number in the file - so we can link to the github source. Line number of the def test_ function please!

Benudek · February 12, 2019, 8:47pm

ok, I get line number 18 with inspect. But I think we can per convention say this_tests is always the 1st line. Hence can give 18 - 1.

17: def test_lr_find(learn, request):
18: this_tests(Learner.lr_find)

ashaw · February 12, 2019, 8:51pm

I like the guidelines part - if no tests show up, we can display the skeleton and guideline.
Further down the line - I was thinking of similar to tools/build-docs - generate a skeleton test.py file or append the test function if the file already exists. We could fill in as much as possible since we know the parameter types and the return type

ashaw · February 12, 2019, 8:55pm

Cool works for me! Current format looks good to me.
One minor change - can the metadata be a dictionary instead of array?
fastai.train.lr_find’: [ “file: /Users/bherudek/Desktop/fastaigits/fastai-doctest/tests/test_train.py” ]

Something like:
fastai.train.lr_find’: { “file”: “fastai-doctest/tests/test_train.py” }

stas · February 12, 2019, 8:59pm

That won’t always be the case, please refer to the spec - it could be called anywhere in the script. So you need to lookup the function it’s called from (def …) and get the line number from its data.

Benudek · February 12, 2019, 9:02pm

ok, will check

will check, ok

ashaw · February 12, 2019, 9:24pm

I’m doing something similar to find the parent function here:

github.com

bearpelican/fastai/blob/doctest/fastai/gen_doc/nbdoctest.py#L92


return result


def _fuzzy_line_match(elt, lines):
fn_name = nbdoc.fn_name(elt)
result = []
for idx,line in enumerate(lines):
    if re.match(f'.*[\s\.\(]{fn_name}[\.\(]', line):
        result.append((idx,line))
return result


def _get_parent_func(lineno, lines):
for idx,l in enumerate(reversed(lines[:lineno])):
    if re.match(f'^def test', l):
        return (lineno - (idx+1)), l
return None


def fuzzy_match(elt, lines):
fuzzy_line_matches = _fuzzy_line_match(elt, lines)
fuzzy_matches = [_get_parent_func(lineno, lines) for lineno,line in fuzzy_line_matches]
fuzzy_matches = list(filter(None.__ne__, fuzzy_matches))
return fuzzy_matches

If it’s easier, I can do the conversion instead

Benudek · February 12, 2019, 9:27pm

I am going through the frames, so far no luck but should be possible.

Not sure how you want to do it - so I give you the name and you find the line this way?

stas · February 12, 2019, 9:32pm

No, he means you can borrow the solution from his code, since he’s doing the same

Benudek · February 12, 2019, 9:58pm

@ashaw uploaded adjusted doctest,

using your function to get the parent function and lineno (btw the lineno is e.g. 16 where the function starts at 17 - I guess you are aware and count e.g. with 0) Surprised I couldn’t solve this with the frame, but guess there is a reason you wrote that function ;-).
haven’t changed that array yet: For one API we might (probably will) have several tests - at least it is a logical possibility and wouldn’t want to overwrite than other tests. So, arrays is fine?

one small change in conftest following @stas comment on regex

To Dos, provided this works for you:

consider: get parent func to common util (or I import your class), so far copied the func
make one common PR
then, can start registering tests with this_tests