Improving/Expanding Functional Tests


(benedikt herudek) #1

This topic is intended to surface gaps in automated testing & to coordinate work around closing them.

The goal is to add functional tests, using coverage as an indicator of which areas haven’t been tested (not to just increase coverage for its own sake).


Important note on tests:

  • they must be smart: improving coverage just for the sake of coverage isn’t a goal. We also run the full doc notebooks as a test before a release, which probably has really good coverage, but that doesn’t necessarily catch subtle bugs
  • they must be stylish: tests should abide by the fastai style guide
  • they must be fast: no downloading torchvision models, and no training apart from the 4 current test_bla_train integration tests (one each in vision, text, tabular and collab); anything heavier must be marked as slow (but we’d prefer fast tests)
  • most should be fake: other than the 4 test_bla_train integration tests, we always use fake_learner from tests/fakes.py, where you will also find helpers for creating fake data (see the sketch below)
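For orientation, here is a rough sketch (an assumption, not the canonical fastai test) of what such a fast, fake-data test could look like; the fake_learner helper and its import are taken from this thread’s description of tests/fakes.py, so exact names and signatures may differ:

```python
import pytest
from fakes import fake_learner  # assumed helper from tests/fakes.py, importable when pytest runs from tests/

def test_recorder_tracks_losses():
    learn = fake_learner()                 # tiny fake data: no downloads, no real datasets
    learn.fit(1)                           # one epoch of mock training stays fast
    assert len(learn.recorder.losses) > 0  # assert on behaviour, not just "it ran"

@pytest.mark.slow
def test_something_that_really_trains():
    # anything needing real training belongs in the 4 test_bla_train
    # integration tests, or must carry this slow marker
    ...
```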

The simplest way to contribute is to just submit PRs (https://docs.fast.ai/dev/git.html) with new tests, following the guidelines here.

Note that, at the time you are reading this, the specific coverage percentages in the table below will likely be outdated. The suggested process, therefore, is to re-run:

make coverage

or

coverage report -m

to generate an HTML report / find the specific missing lines, and then update the numbers. Add your name & potentially update the comments column and the corresponding test classes you want to work on. With some coordination, it should be possible for several developers to work on one class.

Find additional information on testing here:

Writing tests is an excellent way for coders to understand a code base.

Thanks a lot for helping!

Incoming tasks to be sorted

  • we need some speed regression tests to make sure the core isn’t getting slower (timeit-style tests); a rough sketch follows below
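As a starting point, something along these lines could serve as a first timeit-style check; the operation under test, the threshold, and whether to use pytest-benchmark instead are all open questions to align with the benchmarking thread mentioned in the remark below:

```python
import timeit
import torch

def core_op(t):
    # stand-in for whichever fastai core function we decide to track
    return t.float().mean()

def test_core_op_speed_regression():
    t = torch.randn(64, 10)
    elapsed = timeit.timeit(lambda: core_op(t), number=1000)
    # deliberately generous bound so normal CI noise doesn't make this flaky;
    # real thresholds should come out of the benchmarking effort
    assert elapsed < 1.0
```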

Remark: there is an open thread for benchmarking: Benchmarking fastai. At some point these 2 efforts should be synced, and the test effort enriched with benchmarking and performance tests.

Sorted tasks

Note:
* Tasks are listed in order of priority; please check with the coordinators when picking a task
* In case a task you would like to work on is already taken, please check with that person - help is welcome
* Each module should be tested in a test file with a corresponding name, e.g. test_vision_transform for vision.transform

| Module | Test Classes | Status | Comments / Change Required | Developers |
|---|---|---|---|---|
| basic_train | test_basic_train | ongoing | follow test_callback.py test_callbacks_fit | Benudek |
| basic_data | test_basic_data | Done (mostly ;-)) | see some TO DOs in code comments; follow example in test_callbacks_csv_logger | Benudek |
| train | test_train.py | ongoing | follow test_callback.py test_callbacks_fit | Benudek |
| tabular/data | test_tabular_data | Assigned | follow example in test_callbacks_csv_logger | Benudek |
| layers | test_layers | ongoing | follow example in test_callbacks_csv_logger | Benudek |
| callback | test_callback (add test_callbacks_xxx as appropriate) | Assigned | follow example in test_callbacks_csv_logger | Stas / Benudek |
| tests/test_metrics.py | | Assigned | | Stas |
| tests/test_lr_finder.py | | Assigned | connected to test_train.py | Benudek |
| vision/gan.py | add | Unassigned | add new test class | |
| vision/models/unet.py | add | Unassigned | | |
| vision/learner.py | add? | Unassigned | | |
| /text/learner.py | test_text_data / test_text_train? | Unassigned | corresponding test class? | |
| /callbacks/loss_metrics.py | test_callbacks_hooks / test_callback_fp16? | Unassigned | corresponding test class? test_callbacks_hooks or test_callback_fp16? | |
| /callbacks/general_sched.py | | Unassigned | see fastai/callbacks/loss_metrics.py | |
| /text/transform.py | | Unassigned | corresponding test class? | |
| /collab.py | test_collab_train? | Unassigned | corresponding test class, test_collab_train? | |
| /tabular/transform.py | test_tabular_transform | Unassigned | corresponding test class: test_tabular_transform, but no class test_text_transform? | |
| /metrics.py | test_metrics | Unassigned | | |
| /vision/data.py | test_vision_data | Unassigned | corresponding test class? | |
| /data_block.py | test_vision_data_block | Unassigned | what is the corresponding test class? should there be a dedicated one, or is this tested indirectly via other classes? | |
| /vision/transform.py | test_vision_transform | Unassigned | | |
| /datasets.py | test_datasets | Unassigned | | |
| /vision/image.py | test_vision_image | Unassigned | | |
| /callbacks/hooks.py | test_callbacks_hooks | Unassigned | see fastai/callbacks/loss_metrics.py, fastai/callbacks/general_sched.py | |
| vision/cyclegan.py | | not ready | add new test class | |
| /text/qrnn/forget_mult.py | | not ready | | |
| /text/qrnn/qrnn.py | | not ready | | |
| /text/qrnn/qrnn1.py | | not ready | | |

Related: Dev Projects Index | Developer chat | Benchmarking fastai
(benedikt herudek) #2

Agreed that asserts matter … for show_batch, however, I have no good idea how to assert over it, and I made a PR with some comments for review.


(Stas Bekman) #3

See an example here: https://github.com/fastai/fastai/blob/master/tests/test_utils.py#L9


(benedikt herudek) #4

I’m fine with having my name on the tasks, @stas, but I want to be mindful of the fact that I can only help a little over my free days around Christmas and will then work full time again :wink: Anyway, I’m doing this for fun and to learn, so I’m happy to try when I find time.


(Stas Bekman) #5

I wasn’t sure why you chose to volunteer me as a coordinator in the first place :wink: it helps to ask.

Thank you for wanting to lead this effort.

FYI, your main contact will be with @sgugger and @jeremy, who wrote 99% of the library and know it the best to tell whether the proposed tests are good ones or not.

p.s. I moved the last few posts on this topic to the thread you created where they belong.


#6

Important note on tests:

  • they must be smart: improving coverage just for the sake of coverage isn’t a goal. We also run the full doc notebooks as a test before a release, which probably has really good coverage, but that doesn’t necessarily catch subtle bugs
  • they must be fast: no downloading torchvision models, and no training apart from the 4 current integration tests (one each in vision, text, tabular and collab); anything heavier must be marked as slow (but we’d prefer fast tests)
  • they should abide by the style guide of fastai

Thanks a lot for helping!


(benedikt herudek) #8

Ah sure, yes sorry. Volunteering others is indeed dodgy & buggy :slight_smile:


(Stas Bekman) #10

@sgugger, I copied your important notes to the first post (wiki), thank you for this important prose!

Everybody, please make any important suggestions in the first post, since otherwise they get lost in the flurry of comments. Thank you!

I’m not sure we need the coverage pasted there - it’s huge and will be outdated quickly anyway. Maybe a better approach is to handpick areas that need most work?

I’m just brainstorming aloud, just ideas…


(benedikt herudek) #11

Yes, I had copied @sgugger’s notes, @stas, so I took out the double mention (my footnote) :wink:

I added a new table, see above, to identify and coordinate what needs to be done. Could you review it wrt prios and to-dos? The goal is that developers can pick a task and (1) work on relevant areas with (2) the right approach, (3) adding test code to the right test classes.

The process I followed:

  • copy the table to excel
  • add column prio
  • sort for a mix of missing coverage and classes that seem relevant
  • add to do’s, comments & corresponding test classes in respective columns
  • stopped at 21 prios for adding tests (maybe we can do even fewer, like 10 or 15, and then reassess)

I’m sure the prios and to-dos can improve a lot with your review. After that, I would take out some lines & columns (e.g. the quickly changing coverage and the lines not covered), so this can all be read on one page without too much scrolling.

After we have prioritised, well-defined work packages, we could ask around on the forum who wants to be involved. It should be a quick enough task to solve with the many technical attendees of the MOOC, provided it is properly planned out.

thx

Ben


#12

I’ve removed some classes that are at too early a development stage to be included in the testing effort. Let me think a little bit more about the approach to improving the unit tests.


(Jeremy Howard (Admin)) #13

Thanks so much for this initiative @Benudek! :slight_smile: Just a quick note to mention that @Sylvain and @sgugger are different people - so we should tag the latter :slight_smile:


(benedikt herudek) #14

thx !


(Stas Bekman) #15

Note that with pytest-ipynb, it’s possible to add .ipynb files to the normal test suite. I have just written my first such test, which runs via normal pytest: https://github.com/stas00/ipyexperiments/blob/master/tests/test_cpu.ipynb. Basically, you just write a normal notebook with asserts, and pytest runs it, handling any assert failures.

I’m not sure whether we have any functionality that requires a jupyter env to be tested - perhaps fastprogress? But I thought I’d share this useful tool.
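To illustrate, a cell in such a test notebook is just ordinary Python with asserts; the fastprogress usage below is only a plausible example of jupyter-flavoured functionality, not an existing fastai test:

```python
from fastprogress import progress_bar  # example import; the actual module layout may differ

# pytest-ipynb executes the cell and reports any assertion failure as a test failure
collected = [i for i in progress_bar(range(10))]
assert collected == list(range(10))
```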

If we start using it, we will need to make a conda package of pytest-ipynb available on the fastai channel. Currently, only a few private channels offer it: https://anaconda.org/search?q=pytest-ipynb


#16

I’ve changed the test structure a little bit. Please look at the empty test files; they should be prioritized. Then, if you want to do a mock training in a test or get a mock DataBunch, please use the functions in fakes.py (I’ve written an example in the test_callbacks_csv_logger file).
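For anyone picking up one of these files, the intended pattern looks roughly like the sketch below; the authoritative example is in tests/test_callbacks_csv_logger.py, and the fake_learner name/signature as well as CSVLogger’s default filename are assumptions here, not verified API:

```python
from fakes import fake_learner           # assumed helper from tests/fakes.py
from fastai.callbacks import CSVLogger   # the callback used in the referenced example

def test_csv_logger_writes_history(tmp_path):
    learn = fake_learner()               # mock Learner over a mock DataBunch
    learn.path = tmp_path                # keep any output inside pytest's tmp dir
    learn.fit(1, callbacks=[CSVLogger(learn)])
    assert (tmp_path/'history.csv').exists()  # 'history' assumed to be the default filename
```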


(benedikt herudek) #17

Good stuff. It could be nice, as this way one could document tests and make them more accessible - but it would also be quite some work to migrate tests, I suppose.


(benedikt herudek) #18

updated the table accordingly, thx for the example


#19

Also, I forgot to say, but each module should be tested in a test file with a corresponding name, e.g. test_vision_transform for vision.transform. For instance, I didn’t create all the test_callbacks_xxx files, but they should be created if someone adds tests for callbacks.xxx.
test_bla_train refers to the four integration tests in vision, text, tabular and collab; those should contain any test that requires real training (in all other tests, only a fake_learner should be used).


(benedikt herudek) #20

ok, will update this above.

Looking at fakes.py, specifically this line: train_ds = TensorDataset(torch.randn(train_length, n_in), torch.randn(train_length, n_out)) - I’m wondering whether this way of creating data will allow us to test all functionality (I saw a maybe-related comment of yours here: Developer chat).

Trying to test basic_data / DataBunch / one_item, I get this error:

AttributeError: 'TensorDataset' object has no attribute 'one_item'

Might be related to the way the data is created - maybe we need to create the fakes stub a little differently?


#21

Ah yes, I was a bit too quick. I’ll fix this at some point (travelling today so it may be tomorrow or the day after).


(Stas Bekman) #23

To clarify, I wasn’t suggesting to migrate tests to ipynb. I shared that for the situation where the code needs to be tested in that environment. For example, https://github.com/stas00/ipyexperiments/ can only be run in the ipython/jupyter env - so there is no choice but to test in that environment.