[nbdev] How to implement time consuming tests for things like DataLoaders, Learner.fit, etc

I’m trying to add some tests to my fastai+huggingface library here.

So far I’ve added tests for all the sequence classification transformer models here (DataBlock) and here (training). Basically, all I’m trying to do is ensure that all huggingface SequenceClassification models work with the sequence classification code I’ve built.

I’m sure there is a better and more efficient way to add real tests in here without having the CI take 15-20 minutes because it has to download all the huggingface bits each time or run through a full epoch in fit. Any recommendations/best practices?

Also, is there a way to add those nice buttons to my repo that show at a glance if CI is passing or not?

Thanks - wg

IIRC there’s a #slow tag you can use, so you can simply run the tests yourself when you want but the CLI will skip them (not quite what you want, just what I know of though :slight_smile: )
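For reference, a minimal sketch of how that flag works in nbdev v1 (assuming your settings.ini has tst_flags = slow; the cell body below is just a placeholder):

    #slow
    # Cells flagged like this are skipped by the default `nbdev_test_nbs` run and
    # only execute when the flag is passed explicitly: `nbdev_test_nbs --flags slow`
    import time
    time.sleep(60)  # stand-in for the expensive work (model download, full fit, ...)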

That’s what I’m doing now. Unfortunately it means it won’t be part of the CI.

1 Like

I’ve got a few ideas that might be a good fit for your project - I don’t know your project well enough to make recommendations but I hope this helps (o:

  1. Is there any way to cache models to avoid waiting for download? Are these models too big to be in your github repo?
  2. I think learn.fit_one_cycle(1, lr_max=1e-3) is currently taking 50s? Maybe you could use just a few rows of ‘texts.csv’ and fit for just 1 or 2 iterations?
  3. If you have enough CPUs/GPUs available, it might help to run these tests in parallel
    • easiest way would be to split tests into multiple notebooks and let nbdev parallelize for you
    • but you could write your own ProcessPoolExecutor code - see: nbdev.imports#parallel
  4. You could write a test that picks a few models at random - so you’re testing a different set of models with every CI run (rough sketch just below this list)
    • think you’d want to keep the option to run them all with the #slow flag too
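
To make idea 4 concrete, here’s a rough sketch - MODEL_NAMES and run_classification_test are hypothetical stand-ins for your own checkpoint list and per-model test helper:

    import random

    # Hypothetical list of huggingface checkpoints the library claims to support.
    MODEL_NAMES = ['bert-base-uncased', 'distilbert-base-uncased', 'roberta-base', 'albert-base-v2']

    # Exercise a different small subset on every CI run.
    for name in random.sample(MODEL_NAMES, k=2):
        run_classification_test(name)  # hypothetical: build the DataBlock, fit a couple of iterations, assert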

Could I also suggest (o: that you don’t suppress failures in your test code? Wrapping your tests in try/except like this

    try:
        test ...
        print('--- PASSED: Batch inputs/targets ---\n')
    except:
        print('--- FAILED: Batch inputs/targets ---\n')

means you won’t know that your tests have failed unless you manually check the output. You want failing test steps to raise errors so that the test runner/CI framework knows something went wrong.
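
If the reason for the try/except is to run through every model in one go, one option is to collect the failures and raise a single assertion at the end - sketched here with the same hypothetical MODEL_NAMES / run_classification_test names as above:

    failures = []
    for name in MODEL_NAMES:                  # hypothetical list of checkpoints
        try:
            run_classification_test(name)     # hypothetical per-model test helper
            print(f'--- PASSED: {name} ---')
        except Exception as e:
            failures.append((name, repr(e)))
            print(f'--- FAILED: {name} ---')

    # Every model still gets exercised, but the run fails if any of them broke.
    assert not failures, f'{len(failures)} model(s) failed: {failures}'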

1 Like

Good question … I’m not sure I really want to do this as the repo would get huge, making commits slow.

I’m going to see if I can just limit it to maybe 4-5 batches. Not sure if there is something in the library/method … but that is a good recommendation.

I’m using nbdev and not sure how much I can customize the CI mechanism as I haven’t really looked into it much. It’s more of a black box for me right now, but something to get into perhaps if needed.

Yah, that is how I do it now. It may be what I have to live with … I was just trying to see if there is a way to make this fit with the automated testing capabilities of nbdev.

I hear that; the thing is, I want to run through all the configs/models and see what is working and what isn’t. Without the try/except, it doesn’t run through all the models.

You can stop training early with a callback, e.g.

    from fastai.text.all import *  # the usual fastai star-import provides Callback, CancelFitException, etc.

    class ShortEpochCallback(Callback):
        def begin_batch(self):
            if self.learn.iter > 4: raise CancelFitException()

    learn.fit_one_cycle(1, lr_max=1e-3, cbs=ShortEpochCallback())
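
(Raising CancelFitException is how fastai’s Learner stops a fit cleanly, so nothing blows up - the callback just ends training after a few batches. If I remember right, newer fastai releases renamed the begin_batch event to before_batch, so the method name may need updating depending on your version.)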

… but I was suggesting you use just a few rows from IMDB_SAMPLE, e.g.

    dls = dblock.dataloaders(imdb_df[:4], bs=4)

This will train for 1 iteration, using just 4 rows. Using the full 1000 rows means you do lots of iterations/batches.

1 Like