Developer chat

Where would be a good place to discuss test coverage, e.g. as a subtopic in Dev Projects Index?

I made a simple test covering basic_data.show_batch in test_text_data.py. My initial motivation was to try to reproduce this bug: https://forums.fast.ai/t/getting-cuda-oom-on-lesson3-imdb-notebook-with-a-bs-8/30080/14 . But I could already make a PR of the test scripts, even though there are no asserts: it’s simply coverage for this method, as a regression test that it doesn’t throw exceptions.

After running make coverage, I can see other areas where test scripts could be useful. I’m aware of the general guidelines (https://docs.fast.ai/dev/test.html), but maybe it would be good to have one place to coordinate, set priorities and have guidelines for when asserts matter (or not)? It might help prevent rejected PRs.

@stas
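
Tests without asserts can still act as regression tests by failing whenever the call under test raises. A minimal sketch of that pattern, with a stub standing in for the real show_batch call so the example is self-contained (the stub and its name are illustrative, not fastai code):

```python
# Generic "smoke test" pattern: the test passes as long as the call
# under test does not raise. No asserts on the output are needed.

def show_batch_stub(rows=5):
    # Stand-in for e.g. basic_data.show_batch; renders rows, returns None.
    rendered = [f"row {i}" for i in range(rows)]
    return None  # display functions typically return nothing

def test_show_batch_smoke():
    # Acts as a regression test: any exception fails the test.
    show_batch_stub(rows=5)

test_show_batch_smoke()
```

Run under pytest, any exception inside the call is reported as a test failure, which is exactly the regression signal wanted here.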


Coverage for the sake of coverage is mostly meaningless unless it verifies the workings, but tests for the functionality that hasn’t been tested yet are very welcome.

In general you can just submit such test contributions directly via github, but if you’d like to make a study and perhaps compile a list of functionality that’s untested yet, then yes, making a thread dedicated to testing under https://forums.fast.ai/c/fastai-users/dev-projects and linking to it from Dev Projects would be a very useful contribution, @Benudek. There is one for Documentation improvements, so this one could be Testing Improvements or something like that.


Drafted a task wrt testing here: Test Asserts & Coverage and linked it here: Dev Projects Index

Feel free to adjust task & process



I was going through lesson5-sgd-mnist.ipynb and noticed that pathlib’s Path does not get imported:

I did not think much of it and did

from fastai.vision import *

And got past the error. However, a little later in the same notebook, I am getting this error:

data = DataBunch.create(train_ds, valid_ds, bs=bs)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-12-16801c71829f> in <module>
      1 
----> 2 data = DataBunch.create(train_ds, valid_ds, bs=bs)

~/git/fastai/fastai/basic_data.py in create(cls, train_ds, valid_ds, test_ds, path, bs, num_workers, tfms, device, collate_fn, no_check)
    112                collate_fn:Callable=data_collate, no_check:bool=False)->'DataBunch':
    113         "Create a `DataBunch` from `train_ds`, `valid_ds` and maybe `test_ds` with a batch size of `bs`."
--> 114         datasets = cls._init_ds(train_ds, valid_ds, test_ds)
    115         val_bs = bs
    116         dls = [DataLoader(d, b, shuffle=s, drop_last=(s and b>1), num_workers=num_workers) for d,b,s in

~/git/fastai/fastai/basic_data.py in _init_ds(train_ds, valid_ds, test_ds)
    102     @staticmethod
    103     def _init_ds(train_ds:Dataset, valid_ds:Dataset, test_ds:Optional[Dataset]=None):
--> 104         fix_ds = valid_ds.new(train_ds.x, train_ds.y) # train_ds, but without training tfms
    105         datasets = [train_ds,valid_ds,fix_ds]
    106         if test_ds is not None: datasets.append(test_ds)

AttributeError: 'TensorDataset' object has no attribute 'new'

I am on commit ce85b4718865f25d0243042c0e4d051929ca3b52 and not seeing anything obvious.

Does this ring a bell to anybody?

We now have a pillow conda package built against a custom build of libjpeg-turbo (and libtiff w/ libjpeg-turbo) in the fastai test channel.

I made 3.6 and 3.7 linux builds:

To install:

conda uninstall -y pillow libjpeg-turbo
conda install -c fastai/label/test pillow 

This will make your jpeg decompressing much faster (See Performance Improvement Through Faster Software Components).

Note that this is pillow-5.4.0.dev0 build - i.e. don’t use in production w/o testing.

I also uploaded pillow-simd-5.3.0.post0 w/ libjpeg-turbo and built w/ avx2 - only py36.

conda uninstall -y pillow libjpeg-turbo
conda install -c fastai/label/test pillow-simd

Note that it’d get overwritten by pillow through an update/install of any other package depending on pillow.


Many apologies @hiromi - I made a change in master recently that removes the need for from fastai import * any time you use an application (e.g. from fastai.vision import *). However, if you aren’t using an application, you need from fastai.basics import *. I’ve fixed all the notebooks now (I hope!)

The 2nd bug you came across is because I added a new fix_dl attribute that provides the training dataset, but without shuffling or augmentation. It only works with fastai ItemLists however, not with generic Datasets. So I’ve fixed that now (in master) to skip creating it for generic datasets.
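
The fix described can be sketched generically: guard the fix_ds creation so that generic Datasets, which don’t have a `.new` method, are simply skipped. This is an illustration of the approach with stand-in classes, not the actual fastai code:

```python
def init_datasets(train_ds, valid_ds, test_ds=None):
    # Only fastai ItemLists provide .new(); skip fix_ds for generic Datasets
    # so plain TensorDatasets no longer raise AttributeError.
    datasets = [train_ds, valid_ds]
    if hasattr(valid_ds, 'new'):
        # train_ds data, but without training transforms/shuffling
        datasets.append(valid_ds.new(train_ds.x, train_ds.y))
    if test_ds is not None:
        datasets.append(test_ds)
    return datasets

class PlainDS:
    # Stands in for a generic torch TensorDataset (no .new method).
    pass

ds = init_datasets(PlainDS(), PlainDS())
# No AttributeError: fix_ds creation is skipped for generic datasets.
```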

When I do git pull, it tries to switch to “https://github.com/fastai/fastai/tree/devforfu-fastai-make-pr-branch-to-python”.

Is this the new dev branch?

Thank you so much for the detailed explanation!

I was hoping that I can spot what has changed and fix it somehow, but still a bit slow to get my bearings. Maybe next time :slight_smile:


Breaking change: In all the text applications, the batch is now the first dimension (and sequence length the second). It won’t impact you if you’re using the fastai models/databunch, but if you were customizing things, they may need to be tweaked.

@piotr.czapla Tagging you as a warning, even if I know you’re not using master.
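
For custom code that assumed sequence-first batches, the adjustment is a transpose of the first two dimensions. A pure-Python illustration of the two layouts (the data here is made up):

```python
# Old layout: sequence length first -> seq_first[t][i] is token t of sample i.
seq_first = [
    [1, 4],   # t=0 for samples 0 and 1
    [2, 5],   # t=1
    [3, 6],   # t=2
]

# New layout: batch first -> batch_first[i][t] is token t of sample i.
batch_first = [list(sample) for sample in zip(*seq_first)]
print(batch_first)  # [[1, 2, 3], [4, 5, 6]]
```

With tensors the same swap is a single `transpose(0, 1)` call on the batch.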


Thanks for the heads up, @Kaspar. I may have done something wrong while trying to add a change to a PR, I will be extra careful next time. I deleted that branch as it has been merged, so you probably shouldn’t have a problem now.


dear fellow developers: consider grabbing some test scripts

… if I may suggest :wink:


Is there a thread to report bugs in, so we can discuss them before submitting a new issue on github?

Or should I create a whole new thread for each detected bug?


If you think it’s a bug in fastai you can submit it directly as a github issue, but you can also post here about it if you’re not sure.


Just a heads up: I have created this issue [https://github.com/fastai/fastai/issues/1378] with a reference to a notebook on how to fix the memory overhead in LanguageModelLoader. The proposed version uses less than 5% of the memory of the current version. Testing and validation of accuracy on the English corpus is still needed.

Would love some feedback
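
Without claiming to mirror the code in the linked issue, the general idea behind cutting a language-model loader’s memory use is to yield batches lazily from the flat token stream instead of materializing large intermediate arrays. A generic sketch (all names are illustrative):

```python
def lazy_lm_batches(tokens, bs, bptt):
    """Yield (input, target) language-model batches lazily from a flat
    token list: memory per step stays O(bs * bptt) instead of holding
    full concatenated copies of the corpus."""
    # Split the stream into bs contiguous rows.
    n = len(tokens) // bs
    rows = [tokens[i * n:(i + 1) * n] for i in range(bs)]
    # Walk the rows in bptt-sized windows; targets are inputs shifted by one.
    for start in range(0, n - 1, bptt):
        seq_len = min(bptt, n - 1 - start)
        x = [r[start:start + seq_len] for r in rows]
        y = [r[start + 1:start + 1 + seq_len] for r in rows]
        yield x, y

batches = list(lazy_lm_batches(list(range(20)), bs=2, bptt=4))
# First input batch: [[0, 1, 2, 3], [10, 11, 12, 13]]
# First target batch: [[1, 2, 3, 4], [11, 12, 13, 14]]
```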


If you call learn.fit twice it will wrap learn.model twice. This is not an issue at runtime, but it is a problem if you try to save and load the model. May I propose unwrapping the model in the on_train_end method?
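
A generic sketch of that proposal, assuming a callback that wraps the model at train begin (e.g. in a DataParallel-style wrapper exposing `.module`) and restores the bare model at train end so save/load sees the original module (class and method names are illustrative):

```python
class Wrapper:
    # Stand-in for e.g. nn.DataParallel: keeps the wrapped model as .module.
    def __init__(self, module):
        self.module = module

class Learn:
    def __init__(self, model):
        self.model = model

class WrapModelCallback:
    def __init__(self, learn):
        self.learn = learn

    def on_train_begin(self):
        # Guard against double-wrapping on a second fit() call.
        if not isinstance(self.learn.model, Wrapper):
            self.learn.model = Wrapper(self.learn.model)

    def on_train_end(self):
        # Unwrap so save/load operate on the bare model.
        if isinstance(self.learn.model, Wrapper):
            self.learn.model = self.learn.model.module

learn = Learn(model="bare")
cb = WrapModelCallback(learn)
for _ in range(2):          # two fit() calls in a row
    cb.on_train_begin()
    cb.on_train_end()
# learn.model is the original bare model, not doubly wrapped.
```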

What do you guys think about this code piece?

def cont_cat_split(df, max_card=20, dep_var=None):
    """
    Parameters:
    -----------
    df: A pandas data frame, that you wish to take columns from.
    max_card: Maximum cardinality of a continuous variable.
    dep_var: A dependent variable.

    Returns:
    -----------
    cont_names: A list of names of continuous variables.
    cat_names: A list of names of categorical variables.

    Examples:
    -----------
    >>> df = pd.DataFrame({'col1' : [1, 2, 3], 'col2' : ['a', 'b', 'a'], 'col3' : [0.5, 1.2, 7.5], 'col4' : ['ab', 'ab', 'o']})
    >>> df
       col1 col2 col3 col4
    0     1    a  0.5   ab
    1     2    b  1.2   ab
    2     3    a  7.5    o

    >>> cont_cat_split(df, 20, 'col4')
    (['col3'], ['col1', 'col2'])
    """
    cont_names, cat_names = [], []
    for label in df:
        if label == dep_var: continue
        # continuous: high-cardinality int columns, or any float column
        if (len(set(df[label])) > max_card and df[label].dtype == int) or df[label].dtype == float:
            cont_names.append(label)
        else:
            cat_names.append(label)
    return cont_names, cat_names

It makes it easier to choose which columns to label as categorical and which as continuous. I know that sometimes people like to do it by hand, but often it is just a choice like this, so why not create a function for it? I was thinking the best place might be above add_datepart() in structured.py. Can I create a pull request for this, or is it too trivial?


Oh, good catch!
Yes unwrapping the model at the end is probably the best way to deal with this, and would get us loading and saving for free.

I think it would be a useful addition. If you make a PR, please note that the docstring should just be one line that explains what your function does (mentioning the arguments where relevant). Then edit the doc notebook tabular.transform (since I think this function should go there) and document your new function at more length (no need to list the parameters like you do); there you can show actual examples.

Done! I hope it is good enough for the library.