Developer chat

I edited the post to reflect the state of randomness.

While looking for a friendly randomizer I found this package: https://github.com/ftobia/pytest-ordering - so if we have some modules that we want to run in a particular order, it might be useful. I haven't checked its quality, though, but I'm posting it here in case we need it later.

Could someone explain this line from the text_data_from_csv doc:

kwargs will be split between the TextDataset function and the get_data function; there you can specify parameters such as max_vocab, chunksize, min_freq, n_labels (see the TextDataset documentation) or bs, bptt and pad_idx (see the sections LM data and classifier data)

Does that mean we can pass, for instance, n_labels=60000 as a keyword argument? I tried doing that but got an error.

Added a new ImageDataBunch class, and moved all the factory methods into it. See the examples and tests for updated usage.
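For reference, a rough sketch of the new-style usage (the path is hypothetical and the exact factory-method arguments may differ between versions):

from fastai.vision import ImageDataBunch

# the old {application}_data_from_{thing} functions become factory methods on the class
data = ImageDataBunch.from_folder('data/mnist_sample')  # hypothetical path
# other factories follow the same pattern, e.g. ImageDataBunch.from_csv(...)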


I’m looking for input on how we can protect contributors from getting a misleading ‘fail’ status on their PRs when the ‘failure’ is totally unrelated to their contribution.

Here is the most recent example, a PR that happened to be 0.7.x-related: https://github.com/fastai/fastai/pull/900. It reports a failure https://dev.azure.com/fastdotai/fastai/_build/results?buildId=216&view=logs because one of the non-deterministic tests failed again. But it’s the same issue with any PR, whether it targets 1.0.x or 0.7.x.

As we discussed, those tests will fail occasionally due to their nature, and while we try to keep master as correct as possible, it’s OK if CI builds occasionally fail. But it’s not OK, IMHO, if PR builds fail when it’s not the contributor’s error, especially in cases like the one I present here, where the error doesn’t even have anything to do with the proposed contribution (a 0.7.x patch).

Thoughts?

The failure was:

>       assert abs(x.mean()) < abs(m)
E       assert tensor(0.1254) < tensor(0.1231)
E        +  where tensor(0.1254) = abs(tensor(-0.1254))
E        +    where tensor(-0.1254) = <built-in method mean of Tensor object at 0x1237f2090>()
E        +      where <built-in method mean of Tensor object at 0x1237f2090> = tensor([[[[-0.4745, -0.4745, -0.4745,  ..., -0.4745, -0.4745, -0.4745],\n          [-0.4745, -0.4745, -0.4745,  ..., -0..., -0.4745,  ..., -0.4745, -0.4745, -0.4745],\n          [-0.4745, -0.4745, -0.4745,  ..., -0.4745, -0.4745, -0.4745]]]]).mean
E        +  and   tensor(0.1231) = abs(tensor(0.1231))

tests/test_vision.py:36: AssertionError

But please let’s focus here not on trying to improve the test, since it’ll never be 100% failure-proof, but on how to make contributors’ experience better.

Thank you.

My idea for solving this is to have a special flag or env var passed to pytest so that, when it’s used, such tests xfail instead of failing.

And if we are OK with occasional CI build failures, but not with PR build failures, we could have separate scripts for CI builds vs. PR builds, except that would mean a lot of duplication. Perhaps there is a way to tell from the environment whether this is a CI build or a PR build, and only pass such a flag in PR builds.
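To make the idea concrete, here is a minimal sketch, assuming a hypothetical env var (FASTAI_PR_BUILD, set only by the PR pipeline) and a hypothetical "nondeterministic" marker on the flaky tests; this is not what’s in the repo today:

# conftest.py
import os
import pytest

def pytest_collection_modifyitems(config, items):
    # in PR builds, turn tests marked as non-deterministic into xfails
    # instead of hard failures
    if not os.environ.get("FASTAI_PR_BUILD"):
        return
    for item in items:
        if item.get_closest_marker("nondeterministic"):
            item.add_marker(pytest.mark.xfail(
                reason="non-deterministic test, tolerated in PR builds",
                strict=False))

The flaky test itself would just carry @pytest.mark.nondeterministic (registered in pytest.ini to avoid warnings), and the PR build script would export the env var while the CI build script would not.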

I’m doing a bit of a cleanup of the module requirements so that users will have all the requirements correctly installed.

I use tests/test_mod_independency.py for that testing. Basically, this test should list all the modules from dev_requirements in setup.py and not fail.

That’s not the case at the moment. In particular, these three are required for import fastai to succeed:

nbconvert
nbformat
traitlets

so either fastai needs to be decoupled from them, or they must be moved into normal requirements.
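To illustrate what the test is checking, a rough sketch (not the actual tests/test_mod_independency.py): block the modules listed only in dev_requirements and make sure import fastai still succeeds.

import subprocess, sys

DEV_ONLY = ["nbconvert", "nbformat", "traitlets"]  # modules listed under dev_requirements

def test_import_fastai_without_dev_deps():
    # block the dev-only modules, then try importing fastai in a fresh interpreter;
    # a sys.modules entry of None makes any import of that name raise ImportError
    blocker = "; ".join(f"sys.modules[{m!r}] = None" for m in DEV_ONLY)
    code = f"import sys; {blocker}; import fastai"
    res = subprocess.run([sys.executable, "-c", code], capture_output=True, text=True)
    assert res.returncode == 0, f"`import fastai` pulled in a dev-only module:\n{res.stderr}"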

Big general change in the API: basically, all the {application}_data_from_{thing} functions have disappeared to give birth to:

  • ImageDataBunch
  • TextDataBunch
  • TabularDataBunch

This is motivated by the fact that each of those is going to have helper functions specific to its application (ImageDataBunch is gaining a lot of them, and more will follow), and it also simplifies a part that was a bit obscure in NLP, the data_func=something arguments. Now we subclass TextDataBunch according to the task, into TextLMDataBunch or TextClasDataBunch.
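A quick sketch of what that looks like in code (the path is hypothetical and the exact from_csv arguments may vary by version):

from fastai.text import TextLMDataBunch, TextClasDataBunch

# previously the task was selected via data_func=...; now the subclass encodes it
data_lm   = TextLMDataBunch.from_csv('data/imdb_sample')    # language-model data
data_clas = TextClasDataBunch.from_csv('data/imdb_sample')  # classifier data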

Docs and examples are all updated to reflect that.


@stas you can remove the non-deterministic 0.7 tests if you like

You mean old/tests/*? I wasn’t talking about those tests; they aren’t being run.

I was talking about tests/*. The only reason 0.7 was part of my message is that I was giving an example of a fresh PR that just happened to be related to 0.7.x.

I edited my original post to make it clearer.

Just what I was looking for: a place to start if you’re planning to contribute to the platform.

I’ve just updated to the new version of the fastai library and got an error.

I’d really appreciate it if someone could help me figure out the problem. I need fastai for some work right now.
Thank you so much in advance

I solved the problem above :D. It came from the spacy package. As I remember, when I ran pip install -e .[dev], spacy wasn’t installed because it needs numpy version 1.15. Then I updated numpy and spacy with conda and everything is fine now.

Sorry for so much spam, but I keep having a problem with notebooks and the new fastai library. The notebooks always fail to open successfully.

Maybe try conda update --all.


I bumped the dependencies to require numpy version 1.15. Thank you for the heads up, @dhoa.


OK, here is an update on the situation with unstripped notebook commits. The git devs won’t change git to enforce the configuration, and it’s very unlikely this will ever change. What they recommended is to set up a CI check for the nbs. This would have worked if we weren’t committing directly to master, but that’s not the case. So we are going for the second-best solution:

Both fastai and fastai_docs now run CI watchdogs to detect unstripped nb commits and PRs. So each developer needs to be responsible for their own commits and simply watch their status (it usually takes a few minutes to complete post-push). If you see the watchdog reporting a failure, no harm done: fix your setup, re-strip the notebooks, and commit them ASAP, because it affects everybody, not just you.
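If you’re wondering what such a check amounts to, here is a rough sketch (not the real watchdog script) that uses nbformat to look for leftover outputs and execution counts:

import sys
import nbformat

def is_stripped(path):
    nb = nbformat.read(path, as_version=4)
    # a stripped notebook has no outputs and no execution counts in its code cells
    return all(not cell.get("outputs") and cell.get("execution_count") is None
               for cell in nb.cells if cell.cell_type == "code")

if __name__ == '__main__':
    bad = [p for p in sys.argv[1:] if not is_stripped(p)]
    if bad:
        print("Unstripped notebooks:\n  " + "\n  ".join(bad))
        sys.exit(1)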

I personally still feel too new to git, so I always run git diff before any commit, which means I usually catch such problems visually. Of course, you may choose any way that works for you; the only request is that you watch your commits’ status and act accordingly. Here is a shortcut for the commit pages on each repo (I have those in bookmarks):

Perhaps it’s even easier to watch the builds page, which has them all in one place:

Perhaps it’d make sense to have the watchdog email the developer responsible for the issue, but I don’t think we have SMTP access on CI builds.

p.s. you can also alias your git commit to something like:

alias git-safe-commit='tools/trust-origin-git-config; git commit'

so that it always enables the filters before committing. Except that git config doesn’t handle overwrites well and will keep appending to .gitconfig, which over time will slow down your git operations. So a more efficient way is to unset and then reconfigure, which is also faster without the wrapper script:

git config --local --unset include.path; git config --local include.path ../.gitconfig; git commit;


And if you watch this builds status page, you will see that test_vision fails pretty consistently - at least several times a day - far from 98% reliable.

@sgugger __getitem__ in ImageMultiDataset is annotated as returning Tuple[Image, ImgLabels], but it actually returns a tuple of an Image and a one-hot encoding, so I’m thinking the return type should be something like Tuple[Image, Collection[float]]?

def __getitem__(self, i:int)->Tuple[Image, ImgLabels]: return open_image(self.x[i]), self.encode(self.y[i])
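i.e. something along these lines (just a sketch of the suggested annotation; the body stays unchanged):

def __getitem__(self, i:int) -> Tuple[Image, Collection[float]]:
    return open_image(self.x[i]), self.encode(self.y[i])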

Ah, good to know. That one’s an easy fix. Done now. (It was a test that could be made deterministic without any problems.)


I just pushed a new release (1.0.6) to PyPI (pip) and Anaconda (conda). Thanks to @stas for making it more painless than I’ve ever seen before :slight_smile: The main difference in this release is the use of ImageDataBunch factory methods instead of the functions we had before. (Ditto for each application: text, collab, tabular.) Also, the NaN loss bug has been worked around.


From now on, if you fix a significant bug that existed in the last released version, or add a feature, or change the documented behaviour of something, please update CHANGES.md immediately when you push.

When we release a new version, we’ll simply update the version number in CHANGES.md and create a new section above it.