Recommendation: replace import * with aliased imports (import ... as)

I would like to suggest discouraging the use of import *. Instead, use imports with aliases. For example, it is very common for us to write: import numpy as np.
I suggest the same style should be encouraged everywhere instead of import *.
It improves code readability and avoids namespace pollution, naming conflicts, etc.
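
For concreteness, a minimal sketch of the two styles (numpy here is just the standard example, not a claim about fastai's modules):

# star import: every public name lands in the calling namespace
from numpy import *
x = array([cos(pi), 0.0])            # where do array, cos and pi come from?

# aliased import: the origin of each name stays visible at the call site
import numpy as np
x = np.array([np.cos(np.pi), 0.0])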

4 Likes

just FYI, and not to prevent further discussion, but as a starting point, Jeremy discusses import * under the 'Other stuff' section in style.md.

1 Like

Thanks for mentioning that. The idea of using __all__ is something I definitely want to follow this time around.
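
For reference, __all__ is the standard mechanism for keeping import * under control: it lists exactly which names a module exports to star imports. A minimal sketch (the module and function names are made up for illustration):

# somemodule.py (hypothetical)
import math                       # helper import we don't want to re-export

__all__ = ['area']                # only 'area' is exported by "from somemodule import *"

def area(r): return math.pi * r * r
def _helper(x): return 2 * x      # underscore-prefixed and not listed, so never exported

After from somemodule import *, area is available in the caller's namespace, but math and _helper are not.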

The style guide doesn't actually explain my reasoning behind import *, so maybe I'll have a quick go at it here, and we can add some more detail to the style guide if it's useful.

I think it should be reasonably obvious that writing out all your imports doesn't directly help readability. It makes the code a lot longer, and the extra code doesn't actually do anything. The only way it could be more readable is if you rely on it to know where imported symbols are coming from. So don't do that! Use the notebook (just type the symbol name to see where it's from), or use an IDE or editor that shows popup info for you (which pretty much all of them do nowadays, although some require a bit of setup to do so).
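
For reference, a couple of standard ways to check where a symbol comes from in a notebook (plain Python/IPython, nothing fastai-specific):

from operator import itemgetter

itemgetter             # evaluating the bare name shows its repr, e.g. <class 'operator.itemgetter'>
itemgetter.__module__  # -> 'operator'
itemgetter?            # IPython/Jupyter only: shows the defining module and docstring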

Namespace pollution problems are something people worry a lot about, but they are very rarely an issue in practice if you follow some simple practices. If we're using symbols that others are likely to define too, like cos and pi, then it's good to use a namespace (like np). But for rather unique things (like itemgetter), or common things you probably want to use a default version of unless explicitly mentioned otherwise (like reduce), I don't see the point of writing out the module name every time.
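
A small sketch of that rule of thumb, using the names mentioned above:

import numpy as np                  # clash-prone names (cos, pi, ...) stay behind a namespace
from operator import itemgetter     # distinctive name: fine to bring in directly
from functools import reduce        # we want the default version anyway

angles = np.array([0.0, np.pi / 2])
total = reduce(lambda a, b: a + b, [1, 2, 3])
pairs = sorted([(2, 'b'), (1, 'a')], key=itemgetter(0))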

If anyone actually finds that naming conflicts are a problem they are personally having trouble with when contributing to fastai, let's discuss those specific issues at the time. We should probably include some help in the docs on setting up editors to show symbol information, since I've noticed quite a few folks have complained about this issue in the past and then were rather delighted when I told them their editor could do this for them - they didn't need to scroll to the top of the file and scan through the imports! :slight_smile:

2 Likes

Namespace pollution problems are something people worry a lot about, but they are very rarely an issue in practice if you follow some simple practices.

Many things in the software world seem to follow this same pattern: there are the problems that people worry about, and then there's what actually happens (often not a whole lot of intersection between the two). Planning to be wrong and to redo, and redo again (based on actual evidence), is pretty much the way forward.

Which brings me to what I think is an interesting question: unit and integration testing is on the roadmap for v1, and having that in place is what helps code evolve confidently over time and roll with the punches. But how in the world do you effectively test a deep learning library? I have some ideas, but they're half baked, and I'm wondering if there's a good resource on that.

I do have quite a bit of experience with automated testing - I built a Selenium-based framework at work that's used for all sorts of testing: insurance rating, billing, PDFs, HTML UIs, etc. - all completely deterministic. In those cases it's fairly easy to know what to test, but even then you still have to make sensible decisions about what's practical, and that's an art. Deep learning frameworks seem like a very different beast, though, because of the "fuzziness" of the results and how long they can take to run. I suspect this might be a very open area of research...

Nicely said. That's certainly how my process works - I don't think I'm smart enough to do it any other way...

Oh lord that's a question that's so damn hard to answer and we're all struggling with it! :frowning: We'll definitely need a new thread for this pretty soon. I used to do TDD for everything I wrote, but nowadays with Jupyter Notebook my process is very different - much more based on interactive visual testing as I go. But tests will be required to ensure that future PRs, refactorings, etc don't break things (which has often happened in v0).

One thing that @Sylvain and I are finding helpful at the moment is to have a very strong CIFAR-10 benchmark for both accuracy and speed. It runs in <10 mins, and gives us very strong confidence that we haven't screwed anything up if we see our accuracy and speed maintained. And if we do something that we expect to make things faster or more accurate, and we don't see that in our CIFAR-10 results, it indicates that we may have stuffed something up.

Obviously this doesn't tell us where we stuffed something up - but the biggest issue by far in my experience in ML is not being aware at all that you have a problem; instead, you just get less good models, but you don't even realize they're not as good as they could be. A couple of examples from this recent post from Smerity:

1 Like

I definitely don't suggest importing things one by one. I propose the following usage:
import fastai.dataset as DS
import fastai.model as M
import fastai.conv_learner as CL
import fastai.learner as L

learn = CL.ConvLearner.from_model_data(...)
...

It will not add much more code, but it adds readability: it is clear where the functions are coming from. I would argue that my suggestion only adds benefits over the current usage, without any significant drawbacks.

I am fully aware that the current style guide suggests * imports. I have thought about it for a long time, but now that v1 is under way, it seemed appropriate to bring the issue forward. Sorry about that, and thank you for the thorough response.

2 Likes

Ok, it looks like this is definitely an open-ended yet very important topic, which sounds like fun to me! Yeah, the whole "less good" thing drives me nuts already as a student, because it's hard to know where to start sometimes, and the temptation is to just keep turning different knobs that you think might work. That turns into a rabbit hole quickly.

For example, this one I came across last year was tough when doing cats and dogs on the first iteration of Fast.AI (the pretrained model was for Theano, hence the issue):

I just came across this today and it was quite helpful to point me in the right direction after a lot of frustration! I went from ~92 to 93% validation accuracy and bumped up to 98.5% simply by changing the backend from Tensorflow to Theano. This was driving me nuts!

You mention this:

One thing that @Sylvain and I are finding helpful at the moment is to have a very strong CIFAR-10 benchmark for both accuracy and speed. It runs in <10 mins, and gives us very strong confidence that we haven't screwed anything up if we see our accuracy and speed maintained. And if we do something that we expect to make things faster or more accurate, and we don't see that in our CIFAR-10 results, it indicates that we may have stuffed something up.

Having a small, fast benchmark to test accuracy and speed makes total sense. The downside, of course, is that even that takes a few minutes of testing time. But there definitely needs to be something that gives you end-to-end feedback while minimizing the likelihood of deviating too much from real-world performance.
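
For what it's worth, a hedged sketch of how such a benchmark could be wired into a test suite as a regression gate (train_cifar10 is a hypothetical placeholder for the real training entry point, and the thresholds are made up):

import time

ACCURACY_FLOOR = 0.94      # hypothetical minimum acceptable accuracy
TIME_CEILING_S = 600       # hypothetical 10-minute wall-clock budget

def test_cifar10_benchmark():
    start = time.time()
    accuracy = train_cifar10()             # placeholder: run the real benchmark here
    elapsed = time.time() - start
    assert accuracy >= ACCURACY_FLOOR, f"accuracy regressed: {accuracy:.3f}"
    assert elapsed <= TIME_CEILING_S, f"benchmark slowed down: {elapsed:.0f}s"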

We'll definitely need a new thread for this pretty soon.

I think I'll go ahead and start experimenting with some of my half-baked ideas to see if I can come back with something useful by the time that thread comes about, and see if others are doing anything that seems useful. Ideally I think there should be ways to do it both quickly and in detail.

Something to keep in mind is that a great approach in testing is to think of it in terms of cutting from many angles to expose different aspects in effective ways (in other words, it's generally not a good idea to try to test everything with one type of test). It's a balance of speed vs. detail vs. scope (unit/integration).
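
On the unit end of that spectrum, the fuzziness can often be contained: fix the random seed and assert cheap, deterministic properties, e.g. output shapes, or that a tiny model can overfit a handful of samples. A sketch of the idea, assuming PyTorch as the underlying framework (this is not fastai's actual test suite):

import torch
import torch.nn as nn

def test_tiny_model_overfits():
    torch.manual_seed(42)                            # make the run repeatable
    x, y = torch.randn(8, 4), torch.randint(0, 2, (8,))
    model = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
    opt = torch.optim.Adam(model.parameters(), lr=0.01)
    for _ in range(500):                             # 8 points should be easy to memorise
        opt.zero_grad()
        loss = nn.functional.cross_entropy(model(x), y)
        loss.backward()
        opt.step()
    assert model(x).shape == (8, 2)                  # sanity-check the output shape
    assert loss.item() < 0.3                         # well below the ~0.69 chance-level loss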

I think this can make sense in some cases - e.g. transforms (tfm.rotate is possibly clearer than rotate, since the latter could refer to many different things).

But a lot of the modules only have one or two classes, or it's very obvious where they're from - so e.g. L.Learner() seems rather redundant. Even M.resnet50() seems redundant to me.

Another consideration with the import * route, beyond polluting the namespace, is its effect on the dependencies the fastai library requires you to have installed in order to run.

fastai's current import * approach requires every package used anywhere in the library to be installed in order to use any of it. For example, even if I'm only using the image classification portion of fastai, I need all the packages used by NLP and structured data to be installed locally, even though I'm not using them anywhere in my code. This is because of how the individual fastai submodules import each other: import conv_learner eventually causes an import sklearn_pandas to occur.
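
To illustrate the mechanism with a simplified, made-up layout (not fastai's actual file structure): a star import executes the whole target module, so its imports become your imports too.

# mylib/core.py (hypothetical)
import sklearn_pandas            # heavy dependency that only some features need

# mylib/conv_learner.py (hypothetical)
from .core import *              # pulls in everything core defines - and runs its imports

# user code that only wants the vision bits
import mylib.conv_learner        # fails with ImportError if sklearn_pandas isn't installed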

If you have to use fastai on any Python distribution besides Anaconda, managing all the dependencies you don't actually need can be a headache. It also increases the size of any distribution needed when you're deploying models. (I actually maintain a version of fastai with cleaned-up imports so I can use it on my work's internal build tools and infra.)

A more tightly scoped set of imports within fastai would let users install it and use only the portions of the library (and their dependencies) necessary for their learning tasks.
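
For comparison, a common pattern in other libraries (a general sketch, not something fastai currently does as far as I know) is to defer heavy optional imports until the feature that needs them actually runs:

# inside a hypothetical tabular module
def proc_dataframe(df):
    try:
        import sklearn_pandas                     # imported only when this feature is used
    except ImportError as e:
        raise ImportError(
            "sklearn_pandas is required for tabular processing; "
            "install it with: pip install sklearn-pandas") from e
    ...  # actual processing would go here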

All code is a tradeoff. If we decide that fastai should be friendly to more production-oriented use cases, that may motivate some stylistic tradeoffs. I just wanted to point out that there are implications beyond namespace pollution for how the library gets used in practice. While I'd love to be able to use fastai more easily in our production settings, I think the question is where the library wants to fall on that spectrum.

4 Likes

Ah yes - there are two different uses of import *:

  1. Inside our module implementations
  2. In the notebooks/examples/etc

I've been really talking about (2) - sorry for not making that clear. For the module implementations we'll be more careful about what we import, for this reason. And also just more careful about the internal dependency graph in general.

Thanks for the helpful comment! :slight_smile:

4 Likes

Oh, agree 100% then. I use import * in notebooks all the time.

Happy to help with internal dependencies (amongst other things) when we get to that point.

Could you share this? I would love that, as it's been such a headache to get just the stuff I need from FastAI.

Lemme see and get back to you. I gotta jump through some open source contribution hoops here at work first.

I was also referring to use case no. 1 - internal modules, e.g. from .core import * etc.
With notebooks, i.e. how the library itself is used, I am ok with * imports.
Seems I kind of miscommunicated my intent.

Can you share the CIFAR10 benchmarking code?

(I'd like to compare FP16 vs. FP32 on both a 1080 Ti and a 2080 Ti)

Here you go

Not sure if CIFAR10 images are big enough to see the full speedup - I don't remember. We've been focussing on Imagenet recently.

I don't really follow the fastai package structure. Is fastai a Python module? Why can't I simply import fastai - why do I have to import explicitly, e.g. import fastai.vision.widgets?

Nothing is stopping you from doing that, nor from doing explicit imports outright. (I do this often for my teaching material and code that uses the fastai library.)

Well, kind of: I can't just do import fastai or import fastai.vision; I have to do import fastai.vision.widgets and import fastai.vision.learner.
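
That behaviour comes from how Python packages work rather than anything fastai-specific: import pkg only runs pkg/__init__.py, and a submodule is reachable as an attribute only if that __init__.py (or your own code) imports it. A standard-library illustration:

import asyncio
asyncio.tasks                    # works: asyncio's __init__ imports its submodules

import xml
# xml.etree                      # AttributeError: xml's __init__ does not import etree
import xml.etree.ElementTree     # the explicit import binds the submodule
xml.etree.ElementTree.fromstring('<a/>')

So whether import fastai.vision is enough depends on what fastai.vision's __init__ chooses to re-export; the fully explicit imports are the general-purpose fallback.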