Recommendation: replace import * with 'import as'


(urmas pitsi) #1

I would like to suggest discouraging the use of import *. Instead, use imports with aliases. E.g. it is very common for us to write: import numpy as np.
I suggest a similar style should be encouraged everywhere instead of import *.
It provides better readability, and avoids namespace pollution, naming conflicts, etc.
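The contrast being proposed might be sketched like this (the shadowing example is illustrative):

```python
# Wildcard style: every public name lands in the caller's namespace.
# from numpy import *    # now `sum`, `any`, `all` shadow the builtins

# Aliased style: one short, explicit prefix per library.
import numpy as np

values = [1, 2, 3]
print(np.sum(values))    # unambiguously NumPy's sum
print(sum(values))       # unambiguously the builtin sum
```

With the alias, a reader can tell at a glance which `sum` is being called.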


(Stas Bekman) #2

just FYI, and not to prevent further discussion, but as a starting point: Jeremy discusses import * under the ‘Other stuff’ section in style.md.


(Jeremy Howard) #3

Thanks for mentioning that. The idea of using __all__ is something I definitely want to follow this time around.

The style guide doesn’t actually explain my reasoning behind import *, so maybe I’ll have a quick go at it here, and we can add some more detail to the style guide if it’s useful.

I think it should be reasonably obvious that writing out all your imports doesn’t directly help readability. It makes the code a lot longer, and the extra code doesn’t actually do anything. The only way it could be more readable is if you rely on it to know where imported symbols are coming from. So don’t do that! Use the notebook (just type the symbol name to see where it’s from), or use an IDE or editor that shows popup info for you (which pretty much all of them do nowadays, although some require a bit of setup to do so).

Namespace pollution is something people worry a lot about, but it is very rarely an issue in practice if you follow some simple practices. If we’re using symbols that others are likely to define too, like cos and pi, then it’s good to use a namespace (like np). But for rather unique things (like itemgetter), or common things you probably want to use the default version of unless explicitly mentioned otherwise (like reduce), I don’t see the point of writing out the module name every time.
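The distinction being drawn might look like this in practice (the choices shown are illustrative, not a fastai rule):

```python
import numpy as np                 # cos and pi are common names: keep the prefix
from operator import itemgetter    # itemgetter is distinctive: import it directly
from functools import reduce       # you almost always want the default reduce

print(np.cos(np.pi))               # unambiguously NumPy's cos and pi
pairs = [(2, 'b'), (1, 'a')]
print(sorted(pairs, key=itemgetter(0)))
print(reduce(lambda a, b: a + b, [1, 2, 3]))
```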

If anyone actually finds that naming conflicts are a problem they are personally having trouble with in contributing to fastai, let’s discuss those specific issues at the time. We should probably include some help in the docs on setting up editors to show symbol information, since I’ve noticed quite a few folks have complained about this issue in the past and were then rather delighted when I told them their editor could do this for them - they didn’t need to scroll to the top of the file and scan through the imports! :slight_smile:


(Jason Antic) #4

Namespace pollution is something people worry a lot about, but it is very rarely an issue in practice if you follow some simple practices.

Many things in the software world seem to follow this same pattern: there are the problems people worry about, and then there’s what actually happens (often without a whole lot of intersection between the two). Planning to be wrong, and to redo, and redo again (based on actual evidence), is pretty much the way forward.

Which brings me to what I think is an interesting question: unit and integration testing is on the roadmap for v1, and having that in place is what helps code evolve confidently over time and roll with the punches. But how in the world do you effectively test a deep learning library? I have some ideas, but they’re half baked, and I’m wondering if there’s a good resource on that.

I do have quite a bit of experience with automated testing - I made a Selenium-based framework at work that’s used for all sorts of testing: insurance rating, billing, PDFs, HTML UIs, etc. - all completely deterministic. In those cases it’s fairly easy to know what to test, but even then you still have to make sensible decisions about what’s practical, and that’s an art. But deep learning frameworks seem like a very different beast because of the “fuzziness” of the results, and how long they can take to run. I suspect this might be a very open area of research…


(Jeremy Howard) #5

Nicely said. That’s certainly how my process works - I don’t think I’m smart enough to do it any other way…

Oh lord that’s a question that’s so damn hard to answer and we’re all struggling with it! :frowning: We’ll definitely need a new thread for this pretty soon. I used to do TDD for everything I wrote, but nowadays with Jupyter Notebook my process is very different - much more based on interactive visual testing as I go. But tests will be required to ensure that future PRs, refactorings, etc don’t break things (which has often happened in v0).

One thing that @Sylvain and I are finding helpful at the moment is to have a very strong CIFAR-10 benchmark for both accuracy and speed. It runs in <10 mins, and gives us very strong confidence that we haven’t screwed anything up if we see our accuracy and speed maintained. And if we do something that we expect to make things faster or more accurate, and we don’t see that in our CIFAR-10 results, it indicates that we may have stuffed something up.
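A benchmark gate like this could be expressed as a plain threshold check. Everything below is a hypothetical sketch - the `run_cifar10_benchmark` helper, the placeholder result, and the thresholds are all made up for illustration:

```python
import time

def run_cifar10_benchmark():
    """Hypothetical stand-in: train on CIFAR-10 and return final accuracy."""
    return 0.94  # placeholder result for illustration

ACCURACY_FLOOR = 0.93     # anything below this counts as a regression
TIME_BUDGET_SECS = 600    # "runs in <10 mins"

start = time.time()
accuracy = run_cifar10_benchmark()
elapsed = time.time() - start

# The gate: fail loudly if either accuracy or speed slips.
assert accuracy >= ACCURACY_FLOOR, f"accuracy regression: {accuracy:.3f}"
assert elapsed < TIME_BUDGET_SECS, f"speed regression: {elapsed:.0f}s"
```

The value of a gate like this is exactly what’s described above: it doesn’t localize the bug, but it makes an otherwise silent regression visible.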

Obviously this doesn’t tell us where we stuffed something up - but the biggest issue by far in my experience in ML is not being aware at all that you have a problem; instead, you just get less good models, and you don’t even realize they’re not as good as they could be. A couple of examples from this recent post from Smerity:


(urmas pitsi) #6

I definitely don’t suggest importing one-by-one. I propose the following usage:
import fastai.dataset as DS
import fastai.model as M
import fastai.conv_learner as CL
import fastai.learner as L

learn = CL.ConvLearner.from_model_data(…)

It will not add much more code, but it adds readability: you understand where the functions are coming from. I would argue that my suggestion only adds benefits to the current usage, without any significant drawbacks.

I am fully aware that the current style guide suggests * imports. I thought about it for a long time, but now that v1 is under way it seemed appropriate to bring the issue forward. Sorry about that, and thank you for the thorough response.


(Jason Antic) #7

Ok, it looks like this is definitely an open-ended yet very important topic, which sounds like fun to me! Yeah, the whole “less good” thing drives me nuts already as a student, because it’s hard to know where to start sometimes, and the temptation is to just keep turning different knobs that you think might work. That turns into a rabbit hole quickly.

For example, this one I came across last year was tough when doing cats and dogs on the first iteration of Fast.AI (the pretrained model was for Theano - hence the issue):

I just came across this today and it was quite helpful to point me in the right direction after a lot of frustration! I went from ~92 to 93% validation accuracy and bumped up to 98.5% simply by changing the backend from Tensorflow to Theano. This was driving me nuts!

You mention this:

One thing that @Sylvain and I are finding helpful at the moment is to have a very strong CIFAR-10 benchmark for both accuracy and speed. It runs in <10 mins, and gives us very strong confidence that we haven’t screwed anything up if we see our accuracy and speed maintained. And if we do something that we expect to make things faster or more accurate, and we don’t see that in our CIFAR-10 results, it indicates that we may have stuffed something up.

Having a small, fast benchmark makes total sense for testing accuracy and speed. The downside, of course, is that it takes a few minutes of testing time. But there definitely needs to be something that gives you end-to-end feedback while minimizing the likelihood of deviating too much from real-world performance.

We’ll definitely need a new thread for this pretty soon.

I think I’ll go ahead and start experimenting with some of my half-baked ideas, to see if I can come back with something useful by the time that thread comes about. And see if others are doing anything that seems useful. Ideally there should be ways to do it that are both fast and detailed.

Something to keep in mind is that a great approach in testing is to cut from many angles, exposing different aspects in effective ways (in other words, it’s generally not a good idea to try to test everything with one type of test). It’s a balance of speed vs. detail vs. scope (unit/integration).


(Jeremy Howard) #8

I think this can make sense in some cases - e.g. transforms (tfm.rotate is possibly more clear than rotate, since the latter could refer to many different things).

But a lot of the modules only have one or two classes, or they’re very obvious where they’re from - so e.g. L.Learner() seems rather redundant. Even M.resnet50() seems redundant to me.


(Matt Trent) #9

Another consideration with the import * route, beyond polluting the namespace, is its effect on the dependencies the fastai library insists on having installed in order to run.

fastai’s current import * approach requires every package used anywhere in the library to be installed in order to use any of it. For example, even if I’m only using the image classification portion of fastai, I need all the packages used by NLP and structured data to be installed locally, even though I’m not using them in my code. This is because of how the individual fastai submodules import each other: import conv_learner eventually causes an import sklearn_pandas to occur.

If you have to use fastai on any Python distribution besides Anaconda, it can be a headache to manage all the dependencies you don’t actually need. It also increases the size of any distribution needed when you’re deploying models. (I actually maintain a version of fastai with cleaned-up imports so I can use it on my work’s internal build tools and infra.)

A more tightly-scoped set of imports within fastai would let users install it and use only the portions of the library (and their dependencies) necessary for their learning tasks.
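One common way to keep a heavy dependency like sklearn_pandas optional is to defer its import to the code path that actually needs it. A minimal sketch - the `require` helper and `train_structured_model` are hypothetical, not fastai API:

```python
import importlib

def require(module_name, feature):
    """Import an optional dependency, raising a helpful error if missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        raise ImportError(
            f"{feature} requires the optional package '{module_name}'"
        ) from e

def train_structured_model(df):
    # Imported here, not at module load: users who never touch the
    # structured-data path never need sklearn_pandas installed.
    sklearn_pandas = require('sklearn_pandas', 'structured-data training')
    # ... training code would go here ...
```

With this pattern, `import fastai`-style top-level imports stay cheap, and the ImportError tells the user exactly which optional package a feature needs.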

All code is a tradeoff. If we decide that fastai should be friendly to more production-oriented use cases, that may motivate some stylistic tradeoffs. I just wanted to point out that there are implications beyond namespace pollution for how the library gets used in practice. While I’d love to be able to use fastai more easily in our production settings, I think the question is where the library wants to fall on that spectrum.


(Jeremy Howard) #10

Ah yes - there are two different uses of import *:

  1. Inside our module implementations
  2. In the notebooks/examples/etc

I’ve been really talking about (2) - sorry for not making it clear. For the module implementations we’ll be more careful about what we import, for this reason. And also just more careful about the internal dependency graph in general.

Thanks for the helpful comment! :slight_smile:


(Matt Trent) #11

Oh, agree 100% then. I use import * in notebooks all the time.

Happy to help with internal dependencies (amongst other things) when we get to that point.


#12

Could you share this? I would love this as its been such a headache to get just the stuff I need from FastAI.


(Matt Trent) #13

Lemme see and get back to you. I gotta jump through some open source contribution hoops here at work first.


(urmas pitsi) #14

I was also referring to use case no. 1 - internal modules, e.g. from .core import * etc.
With notebooks, i.e. how the library itself is used, I am OK with * imports.
It seems I somewhat misstated my intent.