Recommendation: replace import * with aliased imports (import ... as)

I would like to suggest discouraging the use of import *. Instead, use imports with aliases. For example, it is very common for us to write: import numpy as np.
I suggest the same style should be encouraged everywhere instead of import *.
It improves code readability and avoids namespace pollution, naming conflicts, etc.
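
For concreteness, a minimal sketch of the two styles (numpy here is just the standard example, not a claim about fastai's modules):

# star import: every public name lands in the calling namespace
from numpy import *
x = array([cos(pi), 0.0])            # where do array, cos and pi come from?

# aliased import: the origin of each name stays visible at the call site
import numpy as np
x = np.array([np.cos(np.pi), 0.0])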

4 Likes

just FYI, and not to prevent further discussion, but as a starting point, Jeremy discusses import * under the 'Other stuff' section in style.md.

1 Like

Thanks for mentioning that. The idea of using __all__ is something I definitely want to follow this time around.
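
For reference, __all__ is the standard mechanism for keeping import * under control: it lists exactly which names a module exports to star imports. A minimal sketch (the module and function names are made up for illustration):

# somemodule.py (hypothetical)
import math                       # helper import we don't want to re-export

__all__ = ['area']                # only 'area' is exported by "from somemodule import *"

def area(r): return math.pi * r * r
def _helper(x): return 2 * x      # underscore-prefixed and not listed, so never exported

After from somemodule import *, area is available in the caller's namespace, but math and _helper are not.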

The style guide doesn't actually explain my reasoning behind import *, so maybe I'll have a quick go at it here, and we can add some more detail to the style guide if it's useful.

I think it should be reasonably obvious that writing out all your imports doesn't directly help readability. It makes the code a lot longer, and the extra code doesn't actually do anything. The only way it could be more readable is if you rely on it to know where imported symbols are coming from. So don't do that! Use the notebook (just type the symbol name to see where it's from), or use an IDE or editor that shows popup info for you (which pretty much all of them do nowadays, although some require a bit of setup to do so).
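
For reference, a couple of standard ways to check where a symbol comes from in a notebook (plain Python/IPython, nothing fastai-specific):

from operator import itemgetter

itemgetter             # evaluating the bare name shows its repr, e.g. <class 'operator.itemgetter'>
itemgetter.__module__  # -> 'operator'
itemgetter?            # IPython/Jupyter only: shows the defining module and docstring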

Namespace pollution problems are something people worry a lot about, but they are very rarely an issue in practice if you follow some simple practices. If we're using symbols that others are likely to define too, like cos and pi, then it's good to use a namespace (like np). But for rather unique things (like itemgetter), or common things you probably want to use a default version of unless explicitly mentioned otherwise (like reduce), I don't see the point of writing out the module name every time.
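
A small sketch of that rule of thumb, using the names mentioned above:

import numpy as np                  # clash-prone names (cos, pi, ...) stay behind a namespace
from operator import itemgetter     # distinctive name: fine to bring in directly
from functools import reduce        # we want the default version anyway

angles = np.array([0.0, np.pi / 2])
total = reduce(lambda a, b: a + b, [1, 2, 3])
pairs = sorted([(2, 'b'), (1, 'a')], key=itemgetter(0))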

If anyone actually finds that naming conflicts are a problem they are personally having trouble with when contributing to fastai, let's discuss those specific issues at the time. We should probably include some help in the docs on setting up editors to show symbol information, since I've noticed quite a few folks have complained about this issue in the past and then were rather delighted when I told them their editor could do this for them - they didn't need to scroll to the top of the file and scan through the imports! :slight_smile:

2 Likes

Namespace pollution problems are something people worry a lot about, but they are very rarely an issue in practice if you follow some simple practices.

Many things in the software world seem to follow this same pattern: there are the problems that people worry about, and then there's what actually happens (often not a whole lot of intersection between the two). Planning to be wrong and to redo, and redo again (based on actual evidence), is pretty much the way forward.

Which brings me to what I think is an interesting question: unit and integration testing is on the roadmap for v1, and having that in place is what helps code evolve confidently over time and roll with the punches. But how in the world do you effectively test a deep learning library? I have some ideas, but they're half baked, and I'm wondering if there's a good resource on that.

I do have quite a bit of experience with automated testing - I built a Selenium-based framework at work that's used for all sorts of testing: insurance rating, billing, PDFs, HTML UIs, etc. - all completely deterministic. In those cases it's fairly easy to know what to test, but even then you still have to make sensible decisions about what's practical, and that's an art. Deep learning frameworks seem like a very different beast, though, because of the "fuzziness" of the results and how long they can take to run. I suspect this might be a very open area of research...

Nicely said. That's certainly how my process works - I don't think I'm smart enough to do it any other way...

Oh lord that's a question that's so damn hard to answer and we're all struggling with it! :frowning: We'll definitely need a new thread for this pretty soon. I used to do TDD for everything I wrote, but nowadays with Jupyter Notebook my process is very different - much more based on interactive visual testing as I go. But tests will be required to ensure that future PRs, refactorings, etc don't break things (which has often happened in v0).

One thing that @Sylvain and I are finding helpful at the moment is to have a very strong CIFAR-10 benchmark for both accuracy and speed. It runs in <10 mins, and gives us very strong confidence that we haven't screwed anything up if we see our accuracy and speed maintained. And if we do something that we expect to make things faster or more accurate, and we don't see that in our CIFAR-10 results, it indicates that we may have stuffed something up.

Obviously this doesn't tell us where we stuffed something up - but the biggest issue by far in my experience in ML is not being aware at all that you have a problem; instead, you just get less good models, but you don't even realize they're not as good as they could be. A couple of examples from this recent post from Smerity:

1 Like

I definitely don't suggest importing things one by one. I propose the following usage:
import fastai.dataset as DS
import fastai.model as M
import fastai.conv_learner as CL
import fastai.learner as L

learn = CL.ConvLearner.from_model_data(...)
...

It will not add much more code, but it adds readability: it is clear where the functions are coming from. I would argue that my suggestion only adds benefits over the current usage, without any significant drawbacks.

I am fully aware that the current style guide suggests * imports. I have thought about it for a long time, but now that v1 is under way, it seemed appropriate to bring the issue forward. Sorry about that, and thank you for the thorough response.

2 Likes

Ok, it looks like this is definitely an open-ended yet very important topic, which sounds like fun to me! Yeah, the whole "less good" thing drives me nuts already as a student, because it's hard to know where to start sometimes, and the temptation is to just keep turning different knobs that you think might work. That turns into a rabbit hole quickly.

For example, this one I came across last year was tough when doing cats and dogs on the first iteration of Fast.AI (the pretrained model was for Theano, hence the issue):

I just came across this today and it was quite helpful to point me in the right direction after a lot of frustration! I went from ~92 to 93% validation accuracy and bumped up to 98.5% simply by changing the backend from Tensorflow to Theano. This was driving me nuts!

You mention this:

One thing that @Sylvain and I are finding helpful at the moment is to have a very strong CIFAR-10 benchmark for both accuracy and speed. It runs in <10 mins, and gives us very strong confidence that we haven't screwed anything up if we see our accuracy and speed maintained. And if we do something that we expect to make things faster or more accurate, and we don't see that in our CIFAR-10 results, it indicates that we may have stuffed something up.

Having a small, fast benchmark to test accuracy and speed makes total sense. The downside, of course, is that even that takes a few minutes of testing time. But there definitely needs to be something that gives you end-to-end feedback while minimizing the likelihood of deviating too much from real-world performance.
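
For what it's worth, a hedged sketch of how such a benchmark could be wired into a test suite as a regression gate (train_cifar10 is a hypothetical placeholder for the real training entry point, and the thresholds are made up):

import time

ACCURACY_FLOOR = 0.94      # hypothetical minimum acceptable accuracy
TIME_CEILING_S = 600       # hypothetical 10-minute wall-clock budget

def test_cifar10_benchmark():
    start = time.time()
    accuracy = train_cifar10()             # placeholder: run the real benchmark here
    elapsed = time.time() - start
    assert accuracy >= ACCURACY_FLOOR, f"accuracy regressed: {accuracy:.3f}"
    assert elapsed <= TIME_CEILING_S, f"benchmark slowed down: {elapsed:.0f}s"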

We'll definitely need a new thread for this pretty soon.

I think I'll go ahead and start experimenting with some of my half-baked ideas to see if I can come back with something useful by the time that thread comes about, and see if others are doing anything that seems useful. Ideally I think there should be ways to do it both quickly and in detail.

Something to keep in mind is that a great approach in testing is to think of it in terms of cutting from many angles to expose different aspects in effective ways (in other words, it's generally not a good idea to try to test everything with one type of test). It's a balance of speed vs. detail vs. scope (unit/integration).
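
On the unit end of that spectrum, the fuzziness can often be contained: fix the random seed and assert cheap, deterministic properties, e.g. output shapes, or that a tiny model can overfit a handful of samples. A sketch of the idea, assuming PyTorch as the underlying framework (this is not fastai's actual test suite):

import torch
import torch.nn as nn

def test_tiny_model_overfits():
    torch.manual_seed(42)                            # make the run repeatable
    x, y = torch.randn(8, 4), torch.randint(0, 2, (8,))
    model = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
    opt = torch.optim.Adam(model.parameters(), lr=0.01)
    for _ in range(500):                             # 8 points should be easy to memorise
        opt.zero_grad()
        loss = nn.functional.cross_entropy(model(x), y)
        loss.backward()
        opt.step()
    assert model(x).shape == (8, 2)                  # sanity-check the output shape
    assert loss.item() < 0.3                         # well below the ~0.69 chance-level loss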

I think this can make sense in some cases - e.g. transforms (tfm.rotate is possibly clearer than rotate, since the latter could refer to many different things).

But a lot of the modules only have one or two classes, or it's very obvious where they're from - so e.g. L.Learner() seems rather redundant. Even M.resnet50() seems redundant to me.

Another consideration with the import * route, beyond polluting the namespace, is its effect on the dependencies the fastai library requires you to have installed in order to run.

fastai's current import * approach requires every package used anywhere in the library to be installed in order to use any of it. For example, even if I'm only using the image classification portion of fastai, I need all the packages used by NLP and structured data to be installed locally, even though I'm not using them anywhere in my code. This is because of how the individual fastai submodules import each other: import conv_learner eventually causes an import sklearn_pandas to occur.
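
To illustrate the mechanism with a simplified, made-up layout (not fastai's actual file structure): a star import executes the whole target module, so its imports become your imports too.

# mylib/core.py (hypothetical)
import sklearn_pandas            # heavy dependency that only some features need

# mylib/conv_learner.py (hypothetical)
from .core import *              # pulls in everything core defines - and runs its imports

# user code that only wants the vision bits
import mylib.conv_learner        # fails with ImportError if sklearn_pandas isn't installed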

If you have to use fastai on any Python distribution besides Anaconda, managing all the dependencies you don't actually need can be a headache. It also increases the size of any distribution needed when you're deploying models. (I actually maintain a version of fastai with cleaned-up imports so I can use it on my work's internal build tools and infra.)

A more tightly scoped set of imports within fastai would let users install it and use only the portions of the library (and their dependencies) necessary for their learning tasks.
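
For comparison, a common pattern in other libraries (a general sketch, not something fastai currently does as far as I know) is to defer heavy optional imports until the feature that needs them actually runs:

# inside a hypothetical tabular module
def proc_dataframe(df):
    try:
        import sklearn_pandas                     # imported only when this feature is used
    except ImportError as e:
        raise ImportError(
            "sklearn_pandas is required for tabular processing; "
            "install it with: pip install sklearn-pandas") from e
    ...  # actual processing would go here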

All code is a tradeoff. If we decide that fastai should be friendly to more production-oriented use cases, that may motivate some stylistic tradeoffs. I just wanted to point out that there are implications beyond namespace pollution for how the library gets used in practice. While I'd love to be able to use fastai more easily in our production settings, I think the question is where the library wants to fall on that spectrum.

4 Likes

Ah yes - there are two different uses of import *:

  1. Inside our module implementations
  2. In the notebooks/examples/etc

I've been really talking about (2) - sorry for not making that clear. For the module implementations we'll be more careful about what we import, for this reason. And also just more careful about the internal dependency graph in general.

Thanks for the helpful comment! :slight_smile:

4 Likes

Oh, agree 100% then. I use import * in notebooks all the time.

Happy to help with internal dependencies (amongst other things) when we get to that point.

Could you share this? I would love that, as it's been such a headache to get just the stuff I need from FastAI.

Lemme see and get back to you. I gotta jump through some open source contribution hoops here at work first.

I was also referring to use case no. 1 - internal modules, e.g. from .core import * etc.
With notebooks, i.e. how the library itself is used, I am ok with * imports.
Seems I kind of miscommunicated my intent.

Can you share the CIFAR10 benchmarking code?

(I'd like to compare FP16 vs. FP32 on both a 1080 Ti and a 2080 Ti)

Here you go

Not sure if CIFAR10 images are big enough to see the full speedup - I don't remember. We've been focussing on Imagenet recently.

I don't really follow the fastai package structure. Is fastai a Python module? Why can't I simply import fastai - why do I have to import explicitly, e.g. import fastai.vision.widgets?

Nothing is stopping you from doing that, nor from doing explicit imports outright. (I do this often for my teaching material and code that uses the fastai library.)

Well, kind of: I can't just do import fastai or import fastai.vision; I have to do import fastai.vision.widgets and import fastai.vision.learner.
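
That behaviour comes from how Python packages work rather than anything fastai-specific: import pkg only runs pkg/__init__.py, and a submodule is reachable as an attribute only if that __init__.py (or your own code) imports it. A standard-library illustration:

import asyncio
asyncio.tasks                    # works: asyncio's __init__ imports its submodules

import xml
# xml.etree                      # AttributeError: xml's __init__ does not import etree
import xml.etree.ElementTree     # the explicit import binds the submodule
xml.etree.ElementTree.fromstring('<a/>')

So whether import fastai.vision is enough depends on what fastai.vision's __init__ chooses to re-export; the fully explicit imports are the general-purpose fallback.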