Fastai style guide now available - feedback welcome!

jeremy · March 27, 2018, 6:26am

That’s a good point. If you or anyone wants to think about what the real dependencies of text, vision, etc. are, we could create a few different import files. It would still allow us to save time by importing just a small number of import files that contain the deps, but also avoid unnecessary dependencies.

jeremy · March 27, 2018, 6:26am

I think @dang.hien’s suggestion would probably mainly handle this issue. Not sure alphabetical order would help much.

yggg · March 27, 2018, 7:05am

When I dive into the fastai codebase, I often need to look up the definition of some function, say the to_gpu function in the nlp.py file.

Assume these are my first few dives into the fastai library, so I don’t yet know which module each function comes from. I scroll to the top of the file to see where to_gpu is imported from, but I usually find a list of from module_XYZ import * lines. For instance, in nlp.py, we have this list:

from .imports import *
from .torch_imports import *
from .core import *
from .model import *
from .dataset import *
from .learner import *
from .text import *
from .lm_rnn import *

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from torchtext.datasets import language_modeling

At this point, I can either go through each of these modules looking for def to_gpu, or use some command-line-fu like grep/ack/git grep to search for 'def to_gpu'.

$ git grep 'def to_gpu'
fastai/core.py:44:def to_gpu(x, *args, **kwargs):

Finally, I figured out that to_gpu comes from core.py, at line 44. But that’s because I have cloned fastai.git from GitHub and have a terminal open right beside me. Those who are browsing on github.com (possibly first-timers who stumbled on this awesome codebase) probably have no way to quickly jump to the definition of to_gpu.
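
For anyone who has the library importable locally, the same lookup can also be done from Python itself. This is only a minimal sketch: where_defined is a hypothetical helper (not part of fastai), and textwrap.dedent from the standard library stands in for to_gpu so the snippet runs without fastai installed.

```python
import inspect
import textwrap  # stand-in module; in a fastai session you would pass to_gpu instead


def where_defined(obj):
    """Return (module name, source file, first line number) for a function or class.

    Hypothetical helper for illustration. It needs the Python source to be
    available and raises OSError/TypeError otherwise (e.g. for C builtins).
    """
    module = inspect.getmodule(obj)
    _, line_no = inspect.getsourcelines(obj)
    return module.__name__, inspect.getsourcefile(obj), line_no


# Works the same for a name that arrived through `from module import *`,
# because the function object remembers the module it was defined in.
name, path, line_no = where_defined(textwrap.dedent)
print(f"{name}: {path}:{line_no}")
```

In a notebook, where_defined(to_gpu) should give the same information as the git grep above, assuming fastai was installed from source.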

I understand that @jeremy mentioned in the first few lectures of part 1 that import * is the style he prefers for data science practitioners, and I totally agree with it. That is, when I’m doing research (as a user of fastai), I open a notebook with a clear idea of which libraries/modules I need for the particular problem/dataset. My objective is to solve the problem ASAP, so I care less about the namespace and I’d import everything.

IMHO, when building a library such as fastai, our roles switch from user to developer/maintainer, and I would suggest being a little bit more explicit, i.e. the from .core import to_gpu style.

My 2 pennies, cheers

dang.hien · March 27, 2018, 7:39am

Actually, you don’t have to do this. In the first lecture (last week), Jeremy demonstrated how you can look for the definition or usage of a symbol in Visual Studio Code. Other IDEs have similar features. If you are using Vim (like I do), you should check out the jedi-vim package. It can do a lot of things, but the most useful commands are \d to go to the definition of the symbol under the cursor, and Ctrl-o to go back.

dang.hien · March 27, 2018, 8:01am

That’s what I was thinking. I will try to come up with something soon.

In the style guide, you also suggest that

If you find the abbreviations in a module non-obvious, feel free to add a list of them to the module’s markdown file in this docs folder (create one if needed)

However, as far as I can see, there are some common abbreviations that have a clear meaning in the ML context (like sz, c, n or bs), and some that need a specific context to be meaningful (like bptt). Should we put a table of these generic abbreviations into the style guide or a separate markdown file, and move module-dependent abbreviations into their module’s markdown?

radek · March 27, 2018, 11:35am

I fully understand your concerns and I have been there before! This is something though that technology can solve for us easily.

There are browser extensions you can use for navigating around github, though I have not had much luck with them.

For browsing code locally, @dang.hien has a good suggestion. Exuberant ctags is something that received a bit of discussion during part 1 and is the approach that I use with Vim (though you can use the tags with other editors that support them as well). I documented how they can be used with Vim here.

BTW this ‘automated approach’ is quite nice once you get used to it. It feels so convenient that the idea of mentally deciphering where stuff gets imported from, then navigating to that file and searching for the definition inside it, seems like a horrible hassle to me now. All I have to do is :tag ImageC and I am where I wanted to be.

yggg · March 27, 2018, 2:31pm

I use all kinds of plugins to navigate codebases as well, and I suppose everyone has their preferred set of plugins.

I just want to differentiate between the roles of user and developer/maintainer of a library. As a user, I usually use my tools in the fastest/dirtiest way, so I encourage the usage of from awesome_module import *. As a developer, on the other hand, I try to always consider the person who will read my code (possibly myself in a few months), especially those who weren’t there while the code evolved over several versions, and to be explicit and (somewhat) verbose.

radek · March 27, 2018, 3:03pm

Agree - I think there are more ways to approach thinking about the developer using the tools, and I have certainly benefited from being exposed to other ways of thinking about this. When starting with the fastai library I also was not super convinced by import *, but now this has changed.

Sorry if my words came across as criticism of your approach - just wanted to suggest a couple of tools you might be interested in checking out in case you were not familiar with them. As you are, that is cool

yggg · March 27, 2018, 3:54pm

Not at all. Just wanted to share my point of view and have a constructive discussion, cheers

wgpubs · March 27, 2018, 5:34pm

Two things:

1. I hate PEP8 (so thanks for not using it)
2. What about naming conventions for nested loops? For example:

for k, v in my_dict.items():
    d = v # v is another dictionary
    for k1, v2 in d.items():
        print(v2)

Thanks - wg

Moody · March 27, 2018, 5:36pm

I am not sure when to use capital letters for naming convention. My observations are:

use all capital letters for PATH and file column names (for auto-completion)
use capitalized words for fastai functions (e.g. RandomFlip)
otherwise, use small letters. But there is an exception (i.e. .COORD)

I like the way you set the culture of fastai.

jeremy · March 27, 2018, 6:46pm

I think a separate markdown is best, since these are helpful even for people that aren’t contributing to the library.

jeremy · March 27, 2018, 6:53pm

The CamelCase one is mentioned in the guide already. ALL_CAPS isn’t mentioned - thanks for the reminder. I don’t think we need to document how that’s used in notebooks, since it’s unrelated to contributing to the library. It’s also used in enums, as you mentioned, although I’m not sure why. I might switch those to lower_case.

blakewest · March 28, 2018, 1:01am

Regarding linters… While I totally get that there are no linters that will fit your exact style guide, there are plenty of things that linters can do which I think are both helpful and nearly impossible to disagree with. For instance, you specifically call out “no trailing whitespace” as something you’d like to see. That can be caught with a linter. There are others like “no unused variables in a function (with the exception of ‘_’)”. I’ve yet to come across a case where it’s truly required to have an unused variable. It also really helps while you’re developing, to make sure you catch every place when you change a variable name. Same for unused functions.
My point is only that one does not need to follow PEP8 to get a lot of value from linters, and it could be worth considering how they can help the project.
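
To make the trailing-whitespace case concrete, here is a minimal sketch of such a check in plain Python. It is not taken from any existing linter; find_trailing_whitespace is a hypothetical name, and real tools report far more than this.

```python
def find_trailing_whitespace(source):
    """Return the 1-based numbers of lines that end in spaces or tabs."""
    return [
        line_no
        for line_no, line in enumerate(source.splitlines(), start=1)
        if line != line.rstrip(" \t")
    ]


# Line 1 ends in two spaces, line 2 is clean, line 3 holds only a tab.
sample = "def f(x):  \n    return x\n\t\n"
print(find_trailing_whitespace(sample))  # → [1, 3]
```

A check this narrow says nothing about naming or layout, so it could run alongside the style guide without pulling in PEP8.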

jeremy · March 28, 2018, 3:39am

Fair point. I admit my vim is set up to highlight trailing whitespace in red.

Although I’d still avoid any auto-linters. E.g. I often have unused variables temporarily, since I may have commented out some code that is the only place it’s used, while I’m experimenting.

radek · March 29, 2018, 3:08pm

At some point I would like to delve into APL

@jeremy could I please ask what you would see as the best entry point into this family of languages? There seem to have been quite a few different versions of them created and I am not sure if each one is a strict successor of the ones that came before - they seem to vary in functionality and use cases as well.

And even if the newest one is the best to start with, could I please ask which one that would be?

Many thanks!

jeremy · March 29, 2018, 3:21pm

Definitely J is the one to look at: www.jsoftware.com. Even runs on an iPad!

suvash · March 29, 2018, 9:04pm

and this is a good intro to J lang, if you’re into watching videos. I remember seeing a version of this talk some time ago and being completely blown away. https://www.youtube.com/watch?v=gLULrFY2-fI

radek · April 13, 2018, 9:36pm

I have studied lecture 9 quite a bit and there are some patterns there I have not seen before. For instance, like we do for the ground truth boxes, where we perform calculations on tensors and use the outputs combined with nonzero to index into an array. Or where we calculate the jaccard index between arrays of boxes using broadcasting where the input arrays can be of arbitrary size.
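
To show the kind of pattern I mean, here is a rough NumPy sketch, not the lecture’s actual code. The function names, the [top, left, bottom, right] box layout, the sample boxes and the 0.5 threshold are all my own assumptions for illustration.

```python
import numpy as np


def box_area(boxes):
    # boxes: (N, 4) array, assumed layout [top, left, bottom, right]
    return (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])


def jaccard(a, b):
    """Pairwise intersection-over-union of a (N, 4) and b (M, 4), shape (N, M)."""
    # Inserting axes lets broadcasting compare every box in a with every box in b.
    top_left = np.maximum(a[:, None, :2], b[None, :, :2])      # (N, M, 2)
    bottom_right = np.minimum(a[:, None, 2:], b[None, :, 2:])  # (N, M, 2)
    sizes = np.clip(bottom_right - top_left, 0, None)          # 0 where boxes are disjoint
    inter = sizes[..., 0] * sizes[..., 1]                      # (N, M)
    union = box_area(a)[:, None] + box_area(b)[None, :] - inter
    return inter / union


# Made-up boxes: 2 ground truth boxes and 3 candidate boxes.
gt = np.array([[0., 0., 2., 2.], [1., 1., 3., 3.]])
anchors = np.array([[0., 0., 2., 2.], [0., 0., 1., 1.], [5., 5., 6., 6.]])

overlaps = jaccard(gt, anchors)              # shape (2, 3), no explicit loops
best_overlap = overlaps.max(axis=0)          # best ground truth overlap per candidate
matched = np.nonzero(best_overlap > 0.5)[0]  # indices of candidates that pass the threshold
matched_boxes = anchors[matched]             # use those indices to index into the array
print(matched)                               # → [0]
```

The same function works for any N and M, since the shapes are only fixed by the two inserted axes.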

What is the best way to start with J? Is there some canonical book about thinking in such a fashion that would be a good starting point? I think the answer is to study the Turing award paper and learn and read J code, but I wonder if there might be some resource that I am missing.

I watched the talk and it has been very inspiring, specifically the part where he mentions organizing code in a fashion where you can hold more of it in your mind. The observation about how speech is essentially linear was also very interesting. And the examples that he gives of what you can do in the language have been amazing.

Anyhow, it might be that I never get around to learning J, as it might get crowded out by other things, but I am not sure about that at this point. If anyone has any insights or materials to share on how to learn this, that would be greatly appreciated.

Seems this online book on the J language itself might be a great place to start but maybe there are also some other resources worth taking a look at.

jeremy · April 14, 2018, 4:20am

Rather than learning J, just try practicing using broadcasting as effectively as you can, and practice using advanced indexing in numpy. That’ll get you a long way there. See how much you can do without loops!
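
As a small practice exercise in that direction, here is a sketch with made-up data (the scores and labels arrays are purely illustrative): the same per-row lookup written first with a loop, then with advanced indexing, plus a broadcasted column centering.

```python
import numpy as np

# Made-up data: 3 rows of scores and one chosen column index per row.
scores = np.array([[0.1, 0.7, 0.2],
                   [0.8, 0.1, 0.1],
                   [0.3, 0.3, 0.4]])
labels = np.array([1, 0, 2])

# Loop version: pick scores[i, labels[i]] one row at a time.
picked_loop = np.array([scores[i, labels[i]] for i in range(len(labels))])

# Advanced indexing: a row index array paired with a column index array.
picked = scores[np.arange(len(labels)), labels]

# Broadcasting: the (3,) vector of column means is stretched across all rows.
centered = scores - scores.mean(axis=0)

print(picked)  # the same three values as picked_loop
```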