Fastai style guide now available

I’ve jotted down a draft of a fastai style guide. I’ve endeavored to make it somewhat educational and interesting, not just a bland list of requirements, so hopefully you like it! :slight_smile:

I’d love to hear any feedback you have about the way the fastai style is described in this guide. I’m not really looking for feedback on the style choices I made (I’m not going to be changing them now, unless there’s some very compelling research that pops up that clearly shows I’ve made a significant error based on my project goals); I’m looking for feedback on how I’ve written the guide, eg:

  • Have I forgotten to document something?
  • Is any of the explanation unclear?
  • Are there places that don’t currently have a code example that need one?
  • Are there other resources I could link to that describe some of the concepts well?

Also, if you’re aware of any good research papers looking into the impacts of different coding styles, I’d love to hear about it.

Please don’t link to this on social media just yet - I’d like to iterate a bit based on your feedback first…

6 Likes

Hi Jeremy,

I read through the style guide and it all made sense to me and seemed detailed enough for someone to get started with contributing in a productive manner. The only thing that stands out is not having a recommended approach to documentation or at the very least a recommendation on how one should structure a 1-2 line descriptive docstrings. I’ve personally found the Numpy style (https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt) to work fairly well for me. It’s also nice that some IDEs (like Spyder) will auto-format documentation that follows along those guidelines. That being said, any guideline will do as long as it allows for consistency in documentation. From my own experience, the best time to scaffold out documentation is when first implementing something in code or you end up with lots of code with little documentation at which point there is even less motivation to revisit the code for the sole purpose of adding in documentation :slight_smile:. So it might be good to iron that out sooner rather than later or the FastAI lib might end up with a mix of different documentation styles.

And I know that you mentioned not wanting feedback on some of the choices made, but I was hoping that you might consider replacing import * from X statements with aliased imports like import * from X as Y. Imho, it makes it a lot easier to follow code and understand where different methods/classes/constants live in the library and makes it easier to form a mental map of the library while reading someone’s code or browsing through the library. I’ve struggled with this a bit while trying to read through some of the fastai code on Github.

This looks great though and I hope this results in a vibrant and active dev community around fastai!

PS - I enjoyed the intro with background, history and references. Except now I have a bunch more things added to my ever-growing reading list, making it all the more difficult to follow your advice from lesson 1 regarding spending more time doing/coding and less time worrying about reading :smile:

I would say that probably no more then 100-120 characters per line also could be suitable. It makes more simple to dig into code having 2-3 panes (depending on monitor resolution) opened side by side simultaneously without horizontal scrolling.

Also, from something import * makes a bit difficult to understand where classes and functions used in the lecture notebooks comes from. (Or when working with source files).

And, I would say that sometimes variable names are kind of too short. Like sz, bs, tmsf, etc. Not a problem for someone who is already involved (or used to deal with MATLAB naming conventions), but makes code more obscure for someone who sees it at first time.

@HamsterHuey Agree about NumPy/scikit-learn style, though using 3.6 type annotations could decrease number of type annotations in docstrings.

Another quick note regarding this excerpt from the style guide:

My ideal would be to have a decorator with a single line of documentation that links to a more detailed markdown doc.

I am curious about how you envision this working? I know D3.js does something like this: https://github.com/d3/d3/blob/master/API.md

The docs/API is entirely decoupled from the library and lives as a markdown file in the repo. However, I envision a couple of issues with decoupled documentation:

  1. There is more friction to maintaining documentation and making sure docs are updated after code changes when the docs are stored in a separate location. I’ve seen this happen with some projects, especially ones with a lot of user contributions and eventually the projects ended up migrating to maintaining docs within the code.

  2. Decoupling documentation from the code will make it more difficult for users of the FastAI library to have easy access to documentation in their favorite IDEs. Given the focus on Jupyter notebook for the courses, it would be a shame for none of the FastAI documentation to actually work in Jupyter if you went with the option of a separate set of documentation.

To address point 2 above, one could have a short docstring + input parameters and return values explained with type annotations and leave the details in the separate markdown file (or alternatively some other documentation system like Sphinx + Readthedocs, etc.). That still wouldn’t address the issue of problem 1, but I guess all PRs will need to ensure appropriate documentation in the required places before being considered for merging into the main branch.

Yeah I’m not quite sure, but roughly like the Python standard library is done. They have short docstrings, but descriptive complete web pages as well. The one missing piece in Python is a way to jump from the docstring to the web docs (AFAIK).

Have you tried using the Sourcegraph chrome extension, or (better still) jumping to references and viewing popup info using vscode (or similar)? That is a far faster a more flexible way to handle this, and doesn’t require any extra work for the library author.

Yup, Vscode would work well for that. My primary issue is in the following 2 scenarios:

  1. Working through an Jupyter notebook for a lecture where a lot of imports follow the same notation and it makes it hard to know easily where specific functions came from (though I may not be aware of some trick on Jupyter Lab that would help). This is not an issue with the library, but the imports in the notebooks which still result in the same issue.

  2. Often times, it’s a lot easier for me to just search the Github FastAI repo for a specific function on the web to look at it. When I’m there, I’m usually very lost since almost all the python files use import * from X notation with imports from several modules.

When developing code for FastAI, then having an IDE with the FastAI lib opened in it is very likely. In that situation, as you mentioned, there should be minimal issues. However, if you are just prototyping code in Jupyter Lab (as a lot of us are doing when working through lectures) and trying to look up FastAI specific code, you are most likely going to end up browsing the codebase on Github and then point 2 ends up being a bit of an issue.

Just type the symbol name in a cell and hit shift-enter to see where it’s from.

The sourcegraph chrome extension is exactly what you need :slight_smile: It makes all symbols clickable, showing you where they are defined, and letting you jump to their definition.

Awesome! :slight_smile: Thanks, I’ll be sure to check it out and use it. I’d love to get my hands dirty by digging through the FastAI code and familiarizing myself with PyTorch as I progress through the courses.

Yeah I’m not quite sure, but roughly like the Python standard library is done. They have short docstrings, but descriptive complete web pages as well. The one missing piece in Python is a way to jump from the docstring to the web docs (AFAIK).

I believe Sphinx lets you combine docstring docs with custom docs (http://www.sphinx-doc.org/en/master/ext/autodoc.html) for generating project documentation. Not entirely sure about a way to jump from the docstring to webdocs with a link but Sphinx’s cross-referencing syntax might very well work

I’m no expert on Sphinx, but given the power and flexibility it affords, I’d be surprised if someone more well-versed in it wouldn’t be able to sort out how exactly to structure documentation generation to be along the lines you had in mind. :slight_smile:

Yeah I know there’s a few students that have been playing with Sphinx. Nothing has come out of that which really looks great yet, but I’m sure we’ll get there…

The thing I’m happiest with on docs so far is the use of decorators to add docs to the pretrained models in fastai.torch_imports.

Also, a reference to bunch of automated documentation tools here.

I would say it is a good idea to have verbose docstrings, like scikit-learn or keras developers do. You can figure out what is going on when you’re calling method right from IDE or terminal. And, keep documentation close to the codebase making docs updates more simple and (probably) aligned with code updates.

@jeremy Are you still open to students experimenting with Sphinx for documentation improvement? There are quite a few people interested in the project, but since it would involve much docstring editing to the source code, we really need a node and more timely updates on the development status to move on to it, so that we don’t conflict with other contributors who are also maintaining the docs.