Documentation improvements

sgugger · February 8, 2019, 2:45pm

It’s true that those methods are defined later, but they are monkey-patched to Learner, and we want the user to find all the methods that Learner has in the same doc page (it would be rather counter-intuitive otherwise). That’s why there is this discrepancy.
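
For readers unfamiliar with the term, here is a minimal sketch of what monkey-patching a method onto a class looks like in Python. The Learner below is a toy stand-in, and fit_one_cycle here is a hypothetical, simplified function, not fastai's actual implementation.

```python
class Learner:
    """Toy stand-in for a Learner class (hypothetical, heavily simplified)."""

    def __init__(self, lr: float = 1e-3):
        self.lr = lr

    def fit(self, epochs: int) -> str:
        return f"fit {epochs} epochs at lr={self.lr}"


# Later, possibly in a different module that imports Learner:
def fit_one_cycle(learn: Learner, epochs: int, max_lr: float = 1e-2) -> str:
    """Hypothetical training helper written as a plain function taking the learner first."""
    learn.lr = max_lr
    return learn.fit(epochs)


# Monkey-patch: attach the function to the class, so every Learner instance
# gets it as a method (the instance is passed as `learn`, just like `self`).
Learner.fit_one_cycle = fit_one_cycle

learn = Learner()
print(learn.fit_one_cycle(3))  # fit 3 epochs at lr=0.01
```

Since the patched method only appears on Learner once the defining module has been imported, a reader looking at the class definition alone would not see it, which is why listing it on the Learner doc page helps.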

stas · February 8, 2019, 3:22pm

That looks correct. I’m not sure why the currently committed HTML page has ‘**’ in it in the first place - those should be <b></b>.

PegasusWithoutWinds · February 8, 2019, 3:35pm

Sorry if this question has been answered before, but could you please explain a bit about why those methods are monkey-patched in another file, instead of defined in the same file?

stas · February 9, 2019, 6:02pm

@ashaw, I found a small issue with the doc generator: the TOC is always missing the first entry of the document, because the first header of the actual content is marked as h1 and not h2. Full details follow:

Let’s compare 2 files:
a. generated from .md: https://docs.fast.ai/gen_doc_main.html
b. generated from .ipynb: https://docs.fast.ai/gen_doc.gen_notebooks.html

a. The first entry is in the TOC, and the page has an h1 followed by an h2:

<h1 class="post-title-main">Doc Maintenance</h1>
…
<h2 id="process-for-contributing-to-the-docs">Process for contributing to the docs</h2>

b. The page has h1 twice:
<h1 class="post-title-main">gen_doc.gen_notebooks</h1>
<h1 id="Notebook-generation">Notebook generation<a class="anchor-link" href="#Notebook-generation">¶</a></h1>

That second h1 should be an h2; that’s why the first entry is always missing from the TOC.

This was just an example - it affects all .ipynb files.
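
To make the failure mode concrete, here is a minimal sketch of a TOC builder that, as the comparison above suggests, only collects h2 and deeper headers. It uses Python's standard html.parser; the class and function names are hypothetical, and this is not the actual script used on docs.fast.ai.

```python
from html.parser import HTMLParser


class TocCollector(HTMLParser):
    """Collect (level, text) pairs for headers whose level is in `levels`."""

    def __init__(self, levels=(2, 3, 4)):
        super().__init__()
        self.levels = levels
        self.entries = []
        self._level = None  # level of the header currently being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1].isdigit():
            level = int(tag[1])
            if level in self.levels:
                self._level = level
                self._text = []

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            # drop the pilcrow that the anchor link adds to notebook headers
            title = "".join(self._text).replace("\u00b6", "").strip()
            self.entries.append((self._level, title))
            self._level = None

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)


def toc(html: str, levels=(2, 3, 4)) -> list:
    parser = TocCollector(levels)
    parser.feed(html)
    parser.close()
    return parser.entries


md_page = ('<h1 class="post-title-main">Doc Maintenance</h1>'
           '<h2 id="process-for-contributing-to-the-docs">Process for contributing to the docs</h2>')
nb_page = ('<h1 class="post-title-main">gen_doc.gen_notebooks</h1>'
           '<h1 id="Notebook-generation">Notebook generation'
           '<a class="anchor-link" href="#Notebook-generation">\u00b6</a></h1>')

print(toc(md_page))  # [(2, 'Process for contributing to the docs')]
print(toc(nb_page))  # [] (the first section is an h1, so it never reaches the TOC)
fixed = nb_page.replace("<h1 id=", "<h2 id=").replace("</a></h1>", "</a></h2>")
print(toc(fixed))    # [(2, 'Notebook generation')]
```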

Thank you.

stas · February 10, 2019, 5:53am

OK, it’s rather simple: the notebooks all start with # foo, which produces a second <h1>, instead of ## foo.

So, I think the generator has nothing to do with it other than creating the initial notebook that way.

I changed one notebook to start with ## foo and it’s just fine. So perhaps the solution is to fix all notebooks?

And perhaps the generator should create the initial notebook (when a new one is created) that starts with ## foo?

So I did that for all .ipynb files except index.ipynb, which doesn’t have its own h1, and now the previously missing first TOC entry is back on all pages, and it’s linked too, e.g. https://docs.fast.ai/vision.data.html#Computer-vision-data

I did:

perl -0777 -pi -e 's|# |## |ms' docs_src/*.ipynb
tools/build-docs -l
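
With -0777 perl reads each notebook as a single record, and without /g the substitution replaces only the first "# " in the file, which could be a comment in a code cell if one comes before the first markdown cell. A more targeted alternative might look like the sketch below, which only touches the first markdown cell. The function names are hypothetical, and writing with json.dumps(indent=1) may not match Jupyter's own formatting byte for byte (nbformat would be the safer choice for that).

```python
import json
from pathlib import Path


def demote_first_heading(nb: dict) -> bool:
    """If the first markdown cell starts with '# ', turn it into '## '.

    Returns True when the notebook was modified. Cell sources may be stored
    either as one string or as a list of lines, so both forms are handled.
    """
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue  # skip code cells: a '# ' there is a Python comment
        src = cell.get("source", [])
        lines = src.splitlines(keepends=True) if isinstance(src, str) else list(src)
        if lines and lines[0].startswith("# "):
            lines[0] = "#" + lines[0]
            cell["source"] = "".join(lines) if isinstance(src, str) else lines
            return True
        return False  # only the first markdown cell is considered
    return False


def fix_notebooks(folder: str = "docs_src", skip=("index.ipynb",)) -> list:
    """Apply demote_first_heading to every notebook in `folder` (hypothetical helper)."""
    changed = []
    for path in sorted(Path(folder).glob("*.ipynb")):
        if path.name in skip:
            continue  # index.ipynb has no h1 of its own
        nb = json.loads(path.read_text(encoding="utf-8"))
        if demote_first_heading(nb):
            path.write_text(json.dumps(nb, indent=1, ensure_ascii=False) + "\n",
                            encoding="utf-8")
            changed.append(path.name)
    return changed
```

Because "## " does not start with "# ", running it twice leaves already-fixed notebooks unchanged.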

So the only remaining thing is to tweak the doc-generator so that when it creates the initial ipynb, it will start with ## for the first header instead of #.

ashaw · February 11, 2019, 1:06am

Ah, thanks for looking into this and fixing it! New docs should have the updated format now - https://github.com/fastai/fastai/pull/1612

stas · February 11, 2019, 7:39pm

Could someone please figure out how we could link to the groups in the menu of https://docs.fast.ai/ - e.g. how do I link to Tutorials (all of them)? Currently we only have clickable links to individual items.

stas · February 12, 2019, 1:13am

OK, got the answer: you need to create an overview post in the category and use that as the category link.

stas · February 12, 2019, 4:32am

@ashaw, it looks like in the transition to the new tools/build-docs we haven’t sorted out the doc generator for new modules.

I made it work manually for the new module by hardcoding:

from fastai.gen_doc.gen_notebooks import create_module_page
import fastai.utils.mem
create_module_page(fastai.utils.mem, 'docs_src')

into tools/build-docs, but couldn’t find a way to do it from the command line. Would you please have a look? We only have it documented for doing it from a notebook, not from the command line.

Thank you.

stas · February 12, 2019, 5:13am

@ashaw, one more question. I have just created a new module doc: https://docs.fast.ai/utils.mem.html
How can I demote those class entries (their headers in the TOC)? Unlike in most modules, these classes aren’t the central thing, e.g. one is a context manager and another is a namedtuple.
Thank you.

ashaw · February 12, 2019, 7:18pm

Ah, looks like I forgot to document that - you can also pass in module names to generate:
tools/build-docs fastai.utils.mem

I’ll update the docs to reflect that!
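
As a rough sketch of how a build script might turn module names given on the command line (as in tools/build-docs fastai.utils.mem) into create_module_page calls: the argument handling here is hypothetical and is not the actual tools/build-docs code, and create_module_page is passed in as a callable so the logic can be shown without fastai installed.

```python
import argparse
import importlib
from types import ModuleType
from typing import Callable, Iterable, List


def generate_module_pages(names: Iterable[str], dest: str,
                          create_page: Callable[[ModuleType, str], None]) -> List[str]:
    """Import each dotted module name and hand the module object to `create_page`."""
    done = []
    for name in names:
        mod = importlib.import_module(name)  # e.g. "fastai.utils.mem"
        create_page(mod, dest)
        done.append(mod.__name__)
    return done


def main(argv=None) -> List[str]:
    """Hypothetical CLI entry point, e.g. main(["fastai.utils.mem"])."""
    parser = argparse.ArgumentParser(description="Create doc notebooks for new modules (sketch).")
    parser.add_argument("modules", nargs="*", help="dotted module names, e.g. fastai.utils.mem")
    parser.add_argument("--dest", default="docs_src")
    args = parser.parse_args(argv)
    # Assumed location of create_module_page in fastai v1; only needed for a real run.
    from fastai.gen_doc.gen_notebooks import create_module_page
    return generate_module_pages(args.modules, args.dest, create_module_page)
```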

ashaw · February 12, 2019, 7:39pm

https://github.com/fastai/fastai/pull/1624 - so you can now explicitly set the title level in show_doc to fix the TOC.

However, named tuples show up as class. Do you think we should give those a different treatment?

stas · February 12, 2019, 7:41pm

Hmm, it looks like the TOC generator we use isn’t doing a good job of parsing header titles. Now we have:

CUDA Errors

The headers are:

cuda runtime error (59) : device-side assert triggered
cuda runtime error (11) : invalid argument

so it drops everything after the opening parenthesis.

ashaw · February 12, 2019, 7:47pm

Ah, good catch. Probably needs some escaping on conversion.
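
One hypothetical shape for that escaping, assuming the TOC script locates each header through a CSS selector built from its id (we have not confirmed that this is where the truncation happens): derive the id the way the generated ids above appear to be built, with spaces replaced by "-", then backslash-escape characters such as "(", ")" and ":" before using the id in a selector. Both helper names are made up for illustration.

```python
import re


def notebook_anchor(title: str) -> str:
    """Anchor id in the style seen in the generated pages, e.g. 'Notebook-generation'.

    Assumption: spaces become '-' and all other characters are kept as-is.
    """
    return title.strip().replace(" ", "-")


def css_escape_id(anchor: str) -> str:
    """Backslash-escape every character that is not [A-Za-z0-9_-].

    This makes ids such as 'cuda-runtime-error-(59)-...' safe to use in a
    '#' + id CSS selector. Simplified: ids starting with a digit would still
    need a hex escape, which this sketch does not handle.
    """
    return re.sub(r"([^A-Za-z0-9_-])", r"\\\1", anchor)


title = "cuda runtime error (59) : device-side assert triggered"
anchor = notebook_anchor(title)
print(anchor)                       # cuda-runtime-error-(59)-:-device-side-assert-triggered
print("#" + css_escape_id(anchor))  # #cuda-runtime-error-\(59\)-\:-device-side-assert-triggered
```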

stas · February 12, 2019, 7:49pm

Perfect. I didn’t know we had that. Where would be a good place to document that? I didn’t think that it was controlled by show_doc, so I didn’t try to look in its spec.

Perhaps we should have a little subheader in the main docgen instructions document:

To control the header levels

If the automatic header level isn’t right, you can adjust it with an explicit title_level argument in the corresponding show_doc entry. For example:

show_doc(..., title_level=4)

Update: I documented this.

However, named tuples show up as class. Do you think we should give those a different treatment?

I don’t think special treatment is needed. It’s really just a convenience class, not meant to be used anywhere directly: it makes it easy to extend the return values without breaking the API, and accessing the fields by name (val.free, val.used) is more intuitive than using index numbers.
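
A small illustration of the namedtuple point, using a hypothetical MemInfo type and made-up numbers rather than the actual fastai.utils.mem definitions: fields can be read by name instead of by index, and a field appended in a later version leaves name-based access working (code that unpacks the tuple positionally would, however, need updating).

```python
from collections import namedtuple

# Hypothetical return type in the spirit of the memory utils; field names assumed.
MemInfo = namedtuple("MemInfo", ["total", "free", "used"])


def get_mem_info() -> MemInfo:
    """Stand-in for a GPU memory query; returns fixed example values in MB."""
    total, used = 11000, 3000
    return MemInfo(total=total, free=total - used, used=used)


val = get_mem_info()
print(val.free, val.used)  # 8000 3000 (clearer than val[1], val[2])

# A later version could append a field without renaming the existing ones.
MemInfoV2 = namedtuple("MemInfoV2", ["total", "free", "used", "cached"])
val2 = MemInfoV2(total=11000, free=8000, used=3000, cached=500)
print(val2.free, val2.used)  # 8000 3000
```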

stas · February 12, 2019, 7:52pm

The TOC generator is a rather limping piece of software and needed a lot of massaging to make it work (we didn’t write it), so I wouldn’t be surprised if it just needs some more love.

ashaw · February 12, 2019, 7:53pm

Hahaha, very true. I’ll fix this parenthesis error and see if we can find a better one further down the line.

PegasusWithoutWinds · February 13, 2019, 11:47am

Guys, I found a potential solution for the current difficulties in collaborating on notebooks on GitHub.

Here is a tool called Notedown: it converts back and forth between Jupyter notebooks and Markdown, and it also lets you edit the Markdown version as a notebook in Jupyter Notebook. It puts code cells into Markdown code blocks and keeps everything else as plain Markdown. The only issue is that the Jupyter Notebook Hide Input extension stores the hide-input information in the notebook JSON, so it would be lost when converting to Markdown.
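
To illustrate the idea, here is a hypothetical minimal converter (not Notedown's actual code or output format): code cells become fenced blocks, everything else stays as Markdown, and per-cell metadata such as a hide_input flag has nowhere to go, so it is dropped.

```python
FENCE = "`" * 3  # a Markdown code fence


def cell_text(cell: dict) -> str:
    """Notebook cell sources may be a single string or a list of lines."""
    src = cell.get("source", "")
    return src if isinstance(src, str) else "".join(src)


def notebook_to_markdown(nb: dict, lang: str = "python") -> str:
    """Minimal ipynb -> Markdown conversion in the spirit of Notedown.

    Code cells are wrapped in fenced blocks; markdown cells are copied as-is.
    Outputs and cell metadata (e.g. a hide_input flag) are not preserved.
    """
    parts = []
    for cell in nb.get("cells", []):
        text = cell_text(cell).rstrip("\n")
        if cell.get("cell_type") == "code":
            parts.append(f"{FENCE}{lang}\n{text}\n{FENCE}")
        else:
            parts.append(text)
    return "\n\n".join(parts) + "\n"
```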

Below is what I have in mind as a possible workflow:

Put only the Markdown under source control. This allows very straightforward diffs and thus easy collaboration and code review on GitHub.
When updating a notebook, users make whatever changes they want, convert the notebook back to Markdown, and then commit the changes to the repository.
The server then re-runs the Markdown as a notebook using the Notedown utilities, adds the hide-input information to the JSON source for all the show_doc cells, and then builds the docs and the website.
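
The metadata part of the last step could be sketched roughly as follows, assuming the Hide Input extension reads a boolean hide_input key in each cell's metadata and that show_doc cells can be recognized by their source starting with show_doc(. Both are assumptions, and the function name is hypothetical.

```python
def mark_show_doc_cells(nb: dict) -> int:
    """Set metadata['hide_input'] = True on every code cell whose source starts with show_doc(.

    Returns the number of cells marked.
    """
    marked = 0
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # only code cells carry show_doc calls
        src = cell.get("source", "")
        text = src if isinstance(src, str) else "".join(src)
        if text.lstrip().startswith("show_doc("):
            cell.setdefault("metadata", {})["hide_input"] = True
            marked += 1
    return marked
```

Re-executing the notebooks to regenerate their outputs is a separate step that this sketch does not cover.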

Let me know what you guys think.

sgugger · February 13, 2019, 2:20pm

The problem with this (as with many other solutions) is that you need to execute the notebook to get back the source notebook. Executing all the docs sources takes almost half an hour on a standard GPU, so it’s not really feasible to automate it.

stas · February 13, 2019, 11:06pm

I was thinking that one more thing our API documentation lacks is when each function was added. Here are some situations where it matters:

The docs include just-added functions that aren’t in a release yet, so users have no way of telling whether they have access to them or have to wait until the next release (workaround: check the CHANGES file).
Some people might be using an older version for a specific reason, e.g. a regression in some part of fastai that matters for their use case, or they just don’t want to keep syncing their code with the ever-changing API. They can’t tell whether their version supports a function they are thinking of using (workaround: again, check the CHANGES file, but we might miss some entries there).
If we want a deprecation cycle, we also need to be clear about the version in which a function will be removed.

This is not a huge issue, but I thought I’d share. I think the docgen could easily automate this by adding [since 1.0.42] when it first generates the entry for a new function that hasn’t been documented yet. To catch up, it could simply mark all already-documented functions with 1.0.43. Just an idea.
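
A minimal sketch of that bookkeeping, assuming a hypothetical since mapping (function name to version) that the docgen persists between runs: names seen for the first time get the version passed in, while names already recorded keep their original tag, so a first run with "1.0.43" produces the catch-up baseline. The function names old_func, other_func and new_func are placeholders.

```python
def update_since(since: dict, public_names, version: str) -> dict:
    """Record `version` for names not seen before; keep existing entries unchanged."""
    updated = dict(since)
    for name in public_names:
        updated.setdefault(name, version)
    return updated


def since_tag(name: str, since: dict) -> str:
    """Text the docgen could append to a function's doc entry."""
    return f"[since {since[name]}]"


# First run: every already-documented function gets the baseline version.
since = update_since({}, ["old_func", "other_func"], "1.0.43")
# A later run: only the newly added function gets the newer version.
since = update_since(since, ["old_func", "other_func", "new_func"], "1.0.44")
print(since_tag("old_func", since), since_tag("new_func", since))  # [since 1.0.43] [since 1.0.44]
```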