Documentation improvements

Hi, I was examining the to_data function and, while doing so, I found some strange behaviour in a simple piece of code. Below is the code sample:

from fastai import *
from fastai.vision import *

path = untar_data(URLs.MNIST_SAMPLE)
data = ImageDataBunch.from_folder(path)

# Examining the labels
print(set(data.y)) 

The output of the above is given below:

{Category 3, Category 3, Category 3, ...<repeated many times>, Category 7, Category 7, Category 7}

Whereas the expected output IMO is:

{Category 3, Category 7}

Just wanted to know: is this expected behavior or a bug? I guess a new Category object is being instantiated for each image, which looks a bit inefficient from a very high-level understanding. Sorry if this is a silly question or if I missed something.
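One possible explanation, sketched here with hypothetical stand-in classes rather than fastai's actual Category: Python's set() deduplicates using __hash__ and __eq__, and a class that defines neither falls back to object identity, so every instance counts as distinct. The class names below are assumptions made up for illustration.

```python
class PlainCategory:
    """Hypothetical stand-in with no __eq__/__hash__: compared by identity."""
    def __init__(self, label):
        self.label = label

    def __repr__(self):
        return f"Category {self.label}"


class HashableCategory(PlainCategory):
    """Same stand-in, but equality and hashing are based on the label."""
    def __eq__(self, other):
        return isinstance(other, HashableCategory) and self.label == other.label

    def __hash__(self):
        return hash(self.label)


labels = [3, 3, 7, 7, 7]

# One object per item, as if one were created for each image.
plain = {PlainCategory(l) for l in labels}
hashable = {HashableCategory(l) for l in labels}

print(len(plain))     # 5: every instance is a distinct set member
print(len(hashable))  # 2: instances with the same label collapse
```

If this is indeed the cause, the repeated entries in the set would be a consequence of how the objects are compared, not of the data itself.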

Also, I have submitted a PR. I have completed most of the functions in a specific section of torch_core. Do send me feedback on what can be improved.

PR Link: https://github.com/fastai/fastai/pull/1485

Thanks
NVS Abhilash

At http://docs.fast.ai/, can those blocks be displayed with markdown formatting?
Like this:


(It was confusing to me, and I think it can be confusing for other beginners)

update: resolved.

Here are the things that need fixing in the doc generating code: fastai/gen_doc/nbdoc.py

  1. backticks around all arguments and defaults in show_doc should be removed? Each component is already wrapped in <code></code>

    PointsItemList(`args`, `convert_mode`=`'RGB'`, `kwargs`) ::
    

    So the backticks are just unnecessary noise.

  2. show_doc ignores *, ** in function definitions, e.g. showing kwargs instead of **kwargs.

    Inside format_param, p.name returns kwargs - not sure how to retrieve ** from p. It does know that the original arg was <Parameter "**kwargs">, but none of the attributes indicate that.
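For reference, the standard library does expose this information: inspect.Parameter has a kind attribute that distinguishes *args and **kwargs from ordinary parameters. The sketch below is not the fix that went into nbdoc.py; format_param_name and example are hypothetical names used only to illustrate the idea.

```python
import inspect


def format_param_name(p: inspect.Parameter) -> str:
    """Return the parameter name with its * or ** prefix restored."""
    if p.kind is inspect.Parameter.VAR_POSITIONAL:
        return "*" + p.name
    if p.kind is inspect.Parameter.VAR_KEYWORD:
        return "**" + p.name
    return p.name


# Hypothetical function whose signature mixes all the cases.
def example(a, *args, convert_mode="RGB", **kwargs):
    pass


params = inspect.signature(example).parameters.values()
names = [format_param_name(p) for p in params]
print(names)  # ['a', '*args', 'convert_mode', '**kwargs']
```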

update: both were resolved by Andrew! Thank you.

The docs css needs to be improved to have consistent font sizes:

  1. the show_doc signature uses larger fonts than the rest of the html, which makes it a bit painful to read. I guess we need to tweak the .css so that mono and non-mono fonts render at similar sizes.

  2. same for quoting: blockquote uses a much bigger font, e.g. at the top of https://docs.fast.ai/dev/abbr.html

I added a basic entry for the resize transform. It could use an expansion covering more efficient ways to do the resize once, rather than doing it on the go.

If you’d like to make a PR with various ways one could do the resize on the filesystem once, instead of re-doing it on every run, that would be great. I usually use imagemagick but I don’t know if it’s still the best method. There are a few threads about the subject matter here on the forums, so perhaps if you could make a summary of the best methods including the nuances of resizing mixed sized images, etc. that would be useful to have documented in one place.

Another essential help that’s needed is fixing broken links in docs:

Hi Stas, I can take care of the links. Can you please quickly point me to what I need to modify to fix them? Thanks!

Thank you, @PierreO!

  1. Edit the source *.ipynb notebooks under docs_src - don’t forget [Save]!!!
  2. Convert them to html https://docs.fast.ai/gen_doc_main.html#updating-html-only
  3. Install everything you need to get the docsite locally and see that you can start it https://docs.fast.ai/gen_doc_main.html#testing-site-locally but then shut it down
  4. Run the link-checker locally https://github.com/fastai/fastai/blob/master/tools/checklink/README.md#checking-the-site-locally to validate that the links/anchors have been fixed (it will start the local server that you enabled in step 3). This step will also require a one time link-checker setup stage, which you will find in the same document.

If you have any difficulties please let me know.

p.s. I know @sgugger is currently doing some major API updates (non-breaking: replacing kwargs with explicit args), which will require doc updates, so I recommend you ask him whether it’s a good time for this effort (because 2 people editing the same notebooks at the same time is difficult).

I just finished updating the docs accordingly, so you can go ahead!

I’m trying to fix the broken links to CMScores and RegMetrics in metrics. Those two classes aren’t shown in the metrics page (there’s no show_doc for them), but when I try to add it to the notebook I get the following errors:

NameError                                 Traceback (most recent call last)
<ipython-input-7-d36052e817e9> in <module>
----> 1 show_doc(CMScores, title_level=3)

NameError: name 'CMScores' is not defined

NameError                                 Traceback (most recent call last)
<ipython-input-26-3ff80097d200> in <module>
----> 1 show_doc(RegMetrics, title_level=3)

NameError: name 'RegMetrics' is not defined

Not really sure why... Any idea?

EDIT: same issue for CategoryListBase in data_block

That’s because they are not in the __all__ variable of that module, because they are internal classes. You can put them there if you want to document them, or manually import them (from bla import bli will still work).
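The effect of __all__ on star imports can be reproduced in isolation. This is a minimal sketch using a throwaway in-memory module; the module name bla echoes the placeholder above, and public_fn and InternalClass are hypothetical names, not fastai code.

```python
import sys
import types

# Build a throwaway module named "bla" that lists only one name in __all__.
bla = types.ModuleType("bla")
exec(
    "__all__ = ['public_fn']\n"
    "def public_fn(): return 'public'\n"
    "class InternalClass: pass\n",
    bla.__dict__,
)
sys.modules["bla"] = bla

# A star import only brings in the names listed in __all__.
star_ns = {}
exec("from bla import *", star_ns)
print("public_fn" in star_ns)      # True
print("InternalClass" in star_ns)  # False

# An explicit import still works for names left out of __all__.
explicit_ns = {}
exec("from bla import InternalClass", explicit_ns)
print("InternalClass" in explicit_ns)  # True
```

This matches the NameError above: a notebook that relies on a star import never sees the internal classes unless they are imported by name.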

Oh right, thanks.

One last question: I did a first PR 2 days ago (which I later closed because I saw some mistakes). In it, it seems some markdown got converted to html tags (** [...]** converted to <b> [...] </b>, for example). Could you point out how to fix this?

Hey dear maintainers:

In my current understanding, at least for the Training section, each documentation page corresponds to a Python module file. So the page of basic_train corresponds to basic_train.py.

However, in basic_train, it also shows the docs of fit_one_cycle and lr_find, both of which are actually defined in train.py and do have docs in train. I understand that these two methods are very commonly used and, in this sense, are very basic. But assuming that we still want to stick to our current organization of documentation, that is, a one-to-one correspondence between the docs page and the Python module file, should we then remove fit_one_cycle and lr_find from basic_train?

It’s true that those methods are defined later, but they are monkey-patched to Learner, and we want the user to find all the methods that Learner has in the same doc page (it would be rather counter-intuitive otherwise). That’s why there is this discrepancy.
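For readers unfamiliar with the pattern, here is a minimal sketch of monkey-patching with stand-in code. The Learner class and fit_one_cycle function below are simplified, hypothetical versions written for illustration, not the fastai implementations.

```python
# Stand-in for basic_train.py: the class is defined with its core methods only.
class Learner:
    def __init__(self, name):
        self.name = name

    def fit(self, epochs, lr):
        return f"{self.name}: fit for {epochs} epoch(s) at lr={lr}"


# Stand-in for train.py: a plain function that takes the learner as its
# first argument. A real version would set up a schedule; this one delegates.
def fit_one_cycle(learn, cyc_len, max_lr=3e-3):
    return learn.fit(cyc_len, max_lr)


# The function is attached to the class after the class has been defined.
Learner.fit_one_cycle = fit_one_cycle

learn = Learner("demo")
result = learn.fit_one_cycle(1)
print(result)  # demo: fit for 1 epoch(s) at lr=0.003
```

From the user's side the patched function is indistinguishable from a method defined in the class body, which is why documenting it on the Learner page is convenient.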

That looks correct. I’m not sure why the currently committed html page has ‘**’ in it in the first place; those should be <b></b>.

Sorry if this question has been answered before, but could you please explain a bit about why those methods are monkey-patched in another file, instead of defined in the same file?

@ashaw, I found a small issue with the doc generator: the TOC is always missing the first entry of the document, because the first header of an actual section is marked as h1 and not h2. Full details follow:

Let’s compare 2 files:
a. generated from .md: https://docs.fast.ai/gen_doc_main.html
b. generated from .ipynb: https://docs.fast.ai/gen_doc.gen_notebooks.html

a. the first entry is in the TOC and has h1 and h2:

<h1 class="post-title-main">Doc Maintenance</h1>

<h2 id="process-for-contributing-to-the-docs">Process for contributing to the docs</h2>

b. has h1 twice:
<h1 class="post-title-main">gen_doc.gen_notebooks</h1>
<h1 id="Notebook-generation">Notebook generation<a class="anchor-link" href="#Notebook-generation">&#182;</a></h1>

That second h1 should be h2, which is why the first entry is always missing from the TOC.

This was just an example - it affects all .ipynb files.

Thank you.

OK, it’s rather simple - it’s the notebooks that all start with # foo, which forces a 2nd <h1>, instead of ## foo.

So, I think the generator has nothing to do with it other than creating the initial notebook that way.

I changed one notebook to start with ## foo and it’s just fine. So perhaps the solution is to fix all notebooks?

And perhaps the generator should create the initial notebook (when a new one is created) that starts with ## foo?

So I did that for all .ipynb besides index.ipynb, which doesn’t have its own h1, and now we have the first missing TOC entry back on all pages and it’s linked too, e.g. https://docs.fast.ai/vision.data.html#Computer-vision-data

I did:

perl -0777 -pi -e 's|# |## |ms' docs_src/*.ipynb
tools/build-docs -l
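A rough Python variant of the one-liner above, for anyone who prefers to edit the notebook JSON directly. This is only a sketch: demote_first_heading is a hypothetical helper, and unlike the perl command it looks only at markdown cells, so a "# " comment in an earlier code cell is left alone.

```python
import json


def demote_first_heading(nb: dict) -> bool:
    """Turn the first '# ' heading found in a markdown cell into '## '."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue
        src = cell.get("source", [])
        # Notebook sources may be a single string or a list of lines.
        lines = src.splitlines(keepends=True) if isinstance(src, str) else list(src)
        for i, line in enumerate(lines):
            if line.startswith("# "):
                lines[i] = "#" + line
                cell["source"] = "".join(lines) if isinstance(src, str) else lines
                return True
    return False


# Tiny in-memory notebook standing in for a file under docs_src.
nb = {
    "cells": [
        {"cell_type": "code", "source": ["# a code comment, left alone\n"]},
        {"cell_type": "markdown", "source": ["# Notebook generation\n", "Some text\n"]},
    ]
}

changed = demote_first_heading(nb)
print(changed)                               # True
print(json.dumps(nb["cells"][1]["source"]))  # ["## Notebook generation\n", "Some text\n"]
```

To apply it to real files, one would load each notebook with json.load, call the helper, and write the result back with json.dump before rebuilding the docs.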

So the only remaining thing is to tweak the doc-generator so that when it creates the initial ipynb, it will start with ## for the first header instead of #.