Documentation improvements

@ashaw, here is another small doc gen improvement request.

We get quite a few PRs from users modifying the autogenerated html, since they don’t realize it is autogenerated.

In the fastai_docs when we autogenerate .py code we inject this header at the top of the file:

#################################################
### THIS FILE WAS AUTOGENERATED! DO NOT EDIT! ###
#################################################
# file to edit: dev_nb/01_matmul.ipynb

So I was thinking perhaps it’d work to inject something similar in our html files? e.g. for docs/basic_train.html:

<!--


#################################################
### THIS FILE WAS AUTOGENERATED! DO NOT EDIT! ###
#################################################
# file to edit: docs_src/basic_train.ipynb
# instructions: https://docs.fast.ai/gen_doc_main.html


-->

I added ample vertical whitespace so that the comment will hopefully stand out from the dense HTML once the user opens the file in their editor. I’m not sure whether it can appear at the very top, or whether it has to come after the jekyll headers.
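To make the idea concrete, here is a minimal sketch of the injection step. It is not the actual doc gen code: inject_banner and its arguments are hypothetical names, and it assumes the safe option of placing the comment after the Jekyll front matter (the block delimited by --- lines at the top of the file) when there is one, and at the very top otherwise.

```python
# Hypothetical sketch, not the fastai doc generator.
BANNER_TEMPLATE = """<!--


#################################################
### THIS FILE WAS AUTOGENERATED! DO NOT EDIT! ###
#################################################
# file to edit: {source}
# instructions: https://docs.fast.ai/gen_doc_main.html


-->
"""

MARKER = "THIS FILE WAS AUTOGENERATED"


def inject_banner(html: str, source: str) -> str:
    """Return html with the banner placed after any Jekyll front matter."""
    if MARKER in html:
        return html  # banner already present, do nothing
    banner = BANNER_TEMPLATE.format(source=source)
    lines = html.splitlines(keepends=True)
    # Front matter starts with '---' on the first line and ends at the next '---'.
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                head = "".join(lines[: i + 1])
                if not head.endswith("\n"):
                    head += "\n"
                return head + banner + "".join(lines[i + 1:])
    # No (closed) front matter found: put the banner at the very top.
    return banner + html


page = "---\ntitle: basic_train\n---\n<html></html>\n"
print(inject_banner(page, "docs_src/basic_train.ipynb"))
```

The MARKER check keeps the function idempotent, so regenerating a page does not stack several banners.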

Thank you.

Is from fastai import * really necessary?

Hi everyone,

Regarding imports, docs.fast.ai says:

In order to do so, the module dependencies are carefully managed (see next section), with each exporting a carefully chosen set of symbols when using import *. In general, for interactive computing, you’ll want to import from both fastai, and from one of the applications, such as:

from fastai.vision import *

This seems to suggest that, for interactive computing, we should import in the following way:

from fastai import *
from fastai.vision import *

However, if you experiment as I did here on kaggle, you will notice that from fastai import * adds nothing to from fastai.vision import *.

Therefore, I would argue that from fastai import * is unnecessary.

Am I missing something here? If so, please correct me. Thanks
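For anyone who wants to repeat this kind of check, here is a minimal sketch. It relies only on the standard Python rule that a star import binds the names in the module's __all__ when that is defined, and every name without a leading underscore otherwise. star_names and extra_from are hypothetical helpers, not part of fastai, and the comparison is by name only (it does not detect a name being rebound to a different object).

```python
import importlib


def star_names(module_name: str) -> set:
    """Names that `from module_name import *` would bind."""
    module = importlib.import_module(module_name)
    exported = getattr(module, "__all__", None)
    if exported is None:
        # No __all__: star import takes every public (non-underscore) name.
        exported = [name for name in vars(module) if not name.startswith("_")]
    return set(exported)


def extra_from(first: str, second: str) -> set:
    """Names that star-importing `first` adds on top of star-importing `second`."""
    return star_names(first) - star_names(second)


# Demo with standard-library modules so the sketch runs anywhere.
print(sorted(star_names("json")))

# With fastai installed, the check from this post would be:
# print(extra_from("fastai", "fastai.vision"))
```

An empty set from the commented-out call would back up the observation above; it is left commented out because it needs fastai installed.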

Oh this is legacy behavior. It used to be necessary to do two imports, but nowadays it’s either

from fastai.basics import *

(just the core + training loop)
or

from fastai.{application} import *

If you want to adjust the docs, feel free to suggest a PR!

Thanks @sgugger

Here is my proposed change to the doc. Please have a look.


In order to do so, the module dependencies are carefully managed (see next section), with each exporting a carefully chosen set of symbols when using import *. In general, for interactive computing, to play around with the core module and training loop you can do

from fastai.basics import *

If you want to experiment with one of the applications such as vision, then you can do

from fastai.vision import *

index page: data link points to wrong page?

On this page https://docs.fast.ai/index.html#Dependencies, all the data links point to vision.data, but according to the context, they should point to basic_data. Do I understand the context correctly? Could anyone double-check them for me? Thanks!

Then, there are three modules directly on top of torch_core :

This takes care of the basics, then we regroup a model with some data in a Learner object to take care of training. More specifically:

  • callback (depends on data) defines the basis of callbacks and the CallbackHandler. Those are functions that will be called every step of the way of the training loop and can allow us to customize what is happening there;

From data we can split on one of the four main applications, which each has their own module: vision, text, collab, or tabular. Each of those submodules is built in the same way with:

@sgugger

Which modules does learn below refer to?

At https://docs.fast.ai/index.html#Dependencies, in the last two blocks of text, we can see

  • learn (depends on callbacks) defines helper functions to invoke the callbacks more easily.
  • optionally, a submodule named learn that will contain Learner specific to the application.

There are no modules named learn any more. My guess is the following. Could you verify them for me? @sgugger Thanks!

Yes this wasn’t properly updated when we changed data to basic_data.

The first learn is now train, and in the second, the submodule is {application}.learner, that’s correct.
Thanks so much for proofreading and making this consistent with the current stage of the library!!!

I am trying to work with the tabular module but found the documentation a bit incomplete (for beginners)… Now I am going to try and look for improvements! I think it is a great way to learn and help other beginners. (Although I am a bit afraid of making mistakes)

@Eva
My experience tells me the following

Try hard, ask for help, and keep at it, and you will see how friendly and supportive this place is, and your worry will be gone.

Thanks @sgugger

I will make a PR with those tiny changes.

Just feel proud to contribute to the best deep learning library and organization!

@sgugger Thank you for all the work done on the fastai library and documentation. However, I personally think that there is much more than just examples missing. I have been struggling for quite some time to understand the arguments of several basic functions, and I would gladly help to enrich the docs once I get a better understanding of them.

The most essential part that I find missing is a clear description of each parameter, not just its type and default value. Without this information I am, for example, left hanging when trying to specify a validation folder while working through the first lesson with another dataset that has definite train and test sets, separated into 2 folders and both labeled by their filenames.

The takeaway of my message is really: not only examples, but also parameter descriptions.

Thanks again for all the hard work ! :slight_smile:

I ran into the same issue! It is hard to get a clear description of the functions when you want to use your own dataset or work on kaggle competitions.

Let’s get to work! :slight_smile:

Maybe something like this could be of interest in the future: https://developers.google.com/season-of-docs/docs/

@sgugger

Having trouble verifying "items. create_func will default to open_image" in ImageList

I have tried reading the source code of ItemList, ImageList, and their from_folder methods to figure out how ImageList.from_folder works.

I can use pdb to walk through the flow of the code, but I can’t find the exact step that turns an image file path object into an Image object, see below for comparison

So I went to check the docs, and the second sentence seems to explain the missing piece of the puzzle I encountered above:

Create a ItemList in path from filenames in items. create_func will default to open_image.

However, I could not locate the place where items.create_func is set to open_image; in fact, items.create_func does not seem to exist

So, could you show me exactly where in the source code items.create_func is set to open_image?

Thanks!

First of all, I found the exact code for turning a Path object into an Image object below

Second, there is no such thing as items.create_func. So, I would like to rewrite the sentence as follows

It inherits from ItemList and overrides ItemList.get to call open_image, turning an image file given as a Path object into an Image object.
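To illustrate the pattern that sentence describes, here is a toy sketch. It is not the fastai source: ToyItemList, ToyImageList, ToyImage and toy_open_image are hypothetical stand-ins, and the sketch assumes that indexing the list goes through get.

```python
from pathlib import Path


class ToyImage:
    """Stand-in for an opened image; it only remembers where it came from."""

    def __init__(self, path: Path):
        self.path = path


def toy_open_image(path: Path) -> ToyImage:
    # A real opener would read and decode the file; this one does not touch disk.
    return ToyImage(path)


class ToyItemList:
    """Base list: get(i) returns the raw item, here a Path."""

    def __init__(self, items):
        self.items = list(items)

    def get(self, i: int):
        return self.items[i]

    def __getitem__(self, i: int):
        # Indexing is routed through get, so subclasses can change what comes out.
        return self.get(i)


class ToyImageList(ToyItemList):
    """Subclass: overrides get to turn the stored Path into an image object."""

    def get(self, i: int) -> ToyImage:
        path = super().get(i)
        return toy_open_image(path)


paths = [Path("a.png"), Path("b.png")]
print(type(ToyItemList(paths)[0]).__name__)
print(type(ToyImageList(paths)[0]).__name__)
```

The point is that no attribute like create_func needs to be set anywhere: the conversion lives in the overridden get.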

What do you think? Thanks
@stas @sgugger

Improving the docs of untar_data

untar_data [source][test]

untar_data(url: str, fname: PathOrStr = None, dest: PathOrStr = None, data=True, force_download=False) → Path

Download url to fname if it doesn’t exist, and un-tgz to folder dest.


Read in context, the "it" above refers to fname, but according to the source code it should refer to dest, because download_data is executed only when not dest.exists() returns True

I would like to provide the following docs for untar_data

In general, untar_data uses a url to download a tgz file to fname, and then un-tgzs fname into a folder under dest.

After the initial download, if untar_data is run again with force_download=True, or if the tgz file at fname is corrupted somehow, then the existing fname and dest are removed and the download starts again.

After the initial download, if dest does not exist (the folder could have been removed or renamed somehow), then running untar_data will execute download_data. If the tgz file at fname exists, nothing is actually downloaded and fname is simply un-tgzed into dest; if fname does not exist, the tgz file is actually downloaded first.

Note: the url you feed to untar_data must be one of URLs.something.
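The cases above can be summarised in a small decision function. This is only a sketch of the behaviour as described in this post, not the fastai implementation: plan_untar, its arguments and the action strings are hypothetical, and it touches neither the network nor the filesystem.

```python
from typing import List


def plan_untar(dest_exists: bool, archive_exists: bool,
               force_download: bool = False,
               archive_corrupt: bool = False) -> List[str]:
    """List the steps untar_data would take, per the description above."""
    actions = []
    # Forced download or a corrupted archive: wipe both and start over.
    if force_download or (archive_exists and archive_corrupt):
        if archive_exists:
            actions.append("remove archive")
        if dest_exists:
            actions.append("remove dest")
        archive_exists = dest_exists = False
    # Work is only needed when the extracted folder is missing.
    if not dest_exists:
        if not archive_exists:
            actions.append("download archive")
        actions.append("extract archive to dest")
    return actions


print(plan_untar(dest_exists=True, archive_exists=True))
print(plan_untar(dest_exists=False, archive_exists=True))
print(plan_untar(dest_exists=False, archive_exists=False))
print(plan_untar(dest_exists=True, archive_exists=True, force_download=True))
```

Here "archive" stands for the tgz file at fname and "dest" for the extracted folder.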

What do you think of this version of docs? Thanks
@stas @sgugger

Yes it seems nice. Please specify in a warning it’s only intended to be used with urls that come in URLs.something.

Thanks @sgugger , I have added the warning as the following

Hi @stas

could you help me check on this doc improvement when you have time?

see ImageList: the problem, ImageList: proposed improvement

Thanks!