@sgugger Thanks a lot! It is nice to know the groups’ purposes.
@ashaw, here is another small doc gen improvement request.
We get quite a few PRs with users modifying the autogenerated html, since they don’t realize they are autogenerated.
In the fastai_docs when we autogenerate .py code we inject this header at the top of the file:
#################################################
### THIS FILE WAS AUTOGENERATED! DO NOT EDIT! ###
#################################################
# file to edit: dev_nb/01_matmul.ipynb
So I was thinking perhaps it’d work to inject something similar in our HTML files? E.g. for docs/basic_train.html:
<!--
#################################################
### THIS FILE WAS AUTOGENERATED! DO NOT EDIT! ###
#################################################
# file to edit: docs_src/basic_train.ipynb
# instructions: https://docs.fast.ai/gen_doc_main.html
-->
I added the ample vertical whitespace so that hopefully it’ll stand out from the dense HTML once the user opens it in their editor. I’m not sure whether it can appear at the very top, or after the jekyll headers.
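For what it's worth, here is a minimal sketch of how such a header could be injected. It is not the actual doc-gen code: the function name add_autogen_header and the template constant are made up for illustration, and it assumes Jekyll front matter, when present, is a block delimited by `---` lines at the very start of the file. Since Jekyll expects front matter to come first, the sketch puts the comment right after it, and at the very top otherwise.

```python
# Hypothetical helper, not part of fastai's doc generation code.
AUTOGEN_TEMPLATE = """<!--

#################################################
### THIS FILE WAS AUTOGENERATED! DO NOT EDIT! ###
#################################################
# file to edit: {source}
# instructions: https://docs.fast.ai/gen_doc_main.html

-->
"""


def add_autogen_header(html, source):
    """Return `html` with the warning comment inserted.

    The comment goes after the Jekyll front matter if there is one
    (assumed to be delimited by '---' lines), else at the very top.
    """
    header = AUTOGEN_TEMPLATE.format(source=source)
    if html.startswith("---\n"):
        # Look for the closing '---' line of the front matter block.
        end = html.find("\n---\n", 3)
        if end != -1:
            cut = end + len("\n---\n")
            return html[:cut] + header + html[cut:]
    return header + html


page = "---\ntitle: basic_train\n---\n<p>content</p>\n"
print(add_autogen_header(page, "docs_src/basic_train.ipynb"))
```

The blank lines inside the template are the extra vertical whitespace meant to make the warning stand out from the surrounding HTML.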
Thank you.
Is from fastai import * really necessary?
Hi everyone,
In the imports section of docs.fast.ai, it says:
In order to do so, the module dependencies are carefully managed (see next section), with each exporting a carefully chosen set of symbols when using import *. In general, for interactive computing, you’ll want to import from both fastai, and from one of the applications, such as:
from fastai.vision import *
This seems to suggest we should import for interactive computing in the following way:
from fastai import *
from fastai.vision import *
However, if you experiment as I did here on kaggle, you will notice that from fastai import * adds nothing to from fastai.vision import *.
Therefore, I would say that from fastai import * is unnecessary.
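One way to check this kind of claim without eyeballing the namespace is to compare the names each star import would bind. This is only a sketch: star_exports and added_by are helper names I made up, and the demo runs on a stdlib module so it works without fastai installed. With fastai installed, the call to try would be added_by("fastai", "fastai.vision").

```python
import importlib


def star_exports(module_name):
    """Return the set of names that `from module_name import *` would bind."""
    module = importlib.import_module(module_name)
    names = getattr(module, "__all__", None)
    if names is None:
        # Without __all__, a star import binds every public (non-underscore) name.
        names = [n for n in vars(module) if not n.startswith("_")]
    return set(names)


def added_by(first, second):
    """Names that a star import of `first` adds on top of a star import of `second`."""
    return star_exports(first) - star_exports(second)


# Demo on the standard library; an empty set means the first import adds nothing.
print(added_by("json", "json"))
```

If the result for the two fastai modules is an empty set, the first import is redundant for the installed version.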
Am I missing something here? If so, please correct me. Thanks
Oh this is legacy behavior. It used to be necessary to do two imports, but nowadays it’s either
from fastai.basics import *
(just the core + training loop)
or
from fastai.{application} import *
If you want to adjust the docs, feel free to suggest a PR!
Thanks @sgugger
Here is my proposed change to the doc. Please have a look.
In order to do so, the module dependencies are carefully managed (see next section), with each exporting a carefully chosen set of symbols when using import *. In general, for interactive computing, to play around with the core module and training loop you can do
from fastai.basics import *
If you want to experiment with one of the applications such as vision, then you can do
from fastai.vision import *
index page: data link points to wrong page?
On this page https://docs.fast.ai/index.html#Dependencies, all the data links point to vision.data, but according to the context, they should point to basic_data. Do I understand the context correctly? Could anyone double check them for me? Thanks!
Then, there are three modules directly on top of torch_core:
data, which contains the class that will take a Dataset or pytorch DataLoader to wrap it in a DeviceDataLoader (a class that sits on top of a DataLoader and is in charge of putting the data on the right device as well as applying transforms such as normalization) and regroup them in a DataBunch.
This takes care of the basics, then we regroup a model with some data in a Learner object to take care of training. More specifically:
callback (depends on data) defines the basis of callbacks and the CallbackHandler. Those are functions that will be called every step of the way of the training loop and can allow us to customize what is happening there;
From data we can split on one of the four main applications, which each has their own module: vision, text, collab, or tabular. Each of those submodules is built in the same way with:
Which modules does learn below refer to?
At https://docs.fast.ai/index.html#Dependencies, in the last two blocks of text, we can see:
learn (depends on callbacks) defines helper functions to invoke the callbacks more easily.
- optionally, a submodule named learn that will contain Learner specific to the application.
There are no modules named learn any more. My guess is the following. Could you verify them for me? @sgugger Thanks!
Yes this wasn’t properly updated when we changed data to basic_data.
The first learn is now train, and in the second, the submodule is {application}.learner, that’s correct.
Thanks so much for proofreading and making this consistent with the current stage of the library!!!
I am trying to work with the tabular module but found the documentation a bit incomplete (for beginners)… Now I am going to try and look for improvements! I think it is a great way to learn and help other beginners. (Although I am a bit afraid of making mistakes)
@Eva
My experience tells me the following
Try hard and ask for help and keep up, and you will see how friendly and supportive this place is, and your worry will be gone.
Thanks @sgugger
I will make a PR for those tiny changes.
Just feel proud to contribute to the best deep learning library and organization!
@sgugger Thank you for all the work done on the fastai library and documentation. However, I personally think that there is much more than just examples missing. I have been struggling for quite some time to understand the arguments of several basic functions, and I would gladly help to enrich the docs once I get a better understanding of them.
The most essential part that I find missing is a clear description of each parameter, not just its type and default value. Without this information I am, for example, left hanging when just trying to specify a validation folder while working on the first lesson with another dataset that has definite train and test sets separated into 2 folders, both labeled by their filenames.
The takeaway of my message is really: not only examples, but also parameter descriptions.
Thanks again for all the hard work!
I ran into the same issue! It is hard to get a clear description of the functions when you want to use your own dataset or work on kaggle competitions.
Let’s get to work!
Maybe something like this could be of interest in the future: https://developers.google.com/season-of-docs/docs/
Having trouble verifying "items. create_func will default to open_image" in ImageList
I have tried to read the source code of ItemList, ImageList, and their from_folder to figure out how ImageList.from_folder works.
I can use pdb to walk through the flow of the code, but I can’t find the exact step that turns an image file path object into an Image object, see below for comparison
So, I went to check the docs, and the second sentence makes perfect sense as the missing puzzle piece I encountered above:
Create a ItemList in path from filenames in items. create_func will default to open_image.
However, I could not locate the place where items.create_func is set to open_image; in fact, items.create_func does not seem to exist.
So, could you show me exactly where in the source code items.create_func is set to open_image?
Thanks!
First of all, I found the exact code for turning a Path object into an Image object below
Second, there is no such thing as items.create_func. So, I would like to rewrite the sentence as follows
It inherits from ItemList and overrides ItemList.get to call open_image in order to turn an image file in a Path object into an Image object.
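To make the proposed sentence concrete, here is a toy sketch of the pattern it describes: a subclass overriding get so that indexing returns an opened object instead of a path. The Toy* names and toy_open_image are stand-ins I invented for illustration; they are not the fastai classes, and the real ImageList.get may do more than this.

```python
from pathlib import Path


class ToyImage:
    """Stand-in for an image object; it only remembers where it came from."""

    def __init__(self, path):
        self.path = path


def toy_open_image(fn):
    """Stand-in for open_image: turn a file path into a ToyImage."""
    return ToyImage(Path(fn))


class ToyItemList:
    """Base list: get(i) returns the raw item, here a Path."""

    def __init__(self, items):
        self.items = list(items)

    def get(self, i):
        return self.items[i]

    def __getitem__(self, i):
        # Indexing always goes through get, so subclasses only override get.
        return self.get(i)


class ToyImageList(ToyItemList):
    """Subclass: get(i) opens the file path instead of returning it."""

    def get(self, i):
        fn = super().get(i)
        return toy_open_image(fn)


paths = [Path("cat.png"), Path("dog.png")]
print(type(ToyItemList(paths)[0]).__name__)
print(type(ToyImageList(paths)[0]).__name__)
```

The point is that no create_func attribute is needed: the conversion lives entirely in the overridden get.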
To improve the docs of untar_data
untar_data
untar_data(url:str, fname:PathOrStr=None, dest:PathOrStr=None, data=True, force_download=False) → Path
Download url to fname if it doesn’t exist, and un-tgz to folder dest.
The it above, in its semantic context, refers to fname, but according to the source code, it should refer to dest, because download_data will be executed only when not dest.exists() returns True.
I would like to provide the following docs for untar_data
In general, untar_data uses a url to download a tgz file under fname, and then un-tgz fname into a folder under dest.
After the initial download, if untar_data is run again with force_download=True, or the tgz file under fname is corrupted somehow, then the existing fname and dest will be removed and the download will start again.
After the initial download, if dest does not exist, meaning no folder under dest exists (the folder could have been removed or renamed somehow), then running untar_data will execute download_data; if the tgz file under fname exists, then nothing is actually downloaded and fname is simply un-tgz into dest; if fname does not exist, then the tgz file will actually be downloaded.
Note: the url you feed to untar_data must be one of URLs.something.
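The cases above can be summarised as a small decision function. This is only a sketch of the behaviour as described in this post, not the library source: plan_untar is a made-up name, it touches neither disk nor network, and the corrupted flag stands in for whatever check the library uses to decide that the tgz file is bad.

```python
def plan_untar(fname_exists, dest_exists, force_download=False, corrupted=False):
    """Return the list of steps untar_data would take, per the description above."""
    actions = []
    # A forced download or a corrupted archive wipes both the archive and the folder.
    if force_download or (fname_exists and corrupted):
        if fname_exists:
            actions.append("remove fname")
            fname_exists = False
        if dest_exists:
            actions.append("remove dest")
            dest_exists = False
    # Work only happens when the destination folder is missing.
    if not dest_exists:
        if not fname_exists:
            actions.append("download url to fname")
        actions.append("un-tgz fname into dest")
    return actions


print(plan_untar(fname_exists=True, dest_exists=True))
print(plan_untar(fname_exists=True, dest_exists=False))
print(plan_untar(fname_exists=False, dest_exists=False))
print(plan_untar(fname_exists=True, dest_exists=True, force_download=True))
```

An empty list for the first call reflects that nothing happens when dest is already there and no download is forced.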
What do you think of this version of docs? Thanks
@stas @sgugger
Yes it seems nice. Please specify in a warning it’s only intended to be used with urls that come in URLs.something.