Documentation improvements

@Eva
My experience tells me the following:

Try hard, ask for help, and keep at it, and you will see how friendly and supportive this place is; your worry will go away.

Thanks @sgugger

I will make a PR for those tiny changes.

Just feel proud to contribute to the best deep learning library and organization!

@sgugger Thank you for all the work done on the fastai library and documentation. However, I personally think that much more than just examples is missing. I have been struggling for quite some time to understand the arguments of several basic functions, and I would gladly help to enrich the docs once I understand them better.

The most essential part that I find missing is a clear description of each parameter, not just its type and default value. Without this information I am, for example, left hanging when just trying to specify a validation folder while working on the first lesson with another dataset, one that has a definite train and test set separated into 2 folders, both labeled by their filenames.

The takeaway of my message is really: not only examples, but also parameter descriptions.

Thanks again for all the hard work ! :slight_smile:

I ran into the same issue! It is hard to get a clear description of the functions when you want to use your own dataset or work on Kaggle competitions.

Let’s get to work! :slight_smile:

Maybe something like this could be of interest in the future: https://developers.google.com/season-of-docs/docs/

@sgugger

I am having trouble verifying "items. create_func will default to open_image" in ImageList.

I have tried reading the source code of ItemList, ImageList, and their from_folder methods to figure out how ImageList.from_folder works.

I can use pdb to step through the code, but I can’t find the exact step that turns an image file path object into an Image object; see below for comparison.

So I checked the docs, and the second sentence seems to supply the missing piece of the puzzle I encountered above:

Create a ItemList in path from filenames in items . create_func will default to open_image .

However, I could not locate the place where items.create_func is set to open_image; in fact, items.create_func does not seem to exist.

So, could you show me exactly where in the source code items.create_func is set to open_image?

Thanks!

First of all, I found the exact code that turns a Path object into an Image object; see below.

Second, there is no such thing as items.create_func. So I would like to rewrite the sentence as follows:

It inherits from ItemList and overrides ItemList.get to call open_image, in order to turn an image file path (a Path object) into an Image object.
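To make the proposed wording concrete, here is a minimal sketch of the override pattern it describes. BaseItemList, PathImageList and fake_open_image are hypothetical stand-ins invented for illustration, not fastai classes; the real ImageList calls open_image, which needs an actual image file on disk.

```python
from pathlib import Path

class BaseItemList:
    """Hypothetical stand-in for ItemList: stores items and returns them unchanged."""
    def __init__(self, items):
        self.items = list(items)

    def get(self, i):
        return self.items[i]

    def __getitem__(self, i):
        return self.get(i)

def fake_open_image(fn):
    """Hypothetical placeholder for open_image: tags the path instead of decoding a file."""
    return ("image", Path(fn).name)

class PathImageList(BaseItemList):
    """Hypothetical stand-in for ImageList: overrides get to open the stored path."""
    def get(self, i):
        fn = super().get(i)          # still a Path at this point
        return fake_open_image(fn)   # the conversion happens here

il = PathImageList([Path("train/3/7263.png")])
print(il.items[0])  # the raw Path stored in items
print(il[0])        # the "opened" item returned by the overridden get
```

The point of the sketch is only that items keeps holding Path objects, while indexing goes through the overridden get.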

What do you think? Thanks
@stas @sgugger

A proposal to improve the docs of untar_data:

untar_data [source][test]

untar_data ( url : str , fname : PathOrStr = None , dest : PathOrStr = None , data = True , force_download = False ) → Path

Download url to fname if it doesn’t exist, and un-tgz to folder dest .


In the sentence above, "it" reads as referring to fname, but according to the source code it should refer to dest, because download_data is executed only when dest does not exist (that is, when dest.exists() returns False).

I would like to provide the following docs for untar_data:

In general, untar_data uses a url to download a tgz file to fname, and then extracts (un-tgz) fname into a folder under dest.

After the initial download, if untar_data is run again with force_download=True, or if the tgz file under fname is corrupted somehow, the existing fname and dest are removed and the download starts again.

After the initial download, if dest does not exist (the folder could have been removed or renamed somehow), running untar_data will execute download_data. If the tgz file under fname exists, nothing is actually downloaded and fname is simply extracted into dest; if fname does not exist, the tgz file is actually downloaded.

Note: the url you feed to untar_data must be one of URLs.something.
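The cases above can be summarized as a small decision function. This is only a model of the behavior described in this post, not fastai's code: plan_untar and its boolean arguments are hypothetical names, and it returns the steps as strings instead of touching the network or the disk.

```python
def plan_untar(fname_exists, dest_exists, force_download=False, corrupted=False):
    """Return the ordered steps implied by the description above (hypothetical model)."""
    steps = []
    if force_download or corrupted:
        # Start from scratch: drop the archive and the extracted folder.
        if fname_exists:
            steps.append("remove fname")
        if dest_exists:
            steps.append("remove dest")
        fname_exists = dest_exists = False
    if not dest_exists:
        # Only a missing dest triggers any work at all.
        if not fname_exists:
            steps.append("download url to fname")
        steps.append("un-tgz fname into dest")
    return steps

first_run = plan_untar(fname_exists=False, dest_exists=False)
dest_removed = plan_untar(fname_exists=True, dest_exists=False)
nothing_to_do = plan_untar(fname_exists=True, dest_exists=True)
forced = plan_untar(fname_exists=True, dest_exists=True, force_download=True)
```

Under this model, a second call with both fname and dest present does nothing, which is the point the original one-line description obscures.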

What do you think of this version of docs? Thanks
@stas @sgugger

Yes it seems nice. Please specify in a warning it’s only intended to be used with urls that come in URLs.something.

Thanks @sgugger, I have added the warning as follows:

Hi @stas

Could you help me check this doc improvement when you have time?

See ImageList: the problem, ImageList: proposed improvement

Thanks!

My guess is that you’re after these instructions:
https://docs.fast.ai/gen_doc_main.html#updating-an-existing-functionclass

If not, please be more specific about what you need help with, as in: I’m trying to do foo, I did bar, and I can’t figure out how to get tar…

Thanks @stas , I will make sure to be more specific next time.

I have tried to improve the docs of get_files by adding the following two paragraphs:

To be more precise, this function returns a list of FilePath objects for the files in path whose suffix is in extensions; hidden folders and files are ignored. If recurse=True, files in all subfolders are included; include is used to select particular folders to search.

Inside get_files, there is _get_files, which turns all filenames in f from directory parent/p into a list of FilePath objects. All filenames must have a suffix in extensions. All hidden files are ignored.

Do they make sense? Do they make the use of get_files easier to understand?
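The behavior described in these two paragraphs can be sketched with the standard library alone. list_files below is a hypothetical re-implementation written for illustration, not the fastai function; in particular, treating include as a filter on first-level folders is an assumption based on the example output further down.

```python
import os
import tempfile
from pathlib import Path

def list_files(path, extensions=None, recurse=False, include=None):
    """Hypothetical sketch: list non-hidden files under path, filtered by suffix."""
    path = Path(path)
    exts = {e.lower() for e in extensions} if extensions else None
    found = []
    for depth, (parent, dirs, files) in enumerate(os.walk(path)):
        if depth == 0 and include is not None:
            dirs[:] = [d for d in dirs if d in include]  # keep chosen top-level folders
        dirs[:] = sorted(d for d in dirs if not d.startswith("."))  # skip hidden folders
        for name in sorted(files):
            if name.startswith("."):
                continue  # skip hidden files
            if exts is None or Path(name).suffix.lower() in exts:
                found.append(Path(parent) / name)
        if not recurse:
            break  # only the top-level folder was requested
    return found

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in ["labels.csv", ".hidden.csv", "train/3/a.png", "test/b.png", ".cache/c.png"]:
        f = root / rel
        f.parent.mkdir(parents=True, exist_ok=True)
        f.touch()
    rel_paths = lambda ps: [p.relative_to(root).as_posix() for p in ps]
    top = rel_paths(list_files(root))
    pngs = rel_paths(list_files(root, extensions=[".png"], recurse=True))
    only_test = rel_paths(list_files(root, recurse=True, include=["test"]))
```

Here top contains only the visible top-level file, pngs collects the images from the non-hidden subfolders, and only_test holds the top-level file plus the contents of test.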

Also I want to provide code examples to show how to use get_files. Can I add code examples in the following style? or what is the recommended style of providing small code examples?

import fastai.vision as fv
path_data = fv.untar_data(fv.URLs.MNIST_TINY)
# PosixPath('/Users/Natsume/.fastai/data/mnist_tiny')
path_data.ls()
# [PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/valid'),
#  PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/labels.csv'),
#  PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/test'),
#  PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/history.csv'),
#  PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/models'),
#  PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/train')]
list_FilePath_noRecurse = fv.get_files(path_data); list_FilePath_noRecurse
# [PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/labels.csv'),
#  PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/history.csv')]

list_FilePath_recurse = fv.get_files(path_data, recurse=True); list_FilePath_recurse[:3]
# [PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/labels.csv'),
#  PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/history.csv'),
#  PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/valid/7/9294.png')]
list_FilePath_recurse[-2:]
# [PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/train/3/7263.png'),
#  PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/train/3/7288.png')]
list_FilePath_include = fv.get_files(path_data, recurse=True, include=['test']);
list_FilePath_include[:3]
# [PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/labels.csv'),
#  PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/history.csv'),
#  PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/test/4605.png')]
list_FilePath_include[-3:]
# [PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/test/1605.png'),
#  PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/test/2642.png'),
#  PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/test/5071.png')]

@stas @sgugger

Thanks a lot!

  • I’d recommend going over it again, as it has quite a few language issues.
  • Para 2 is already covered in para 1, it seems that you’re repeating yourself.
  • I don’t think you need to talk about internal functions, that’s why they are internal. i.e. explain overall what the function does, but you don’t need to talk about specifics of implementation - people can look at the source code if they want the specifics :wink:

Also I want to provide code examples to show how to use get_files . Can I add code examples in the following style? or what is the recommended style of providing small code examples?

Most of the time, and certainly in your case, you just code it as you would in a normal jupyter nb, run it, and the results will be saved. E.g. look at the tutorials, such as https://docs.fast.ai/tutorial.data.html - but, of course, study the corresponding source tutorial.data.ipynb nb and you will see how it’s done.

When the setup for an example is complex, I just paste the code and the output as a code block, but you don’t need it here.

I was participating in a Kaggle competition in which the images were available as URLs with their labels. To download the images, I thought about using the download_images function in vision.data. But the urls also had to be mapped to their labels, since after downloading the images are named like 000001.jpg.

So I modified the function that download_images called. Should I add it as an example under download_images. This was the code I used. I put a $ sign at the beginning and added label before that.

# This function is the same as in the fastai source code, but with an added df argument.
# As I had stored my data as (url, label) in dataframe. In downlo
def _download_image_inner(df, dest, url, i, timeout=4):
    suffix = re.findall(r'\.\w+?(?=(?:\?|$))', url)
    suffix = suffix[0] if len(suffix)>0  else '.jpg'
    label = str(int(df.loc[df['img_url'] == url]['label']))
    download_image(url, dest/f"{label}_{i:08d}{suffix}", timeout=timeout)

# Code taken from download_images. But changed the first line, to take urls from df
urls = list(df['img_url'])
dest = Path(dest)
dest.mkdir(exist_ok=True)
parallel(partial(_download_image_inner, df, dest, timeout=timeout), urls, max_workers=max_workers)
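The naming part of that function can be tried on its own, without downloading anything. The sketch below reuses the suffix regex and the f-string from the code above; labeled_filename and the example.com URLs are hypothetical and exist only for illustration.

```python
import re

def labeled_filename(url, label, i, default_suffix=".jpg"):
    """Hypothetical helper: build '<label>_<index><suffix>' from a URL."""
    # Take the extension that sits right before a '?' or the end of the URL.
    suffix = re.findall(r'\.\w+?(?=(?:\?|$))', url)
    suffix = suffix[0] if len(suffix) > 0 else default_suffix
    return f"{label}_{i:08d}{suffix}"

name_a = labeled_filename("https://example.com/cat.png?w=100", 3, 7)
name_b = labeled_filename("https://example.com/images/12345", 0, 12)
print(name_a)  # 3_00000007.png
print(name_b)  # 0_00000012.jpg
```

Keeping the label as a filename prefix means it can later be recovered from the name alone, for example with a regex-based labelling step.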

I have been experimenting with the source code of ImageList.open, open_image, and Image to find out which line is responsible for displaying images (as in the output of lines 4 and 5 of the notebook below) when running the following code in a Jupyter notebook:

from fastai.vision import *
path_data = untar_data(URLs.PLANET_TINY); path_data.ls()
imagelistRGB = ImageList.from_folder(path_data/'train'); imagelistRGB
imagelistRGB.items[10] # output is to print out an image 
imagelistRGB.open(imagelistRGB.items[10]) # output is to print out an image 

However, I could find no source code in fastai responsible for this image-displaying behavior.

Then I found the following statement in the open_image docs:

As we saw, in a Jupyter Notebook, the representation of an Image is its underlying picture (shown to its full size).

Does this statement suggest that some built-in code in Jupyter creates this image-displaying behavior?

If not, then could you help me identify which line of source code in fastai is responsible for this behavior?

Thanks a lot!

@stas @sgugger

It’s the representation we coded for Image: see Image.__repr__.

I have located the source code at https://github.com/fastai/fastai/blob/master/fastai/vision/image.py#L87

def __repr__(self): return f'{self.__class__.__name__} {tuple(self.shape)}'

which only outputs things like Image (3, 28, 28) when I type imagelist[0].__repr__(), and does not display an image.

I have found no fastai code that is responsible for displaying an image when running imagelist[0] in a Jupyter notebook. This is why I suspect that the Jupyter notebook itself actually displays the image when I type imagelist[0], but I can’t prove it. (Please see the very short notebook for the experiment I tried.)

Thanks!

Hi @stas
When you have time, could you also have a look at the last three replies? I am really curious about the behavior of imagelistRGB.items[10] in a Jupyter notebook.
Thanks a lot!

It has to do with Rich display methods in ipython:
https://ipython.readthedocs.io/en/stable/config/integrating.html
and you need to be looking at: https://github.com/fastai/fastai/blob/master/fastai/vision/image.py#L88-L89

    def _repr_png_(self): return self._repr_image_format('png')
    def _repr_jpeg_(self): return self._repr_image_format('jpeg')

that’s where it happens.

Your notebook url returns a 404; you need to use https://mybinder.org/ if you want to show a persistent live notebook.
