Documentation improvements

First of all, I found the exact code that turns a Path object into an Image object (below).

Second, there is no such thing as items.create_func, so I would like to rewrite the sentence as follows:

It inherits from ItemList and overrides ItemList.get to call open_image, which turns an image file given as a Path object into an Image object.
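
To make the proposed sentence concrete, here is a toy sketch of the pattern it describes. It is not the fastai source: ToyItemList, ToyImageList, ToyImage and toy_open_image are hypothetical stand-ins for ItemList, ImageList, Image and open_image, and no file is actually read.

```python
from pathlib import Path

class ToyImage:
    """Hypothetical stand-in for fastai's Image; it only remembers its source path."""
    def __init__(self, source: Path):
        self.source = source

def toy_open_image(fn: Path) -> ToyImage:
    """Hypothetical stand-in for open_image; a real version would decode the file."""
    return ToyImage(fn)

class ToyItemList:
    """Stand-in for ItemList: get(i) returns the stored item unchanged."""
    def __init__(self, items):
        self.items = list(items)

    def get(self, i):
        return self.items[i]

    def __getitem__(self, i):
        return self.get(i)

class ToyImageList(ToyItemList):
    """Stand-in for ImageList: get(i) is overridden to open the file at items[i]."""
    def open(self, fn: Path) -> ToyImage:
        return toy_open_image(fn)

    def get(self, i):
        fn = super().get(i)   # the Path stored in items
        return self.open(fn)  # converted to an image object

il = ToyImageList([Path("train/3/7263.png"), Path("train/3/7288.png")])
```

Here il.items[0] is still a Path, while il[0] is a ToyImage, which is the distinction the sentence is trying to express.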

What do you think? Thanks
@stas @sgugger

to improve the docs of untar_data

untar_data(url:str, fname:PathOrStr=None, dest:PathOrStr=None, data=True, force_download=False) → Path

Download url to fname if it doesn’t exist, and un-tgz to folder dest.


The "it" above reads as referring to fname, but according to the source code it should refer to dest, because download_data is executed only when not dest.exists() is True.

I would like to provide the following docs for untar_data:

In general, untar_data uses a url to download a tgz file to fname, and then un-tgzs fname into a folder under dest.

After the initial download, if untar_data is run again with force_download=True, or if the tgz file at fname is corrupted somehow, the existing fname and dest are removed and the download starts again.

After the initial download, if dest does not exist (the folder could have been removed or renamed somehow), running untar_data will call download_data. If the tgz file at fname still exists, nothing is actually downloaded and fname is simply un-tgzed into dest; if fname does not exist, the tgz file is downloaded first.

Note: the url you feed to untar_data must be one of URLs.something.
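
The cases above can be summarised as a small decision function. This is only a sketch of the behavior as I describe it here, not the fastai source: plan_untar and its boolean arguments are hypothetical names, and it returns the list of steps instead of touching any files.

```python
from typing import List

def plan_untar(fname_exists: bool, dest_exists: bool,
               force_download: bool = False, corrupted: bool = False) -> List[str]:
    """Return the steps untar_data would take, per the description above (hypothetical helper)."""
    steps = []
    # A forced download or a corrupted archive wipes both the archive and the folder.
    if force_download or (fname_exists and corrupted):
        if fname_exists:
            steps.append("remove fname")
            fname_exists = False
        if dest_exists:
            steps.append("remove dest")
            dest_exists = False
    # Nothing else happens unless the destination folder is missing.
    if not dest_exists:
        if not fname_exists:
            steps.append("download url to fname")
        steps.append("un-tgz fname into dest")
    return steps
```

For example, plan_untar(True, False) only un-tgzs the existing archive, while plan_untar(True, True) returns an empty list because dest is already there.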

What do you think of this version of docs? Thanks
@stas @sgugger

Yes, it seems nice. Please specify in a warning that it’s only intended to be used with urls that come from URLs.something.

Thanks @sgugger, I have added the warning as follows

Hi @stas

Could you help me check this doc improvement when you have time?

See ImageList: the problem, ImageList: proposed improvement

Thanks!

My guess is that you’re after these instructions:
https://docs.fast.ai/gen_doc_main.html#updating-an-existing-functionclass

If not, please be more specific about what you need help with, as in: I’m trying to do foo, I did bar, and I can’t figure out how to get tar…

Thanks @stas , I will make sure to be more specific next time.

I have tried to improve the docs of get_files by adding the following two paragraphs:

To be more precise, this function returns a list of FilePath objects using files in path that must have a suffix in extensions, and hidden folders and files are ignored. If recurse=True, all files in subfolders will be applied; include is used to select particular folders to apply.

Inside get_files, there is _get_files which turns all filenames inside f from directory parent/p into a list of FilePath objects. All filenames must have a suffix in extensions. All hidden files are ignored.
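
To check my own understanding of these two paragraphs, here is a rough standard-library approximation of the behavior they describe. list_files is a hypothetical name, the folder tree is made up, and details such as include applying only to the top-level folders are my assumption, not a statement about the fastai source.

```python
import os
import tempfile
from pathlib import Path

def list_files(path, extensions=None, recurse=False, include=None):
    """Approximate the described behavior with the standard library only (hypothetical helper)."""
    path = Path(path)
    exts = {e.lower() for e in extensions} if extensions else None

    def keep(name):
        # Skip hidden files; if extensions were given, the suffix must be one of them.
        return not name.startswith(".") and (exts is None or Path(name).suffix.lower() in exts)

    if not recurse:
        return sorted(p for p in path.iterdir() if p.is_file() and keep(p.name))

    found = []
    for depth, (root, dirs, files) in enumerate(os.walk(path)):
        # Assumption: include only restricts which top-level subfolders are entered.
        if depth == 0 and include is not None:
            dirs[:] = [d for d in dirs if d in include]
        # Hidden folders are never entered.
        dirs[:] = [d for d in dirs if not d.startswith(".")]
        found += [Path(root) / f for f in files if keep(f)]
    return sorted(found)

# Build a tiny throwaway tree and list it in three different ways.
with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    for rel in ["labels.csv", ".hidden.csv", "train/1.png", "test/2.png", ".git/x.png"]:
        p = root / rel
        p.parent.mkdir(parents=True, exist_ok=True)
        p.write_bytes(b"")
    flat = [p.name for p in list_files(root)]
    deep = sorted(p.name for p in list_files(root, recurse=True))
    only_test = sorted(p.name for p in list_files(root, recurse=True, include=["test"]))
```

In this sketch, flat contains only the top-level visible file, deep adds the files from train and test, and only_test drops train again.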

Do they make sense? Do they make get_files easier to understand?

Also, I want to provide code examples to show how to use get_files. Can I add code examples in the following style, or is there a recommended style for small code examples?

import fastai.vision as fv
path_data = fv.untar_data(fv.URLs.MNIST_TINY)
# PosixPath('/Users/Natsume/.fastai/data/mnist_tiny')
path_data.ls()
# [PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/valid'),
#  PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/labels.csv'),
#  PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/test'),
#  PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/history.csv'),
#  PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/models'),
#  PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/train')]
list_FilePath_noRecurse = fv.get_files(path_data); list_FilePath_noRecurse
# [PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/labels.csv'),
#  PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/history.csv')]

list_FilePath_recurse = fv.get_files(path_data, recurse=True); list_FilePath_recurse[:3]
# [PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/labels.csv'),
#  PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/history.csv'),
#  PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/valid/7/9294.png')]
list_FilePath_recurse[-2:]
# [PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/train/3/7263.png'),
#  PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/train/3/7288.png')]
list_FilePath_include = fv.get_files(path_data, recurse=True, include=['test']);
list_FilePath_include[:3]
# [PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/labels.csv'),
#  PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/history.csv'),
#  PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/test/4605.png')]
list_FilePath_include[-3:]
# [PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/test/1605.png'),
#  PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/test/2642.png'),
#  PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/test/5071.png')]

@stas @sgugger

Thanks a lot!

  • I’d recommend going over it again, as it includes quite a few language issues.
  • Para 2 is already covered in para 1; it seems that you’re repeating yourself.
  • I don’t think you need to talk about internal functions; that’s why they are internal. I.e. explain overall what the function does, but you don’t need to talk about the specifics of the implementation. People can look at the source code if they want the specifics :wink:

Also, I want to provide code examples to show how to use get_files. Can I add code examples in the following style, or is there a recommended style for small code examples?

Most of the time, and certainly in your case, you just code it as you would in a normal jupyter nb and run it, and the results will be saved. E.g. look at the tutorials, such as https://docs.fast.ai/tutorial.data.html, but of course study the corresponding source tutorial.data.ipynb nb and you will see how it’s done.

When the setup for an example is complex, I just paste the code and the output as a code block, but you don’t need it here.

I was participating in a Kaggle competition in which the images were available as URLs with their labels. So to download the images, I thought about using the download_images function in vision.data. But the urls also had to be mapped to their labels, because after downloading the images are named like 000001.jpg.

So I modified the function that download_images calls. Should I add it as an example under download_images? This is the code I used. I put a $ sign at the beginning and added the label before that.

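# Imports needed when running outside fastai's own module (fastai v1 layout).
import re
from functools import partial
from pathlib import Path
from fastai.core import parallel
from fastai.vision.data import download_image

# This function is same as fastai source code, but added df argument.
# As I had stored my data as (url, label) in dataframe. In downlo
def _download_image_inner(df, dest, url, i, timeout=4):
    # Take the file extension from the url, defaulting to .jpg.
    suffix = re.findall(r'\.\w+?(?=(?:\?|$))', url)
    suffix = suffix[0] if len(suffix)>0  else '.jpg'
    # Look up the label stored for this url (uses the first matching row).
    label = str(int(df.loc[df['img_url'] == url, 'label'].iloc[0]))
    download_image(url, dest/f"{label}_{i:08d}{suffix}", timeout=timeout)

# Code taken from download_images. But changed the first line, to take urls from df
# df, dest, timeout and max_workers are assumed to be defined already.
urls = list(df['img_url'])
dest = Path(dest)
dest.mkdir(exist_ok=True)
parallel(partial(_download_image_inner, df, dest, timeout=timeout), urls, max_workers=max_workers)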
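
To show what the renaming does without downloading anything, the filename logic can be pulled out into a small helper. labeled_name is a hypothetical name used only for this illustration, and the example urls are made up.

```python
import re

def labeled_name(url: str, label: int, i: int) -> str:
    """Build '<label>_<8-digit index><suffix>' the same way the function above does."""
    # The suffix is the last '.ext' that is followed by '?' or the end of the url.
    suffix = re.findall(r'\.\w+?(?=(?:\?|$))', url)
    suffix = suffix[0] if len(suffix) > 0 else '.jpg'
    return f"{label}_{i:08d}{suffix}"

with_query = labeled_name("http://example.com/a/cat.png?size=2", 3, 12)
no_suffix = labeled_name("http://example.com/a/cat", 7, 0)
```

With the label in front of the index, the label can later be read back from the filename itself.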

I have been experimenting with the source code of ImageList.open, open_image and Image to find out which line is responsible for printing out images (as in the output of line 4 and line 5 of the notebook below) when the following code is run in a Jupyter notebook:

from fastai.vision import *
path_data = untar_data(URLs.PLANET_TINY); path_data.ls()
imagelistRGB = ImageList.from_folder(path_data/'train'); imagelistRGB
imagelistRGB.items[10] # output is to print out an image 
imagelistRGB.open(imagelistRGB.items[10]) # output is to print out an image 

However, I could find no source code in fastai responsible for this image-printing behavior.

And then I found the following statement in the open_image docs:

As we saw, in a Jupyter Notebook, the representation of an Image is its underlying picture (shown to its full size).

Does this statement suggest that some built-in code in jupyter creates this image-printing behavior?

If not, then could you help me identify which line of source code in fastai is responsible for this behavior?

Thanks a lot!

@stas @sgugger

It’s the representation we coded for Image: see Image.__repr__.

I have located the source code at https://github.com/fastai/fastai/blob/master/fastai/vision/image.py#L87

def __repr__(self): return f'{self.__class__.__name__} {tuple(self.shape)}'

which only outputs things like Image (3, 28, 28) when I type imagelist[0].__repr__(); it does not print out an image.

I have found no fastai code that is responsible for printing out an image when running imagelist[0] in a Jupyter notebook. This is why I suspect that it is the jupyter notebook itself that prints out the image when I type imagelist[0], but I can’t prove it. (Please see the very short notebook for the experiment I tried.)

Thanks!

Hi @stas
when you have time, could you also have a look at the last three replies? I am really curious about the behavior of imagelistRGB.items[10] in jupyter notebook.
Thanks a lot!

It has to do with Rich display methods in ipython:
https://ipython.readthedocs.io/en/stable/config/integrating.html
and you need to be looking at: https://github.com/fastai/fastai/blob/master/fastai/vision/image.py#L88-L89

    def _repr_png_(self): return self._repr_image_format('png')
    def _repr_jpeg_(self): return self._repr_image_format('jpeg')

That’s where it happens.

Your notebook url is 404; you need to use https://mybinder.org/ if you want to show a persistent live notebook.

Thanks @stas , this is very helpful!
The notebook is updated here. mybinder sounds like fun, but I have not gotten it to work yet, so I used a kaggle notebook instead.

_repr_png_ only outputs a very long byte string (I think it is very close to an image), but I have not managed to reproduce the image yet. Is the rest of the image printing job done by IPython?

Thanks!

imagelist[0]

[image output: the picture is displayed]

print(imagelist[0])
Image (3, 28, 28)
imagelist[0].__repr__()
'Image (3, 28, 28)'
imagelist[0]._repr_png_()
b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x00\x1c\x00\x00\x00\x1c\x08\x06\x00\x00\x00r\r\xdf\x94\x00\x00\x00\x04sBIT\x08\x08\x08\x08|\x08d\x88\x00\x00\x01\xf0IDATH\x89\xed\xd6\xbf\xcbiq\x1c\x07\xf0\xf7s{\x06\xca\x8f\xe5(%%\xacR\x06\x93Ef\x83\x85\xa4,\xfe\x06\x93M\x94\xcd@\xf1\x1f\x98<\x98\x94\xb2))\xb1J\x18X\x94\x810 \x9f\xdew\xba\xa7\xeeu\x7f\x9css\xd5S\xf7S\xdf\xe1t\xfa\x9cW\xdf\x1f\xefo\xe7\r\x00\xf1\xc2\xfa\xf2J\xec?\xf8\xdbr:\x9d\x98\xcf\xe7 \x89F\xa3\x01\x9f\xcf\xa7\xb9\x97z\x86\xdb\xedf\xadV\xe3\xf5z\xa5\x88\xa8#\x95Ji\xea\x7f\xd73+\x8f\xc7\x83n\xb7\x0b\x8f\xc7\xa3\xa7\xed\xbb\xd2\xbc\xa4\x85B\x01\x93\xc9D\xc5\xb6\xdb-\x9a\xcd\xe6\xbf\x01\x8b\xc5"\xb2\xd9,L&\x13\x00\xa0R\xa9\xc0\xef\xf7\xa3\xdf\xef\xeb\x06\x01\r\xeb\xbeZ\xad\xd4\xbd\x8aF\xa34\x18\x0c\x04\xc0n\xb7\xab{\x0f5\x81n\xb7\x9b\x9dN\x87\xe5rY\xc5\xd2\xe94O\xa7\x13E\x84\x8b\xc5\x82\x16\x8b\xe5y \x00Z\xadV\x15\x0b\x06\x83<\x1e\x8f\x14\x11\xee\xf7{\xc6b1=\']_,l6\x1b{\xbd\x9e\xba\x94\xd3\xe9TW\xbf.\xf0GLDx:\x9d\xd8j\xb5h\xb7\xdb\x9f\x0b*\x8a\xf2\x80\xfdM\xf05\xe70\x1c\x0e#\x12\x89\xa8\xcf\xe7\xf3\x19\xe3\xf1\x18\x87\xc3\x01\x00\xe0p8\xb4~J\xdb\x0c\xe3\xf18E\x84\x87\xc3\x81\x1f\x1f\x1f\x0c\x85B\x04\xc0v\xbbM\x11\xe1r\xb9\xd4\xf4\x1d\xcdW\xdbp8D"\x91\xc0v\xbb\xc5`0\x00\x00\x18\x8dF\xf52\xd8\xedv\xcf\x99\xa1\xd9lf2\x99\xfc\xe9\xbbR\xa9\xf4\xfc\xe0g2\x19\xce\xe7\xf3\x87S\x18\x08\x04\xb8^\xaf)"\xbc^\xafz\xb2\xf8gPD\x98\xcb\xe5\xa8(\n\x15Ea0\x18\xe4f\xb3\xa1\x88\xf0v\xbb1\x9f\xcf?/\x87\xe9t\x9a\xf7\xfb\xfd\x97q\x18\x8dFz0m9\x9c\xcdf\x0f\xe8\xe5ra\xbd^\xa7\xd7\xeb}>\x08\x80\xd5jU\xc5\xea\xf5:].\x97^\x88\x00\xf8\xf6M}U}\x9e\xbf\xb6O\x03~\x05\xc2\x8d\x93\xdd\xaf[*\xb8\x00\x00\x00\x00IEND\xaeB`\x82'

You don’t call the method; ipython calls it automatically for you (https://ipython.readthedocs.io/en/stable/config/integrating.html), as you can see in cell #9 in your nb.
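
A minimal sketch of the mechanism, using only the standard library: any object that defines _repr_png_ and is the last expression of a notebook cell gets rendered as a picture by ipython. RedSquare and _chunk are hypothetical names for this illustration, not fastai code.

```python
import struct
import zlib

def _chunk(kind: bytes, data: bytes) -> bytes:
    """Encode one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)

class RedSquare:
    """Toy object with both a text repr and a PNG repr."""
    def __init__(self, size: int = 28):
        self.size = size

    def __repr__(self):
        # Used by print() and by plain terminals.
        return f"RedSquare ({self.size}, {self.size})"

    def _repr_png_(self):
        # Called by ipython's rich display machinery, never by user code.
        ihdr = struct.pack(">IIBBBBB", self.size, self.size, 8, 2, 0, 0, 0)  # 8-bit RGB
        row = b"\x00" + b"\xff\x00\x00" * self.size  # filter byte + red pixels
        idat = zlib.compress(row * self.size)
        return (b"\x89PNG\r\n\x1a\n" + _chunk(b"IHDR", ihdr)
                + _chunk(b"IDAT", idat) + _chunk(b"IEND", b""))

sq = RedSquare()
png = sq._repr_png_()
```

In a notebook, a cell containing just sq shows a red square, while print(sq) shows the text from __repr__. If you want to turn raw PNG bytes back into a picture yourself, IPython.display.Image(data=png) should do it.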

got it, thanks a lot!

Since yesterday I have kept running into the same problem. I removed all branches, updated master from origin, and then created an entirely new branch and PR again, a few times, but I still have the same problem.

I have tried deleting my local repo, re-downloading my fork, and creating a new branch and PR again, but I still get the same error.

I changed a punctuation mark in basic_data.ipynb and made a PR, and still got the same error.

I ran out of ways to resolve it, so I am following the suggested direction for resolving it.

According to the info above, should I register a Google Cloud account and adjust some of its settings to solve this problem?

I have registered with Google Cloud, but how should I adjust the settings between it and the repo?

Thanks a lot! @sgugger @stas

It’s not your fault, this is part of a process to include Google cloud builds in the future but it’s not working yet. Disregard that failing test.

Thanks! So does it mean I can still make a PR and get it merged even though I have this same build test failure?