@Eva
My experience tells me the following
Try hard and ask for help and keep up, and you will see how friendly and supportive this place is, and your worry will be gone.
Thanks @sgugger
I will make them a PR about those tiny changes.
Just feel proud to contribute to the best deep learning library and organization!
@sgugger Thank you for all the work done on the fastai library and documentation. However, I personally think that much more than just examples is missing. I have been struggling for quite some time to understand the arguments of several basic functions, and I would gladly help to enrich the docs once I get a better understanding of them.
The most essential part that I find missing is a clear description of each parameter, not just its type and default value. Without this information I am, for example, left hanging when just trying to specify a validation folder while working through the first lesson with another dataset that has a definite train and test set, separated into 2 folders and both labeled by their filenames.
The takeaway of my message is really: not only examples, but also parameter descriptions.
Thanks again for all the hard work !
I ran into the same issue! It is hard to get a clear description of the functions when you want to use your own dataset or work on Kaggle competitions.
Let’s get to work!
Maybe something like this could be of interest in the future: https://developers.google.com/season-of-docs/docs/
"items. create_func will default to open_image" in ImageList
I have tried to read the source code of ItemList, ImageList, and their from_folder to figure out how ImageList.from_folder works.
I can use pdb to walk through the flow of the code, but I can’t find the exact step that turns an image file path object into an Image object; see below for comparison.
So I went to check the docs, and the second sentence makes perfect sense as an explanation of the missing piece of the puzzle I encountered above:
Create a ItemList in path from filenames in items. create_func will default to open_image.
However, I could not locate the place where items.create_func is set to open_image; in fact, items.create_func does not seem to exist.
So, could you show me exactly where in the source code items.create_func is set to open_image?
Thanks!
First of all, I found the exact code for turning a Path object into an Image object, shown below.
Second, there is no such thing as items.create_func. So, I would like to rewrite the sentence as follows:
It inherits from ItemList and overrides ItemList.get to call open_image in order to turn an image file given as a Path object into an Image object.
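To make the proposed wording concrete, here is a minimal sketch of the override pattern in plain Python. The names SketchItemList, SketchImageList, SketchImage and open_image_sketch are hypothetical stand-ins invented for this illustration; they are not fastai's classes, and the real ImageList.get may do more than this.

```python
from pathlib import Path


class SketchImage:
    """Hypothetical stand-in for an image object; it only remembers its source path."""

    def __init__(self, path):
        self.path = Path(path)


def open_image_sketch(path):
    """Hypothetical stand-in for an image-opening function."""
    return SketchImage(path)


class SketchItemList:
    """Hypothetical base list: get(i) returns the raw item, here a Path."""

    def __init__(self, items):
        self.items = list(items)

    def get(self, i):
        return self.items[i]

    def __getitem__(self, i):
        # Indexing always goes through get, so subclasses only override get.
        return self.get(i)


class SketchImageList(SketchItemList):
    """Overrides get so that indexing yields an image instead of a Path."""

    def get(self, i):
        filename = super().get(i)           # still a Path at this point
        return open_image_sketch(filename)  # the Path-to-image step happens here


image_list = SketchImageList([Path("train/3/7263.png")])
print(type(image_list.items[0]).__name__)  # the stored item is a path object
print(type(image_list[0]).__name__)        # indexing returns the image wrapper
```

Under these assumptions, nothing named create_func is needed: the conversion lives entirely in the overridden get.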
untar_data
untar_data(url:str, fname:PathOrStr=None, dest:PathOrStr=None, data=True, force_download=False) → Path
Download url to fname if it doesn’t exist, and un-tgz to folder dest.
The "it" above, in its semantic context, refers to fname, but according to the source code, "it" should refer to dest, because download_data is executed only when not dest.exists() returns True.
I would like to propose the following docs for untar_data:
In general, untar_data uses a url to download a tgz file to fname, and then un-tgzs fname into a folder under dest.
After the initial download, if untar_data is run again with force_download=True, or if the tgz file at fname is corrupted somehow, the existing fname and dest are removed and the download starts again.
After the initial download, if dest does not exist (the folder could have been removed or renamed somehow), running untar_data will execute download_data. If the tgz file at fname exists, nothing is actually downloaded and fname is simply un-tgzed into dest; if fname does not exist, the tgz file is downloaded first.
Note: the url you feed to untar_data must be one of URLs.something.
What do you think of this version of the docs? Thanks
@stas @sgugger
Yes, it seems nice. Please specify in a warning that it’s only intended to be used with urls that come from URLs.something.
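The control flow described in the proposed untar_data docs can be sketched in plain Python as follows. This is only an illustration under stated assumptions: untar_sketch and fake_download are hypothetical names, the "download" just writes a tiny local archive, and the corrupted-file check is left out.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def fake_download(fname: Path) -> None:
    """Hypothetical downloader: writes a tiny .tgz holding one folder named 'dataset'."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp) / "dataset"
        folder.mkdir()
        (folder / "a.txt").write_text("hello")
        with tarfile.open(fname, "w:gz") as tar:
            tar.add(folder, arcname="dataset")


def untar_sketch(fname, dest, download=fake_download, force_download=False):
    """Sketch of the described flow; not fastai's implementation."""
    fname, dest = Path(fname), Path(dest)
    if force_download:
        # Start over: remove both the archive and the extracted folder.
        if fname.exists():
            fname.unlink()
        if dest.exists():
            shutil.rmtree(dest)
    if not dest.exists():
        # Only a missing dest triggers any work at all.
        if not fname.exists():
            download(fname)  # the archive is fetched only if it is missing
        with tarfile.open(fname, "r:gz") as tar:
            tar.extractall(dest.parent)
    return dest
```

In this sketch, calling untar_sketch twice in a row does nothing the second time, deleting dest re-extracts from the existing archive without downloading, and force_download=True repeats the whole cycle.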
Hi @stas, could you help me check this doc improvement when you have time?
see ImageList: the problem, ImageList: proposed improvement
Thanks!
My guess is that you’re after these instructions:
https://docs.fast.ai/gen_doc_main.html#updating-an-existing-functionclass
If not, please be more specific about what you need help with - as in - I’m trying to do foo, I did bar, and I can’t figure out how to get tar…
I have tried to improve the docs of get_files by adding the following two paragraphs:
To be more precise, this function returns a list of FilePath objects for the files in path whose suffix is in extensions; hidden folders and files are ignored. If recurse=True, files in subfolders are included as well; include is used to select the particular folders to search.
Inside get_files, there is _get_files, which turns all filenames inside f from directory parent/p into a list of FilePath objects. All filenames must have a suffix in extensions. All hidden files are ignored.
Do they make sense? Or do they make the use of get_files easier to understand?
Also, I want to provide code examples to show how to use get_files. Can I add code examples in the following style? Or what is the recommended style for providing small code examples?
import fastai.vision as fv
path_data = fv.untar_data(fv.URLs.MNIST_TINY)
# PosixPath('/Users/Natsume/.fastai/data/mnist_tiny')
path_data.ls()
# [PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/valid'),
# PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/labels.csv'),
# PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/test'),
# PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/history.csv'),
# PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/models'),
# PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/train')]
list_FilePath_noRecurse = fv.get_files(path_data); list_FilePath_noRecurse
# [PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/labels.csv'),
# PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/history.csv')]
list_FilePath_recurse = fv.get_files(path_data, recurse=True); list_FilePath_recurse[:3]
# [PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/labels.csv'),
# PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/history.csv'),
# PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/valid/7/9294.png')]
list_FilePath_recurse[-2:]
# [PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/train/3/7263.png'),
# PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/train/3/7288.png')]
list_FilePath_include = fv.get_files(path_data, recurse=True, include=['test']);
list_FilePath_include[:3]
# [PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/labels.csv'),
# PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/history.csv'),
# PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/test/4605.png')]
list_FilePath_include[-3:]
# [PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/test/1605.png'),
# PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/test/2642.png'),
# PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/test/5071.png')]
Thanks a lot!
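For readers who want to see the filtering rules from the two proposed paragraphs in isolation, here is a rough sketch in plain Python. list_files_sketch is a hypothetical helper written for this illustration, not fastai's get_files, and it assumes that include only restricts which first-level subfolders are searched.

```python
import os
from pathlib import Path


def list_files_sketch(path, extensions=None, recurse=False, include=None):
    """Hypothetical sketch: list files under path, skipping hidden entries."""
    path = Path(path)
    wanted = None if extensions is None else {e.lower() for e in extensions}

    def keep(name):
        # Hidden files are ignored; otherwise the suffix must be an allowed one.
        if name.startswith("."):
            return False
        return wanted is None or Path(name).suffix.lower() in wanted

    if not recurse:
        return sorted(p for p in path.iterdir() if p.is_file() and keep(p.name))

    found = []
    for depth, (folder, dirs, files) in enumerate(os.walk(path)):
        if depth == 0 and include is not None:
            # Assumption: include selects which first-level folders are walked.
            dirs[:] = [d for d in dirs if d in include]
        else:
            # Hidden folders are never walked.
            dirs[:] = [d for d in dirs if not d.startswith(".")]
        found += [Path(folder) / name for name in files if keep(name)]
    return sorted(found)
```

Editing dirs in place is what tells os.walk not to descend into the removed folders, which is why the sketch assigns to dirs[:] rather than to dirs.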
Also I want to provide code examples to show how to use get_files. Can I add code examples in the following style? or what is the recommended style of providing small code examples?
Most of the time, and certainly in your case, you just code it as you would in a normal Jupyter notebook, run it, and the results will be saved. For example, look at the tutorials, such as https://docs.fast.ai/tutorial.data.html - but, of course, study the corresponding source notebook, tutorial.data.ipynb, and you will see how it’s done.
When the setup for an example is complex, I just paste the code and the output as a code block, but you don’t need that here.
I was participating in a Kaggle competition in which the images were available as URLs with their labels. To download the images, I thought about using the download_images function in vision.data. But the urls also had to be mapped to their labels, because after downloading the images are named like 000001.jpg.
So I modified the function that download_images calls. Should I add it as an example under download_images? This was the code I used. It puts the label at the beginning of each filename, followed by an underscore.
# This function is same as fastai source code, but added df argument.
# As I had stored my data as (url, label) in dataframe. In downlo
def _download_image_inner(df, dest, url, i, timeout=4):
    # Take the file extension from the url, defaulting to .jpg
    suffix = re.findall(r'\.\w+?(?=(?:\?|$))', url)
    suffix = suffix[0] if len(suffix)>0 else '.jpg'
    # Look up the label stored next to this url in the dataframe
    label = str(int(df.loc[df['img_url'] == url]['label']))
    download_image(url, dest/f"{label}_{i:08d}{suffix}", timeout=timeout)
# Code taken from download_images. But changed the first line, to take urls from df
# (timeout and max_workers are assumed to be in scope here, as they are inside download_images)
urls = list(df['img_url'])
dest = Path(dest)
dest.mkdir(exist_ok=True)
parallel(partial(_download_image_inner, df, dest, timeout=timeout), urls, max_workers=max_workers)
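The naming step in the code above can be tried on its own, without downloading anything. Below is a small sketch that reuses the same regular expression; labelled_name and build_label_lookup are hypothetical helpers written for this illustration, and the example.com urls are made up. Using a plain dict for the url-to-label mapping is just one possible alternative to filtering the dataframe once per url.

```python
import re


def labelled_name(url: str, label, i: int) -> str:
    """Hypothetical helper: build '<label>_<index><suffix>' for one url."""
    # Same pattern as in the post: a dot plus word characters,
    # directly followed by '?' or the end of the url.
    found = re.findall(r'\.\w+?(?=(?:\?|$))', url)
    suffix = found[0] if found else '.jpg'
    return f"{label}_{i:08d}{suffix}"


def build_label_lookup(rows):
    """Hypothetical helper: map each url to its label once, up front."""
    return {url: str(label) for url, label in rows}


# Made-up (url, label) pairs standing in for the dataframe rows.
rows = [
    ("https://example.com/cat.png?size=2", 3),
    ("https://example.com/img", 5),
]
lookup = build_label_lookup(rows)
names = [labelled_name(url, lookup[url], i) for i, url in enumerate(lookup)]
print(names)
```

Because the label becomes part of the filename, it can later be recovered from the name alone, for example by splitting on the first underscore.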
I have been experimenting with the source code of ImageList.open, open_image, and Image to find out which line of source code is responsible for printing out images (as the output of line 4 and line 5 of the notebook below) when using the following code in a Jupyter notebook:
from fastai.vision import *
path_data = untar_data(URLs.PLANET_TINY); path_data.ls()
imagelistRGB = ImageList.from_folder(path_data/'train'); imagelistRGB
imagelistRGB.items[10] # output is to print out an image
imagelistRGB.open(imagelistRGB.items[10]) # output is to print out an image
However, I could find no source code in fastai responsible for this kind of image-printing behavior.
And then, I found the following statement in the open_image docs:
As we saw, in a Jupyter Notebook, the representation of an Image is its underlying picture (shown to its full size).
Does this statement suggest that it is some built-in code in Jupyter that creates this image-printing behavior?
If not, then could you help me identify which line of source code in fastai is responsible for this behavior?
Thanks a lot!
It’s the representation we coded for Image: see Image.__repr__.
I have located the source code at https://github.com/fastai/fastai/blob/master/fastai/vision/image.py#L87
def __repr__(self): return f'{self.__class__.__name__} {tuple(self.shape)}'
which only outputs things like Image (3, 28, 28) when I type imagelist[0].__repr__(); it does not print out an image.
I have found no fastai code that is responsible for printing out an image when running imagelist[0] in a Jupyter notebook. This is why I suspect that it is Jupyter itself that prints the image when I type imagelist[0], but I can’t prove it. (Please see the very short notebook for the experiment I tried.)
Hi @stas, when you have time, could you also have a look at the last three replies? I am really curious about the behavior of imagelistRGB.items[10] in a Jupyter notebook.
Thanks a lot!
It has to do with Rich display methods in ipython:
https://ipython.readthedocs.io/en/stable/config/integrating.html
and you need to be looking at: https://github.com/fastai/fastai/blob/master/fastai/vision/image.py#L88-L89
def _repr_png_(self): return self._repr_image_format('png')
def _repr_jpeg_(self): return self._repr_image_format('jpeg')
that’s where it happens.
Your notebook url is a 404; you need to use https://mybinder.org/ if you want to show a persistent live notebook.