Documentation improvements

I see, thanks a lot! :sunny:

I only updated metrics.ipynb, but I still ran all the docs tests via ./run_tests.sh and saw that two tests failed even though I hadn’t made any changes to them. I wanted to point this out in case it’s helpful.

To reproduce, do the dev install and then run ./run_tests.sh text*.

_____________________________________________________ text.ipynb::Cell 6 ______________________________________________________
Notebook cell execution failed
Cell 6: Cell execution caused an exception

Input:
data_lm.save('data_lm_export.pkl')
data_clas.save('data_clas_export.pkl')

Traceback:

---------------------------------------------------------------------------
IsADirectoryError                         Traceback (most recent call last)
<ipython-input-7-7dcc871a6781> in <module>
----> 1 data_lm.save('data_lm_export.pkl')
      2 data_clas.save('data_clas_export.pkl')

~/fastai-fork/fastai/basic_data.py in save(self, file)
    152             warn("Serializing the `DataBunch` only works when you created it using the data block API.")
    153             return
--> 154         try_save(self.label_list, self.path, file)
    155 
    156     def add_test(self, items:Iterator, label:Any=None)->None:

~/fastai-fork/fastai/torch_core.py in try_save(state, path, file)
    406 
    407 def try_save(state:Dict, path:Path=None, file:PathLikeOrBinaryStream=None):
--> 408     target = open(path/file, 'wb') if is_pathlike(file) else file
    409     try: torch.save(state, target)
    410     except OSError as e:

IsADirectoryError: [Errno 21] Is a directory: '/home/turgutluk/.fastai/data/imdb_sample/data_lm_export.pkl'

_____________________________________________________ text.ipynb::Cell 7 ______________________________________________________
Notebook cell execution failed
Cell 7: Cell execution caused an exception

Input:
data_lm = load_data(path, 'data_lm_export.pkl')
data_clas = load_data(path, 'data_clas_export.pkl', bs=16)

Traceback:

---------------------------------------------------------------------------
IsADirectoryError                         Traceback (most recent call last)
<ipython-input-8-e145eb9fb246> in <module>
----> 1 data_lm = load_data(path, 'data_lm_export.pkl')
      2 data_clas = load_data(path, 'data_clas_export.pkl', bs=16)

~/fastai-fork/fastai/basic_data.py in load_data(path, file, bs, val_bs, num_workers, dl_tfms, device, collate_fn, no_check, **kwargs)
    275     "Load a saved `DataBunch` from `path/file`. `file` can be file-like (file or buffer)"
    276     source = Path(path)/file if is_pathlike(file) else file
--> 277     ll = torch.load(source, map_location='cpu') if defaults.device == torch.device('cpu') else torch.load(source)
    278     return ll.databunch(path=path, bs=bs, val_bs=val_bs, num_workers=num_workers, dl_tfms=dl_tfms, device=device,
    279                         collate_fn=collate_fn, no_check=no_check, **kwargs)

~/.conda/envs/my_fastai/lib/python3.7/site-packages/torch/serialization.py in load(f, map_location, pickle_module)
    364             (sys.version_info[0] == 3 and isinstance(f, pathlib.Path)):
    365         new_fd = True
--> 366         f = open(f, 'rb')
    367     try:
    368         return _load(f, map_location, pickle_module)

IsADirectoryError: [Errno 21] Is a directory: '/home/turgutluk/.fastai/data/imdb_sample/data_lm_export.pkl'

Oh, the doc tests aren’t the ones run in the test suite; you should check with pytest or make test.
I’ll check what’s wrong with this notebook tomorrow.
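
For anyone reading the tracebacks above: the message says the path itself is a directory, so one possible explanation (an assumption, not a confirmed diagnosis) is that something had already created a directory named data_lm_export.pkl in that folder. A minimal sketch of how that produces the same error on a POSIX system, using a temporary folder and a hypothetical helper `try_open_for_write` rather than any fastai code:

```python
import tempfile
from pathlib import Path

def try_open_for_write(target: Path) -> str:
    """Try to open `target` for binary writing and report what happened."""
    try:
        with open(target, "wb"):
            return "opened"
    except IsADirectoryError:
        # Same exception type as in the tracebacks (POSIX behaviour).
        return "is a directory"

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    # Hypothetical leftover: a directory that carries the export file's name.
    leftover = base / "data_lm_export.pkl"
    leftover.mkdir()
    dir_result = try_open_for_write(leftover)
    # A name that does not exist yet opens normally.
    file_result = try_open_for_write(base / "data_clas_export.pkl")

print(dir_result, file_result)
```

If that is what happened, removing the stray directory before re-running the notebook should let the save go through.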

How to distinguish Collection, Collection[T_co], and Collection[int]?

All three appear in index_row’s doc and source:

index_row [source][test]

index_row(a:Union[Collection[T_co], DataFrame, Series], idxs:Collection[int]) → Any

def index_row(a:Union[Collection,pd.DataFrame,pd.Series],
                      idxs:Collection[int]) -> Any:

I didn’t find the online docs for typing helpful for figuring out the distinctions between them.
Here are my guesses:

  • Collection[int] can be a list or tuple of integers;
  • Collection can be a list of any type;
  • Collection[T_co] is a different expression of Collection.

Please help me distinguish them, thanks a lot!
@sgugger @stas
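
A short sketch that may help with the question above. As far as the typing module goes, a bare Collection is treated by type checkers as Collection[Any], Collection[int] narrows the element type to int, and T_co is the (covariant) type variable that stands for "some element type", so Collection[T_co] is most likely just how the doc generator renders the unparameterized generic. None of this is enforced at runtime. The function `pick_rows` below is a hypothetical stand-in for index_row that only handles plain Python collections, not DataFrames or Series:

```python
from collections import abc
from typing import Any, Collection, get_args, get_origin

def pick_rows(a: Collection, idxs: Collection[int]) -> Any:
    "Hypothetical stand-in for index_row, for plain Python collections only."
    items = list(a)
    return [items[i] for i in idxs]

# Subscripting only records the element type; the underlying ABC is unchanged.
origin = get_origin(Collection[int])
args = get_args(Collection[int])

# Lists, tuples, and sets are Collections (sized, iterable, support `in`);
# a bare iterator is not, because it has no __len__ or __contains__.
checks = [isinstance(x, abc.Collection) for x in ([1, 2], (1, 2), {1, 2}, iter([1, 2]))]

# Annotations are hints only: idxs can be a list or a tuple of ints.
picked = pick_rows(["a", "b", "c"], (0, 2))
print(origin, args, checks, picked)
```

So the first two guesses are close, with the caveat that a Collection is not limited to lists and tuples: anything sized, iterable, and supporting `in` qualifies.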

Thank you for the step-by-step instructions, Stas! I just submitted my first PR ever using your guide! :grinning:

One source of uncertainty I had was that git Notes – fastai mentions

In the docs_src folder, if you made changes to the notebooks, run:

 cd docs_src
 ./run_tests.sh

You will need at least 8GB free GPU RAM to run these tests.

But based on https://docs.fast.ai/gen_doc_main.html, it seems that one can just modify the notebook (if the edit is just changing the text, which it was in my case) and commit that. Basically I found a lot of helpful information, but I’m proactively apologizing in case my best attempt at following directions still resulted in doing the wrong thing.

You’re correct, @tank13, that was ambiguous. I have modified that step to clarify that it’s only needed if you modify code cells in the doc notebooks: https://docs.fast.ai/dev/git.html#step-5-test-your-changes
I hope the instructions are more clear now. Thank you for flagging that.

And thank you for the kind words - I’m glad you found it useful!

Hi,

Inside the Abbreviation Guide (https://docs.fast.ai/dev/abbr.html) there is no entry for underscore as prefix for internal usage.

Should this be added?

You can add it in a PR, yes. Note this is standard python practice, not just us.

While I am writing docs for fastai, I would sometimes like to write a little documentation for some frequently used pytorch functions as well. So I have a few questions:

  1. Can a dev version of pytorch work well with a dev version of fastai?
  2. (If I decide to manually update a few pytorch files with my docs in them:) how often does a dev version of fastai require updating to a new version of pytorch?
  3. Since I am using the dev version of fastai, every day I only sync my fork and local master with the official repo master. Today my fastai version is 1.0.53.dev0, which I assume is the latest, but I haven’t seen pytorch get updated so far. Does conda update -c fastai fastai update pytorch for me? If they can both update pytorch automatically when necessary, how often does fastai require a new version of pytorch?

Thanks! @sgugger

I don’t know about the PyTorch nightlies, as we usually develop on the latest stable release (since v1.0 is out). We check now and then for breaking changes in the latest nightlies, but not all the time. So I’d say to use PyTorch v1.1 with fastai master.
When you sync your repo, you’re up to date with the latest (it’s rather quiet and only bug fixes at the moment as we’re developing our own v1.1). conda update should update PyTorch to the latest stable release; I don’t think it works with the nightlies.

Thanks a lot for your reply!
So, can I say that conda update -c fastai fastai won’t update pytorch to the latest stable version for me, but conda update ... will?

I checked, and my pytorch version is 1.01 post2, while the latest stable version of pytorch is 1.1. However, when I tried conda update pytorch torchvision, I got the following response:

(fastai) ~ conda update pytorch torchvision
Collecting package metadata: done
Solving environment: done

# All requested packages already installed.

How can I update my pytorch to the latest stable version for my dev version of fastai?

Thanks! @sgugger @stas

Because fastai dependencies are already satisfied, and you already have the latest fastai release, and you’re telling conda to update fastai, conda won’t do anything.

If you want to update specific packages that have made new releases since you installed a particular version of fastai, you need to update them explicitly, e.g., in the case of pytorch/torchvision:

conda install -c pytorch pytorch torchvision

Note, I never bother with conda update, since conda install does the same thing.

Though, you can instruct conda to update all the dependencies, so here you’d do:

conda install -c fastai -c pytorch --update-deps fastai

Or, you can update all packages in your conda environment with:

conda update -c pytorch -c fastai --update-all

(You still need to list the channels -c pytorch -c fastai in the above command, otherwise it’ll only check the default channel and whatever is listed in your ~/.condarc, if anything.)
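
To confirm what actually ended up installed after any of the commands above, something like the following can be run inside the environment. This is only a sketch: `installed_versions` is a hypothetical helper, it needs Python 3.8+ for importlib.metadata, and it assumes the conda pytorch package registers itself under the distribution name torch.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    result: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            result[name] = None
    return result

versions = installed_versions(["torch", "torchvision", "fastai"])
for name, version in versions.items():
    print(f"{name}: {version or 'not installed'}")
```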

Thank you very much! Very helpful!

Just read this topic today: [Solved] Reproducibility: Where is the randomness coming in?

It mentions how to get reproducible results, but it uses a doc from dev (https://docs.fast.ai/dev/test.html#getting-reproducible-results) where it reads: set num_workers=1 (or 0) in your DataLoader/DataBunch.

From the second lesson and Hiromi’s notes (https://github.com/hiromis/notes/blob/master/Lesson2.md), @jeremy explicitly says to set the seed in order to always get the same validation set, but doesn’t say anything about num_workers.

So, 1) is num_workers=1 really needed in order to get the same validation set, or is it only needed when executing the tests?
And 2) should this be made explicit in the basic_data docs, to explain how to get the same datasets for training and validation when needed?

Just a note: I am still learning from the lessons while trying to contribute to the project when I see something that’s missing or confusing in the docs. I hope to contribute more later on as I get more hands on experience.
I also can’t run the tests because I still haven’t got myself a GPU so this restricts me a lot. Is it possible to run them on GCP? How much time is needed for the documentation’s tests?

Thanks!

No, this is only if you want to ensure you get the same random batches when training. With num_workers set to more than 1, there used to be some problems with the seeds in the various processes. I think this has been fixed now, however.

Note that the basic tests (with pytest run in the fastai folder) don’t need a GPU to run.
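
To illustrate why the seed alone is enough for the validation set: the split is drawn once, before any data-loading workers are involved, so fixing the seed fixes the split. The sketch below uses only Python’s standard random module as a stand-in; `split_indices` is a hypothetical helper, not fastai’s actual splitting code.

```python
import random
from typing import List, Tuple

def split_indices(n: int, valid_pct: float, seed: int) -> Tuple[List[int], List[int]]:
    """Shuffle range(n) with a seeded generator and cut off a validation slice."""
    rng = random.Random(seed)  # private generator, unaffected by other random calls
    idxs = list(range(n))
    rng.shuffle(idxs)
    cut = int(n * valid_pct)
    return idxs[cut:], idxs[:cut]  # (train, valid)

train_a, valid_a = split_indices(100, 0.2, seed=42)
train_b, valid_b = split_indices(100, 0.2, seed=42)

# Same seed, same split, regardless of how batches are later loaded.
print(valid_a == valid_b)
```

The order of the random batches during training is a separate matter, which is where num_workers came in.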

It’s been quite a while, so I was wondering if there were any doc improvement requirements before the new course starts. I see a list of the pages that Sylvain suggests can be improved, but I’m guessing many of these pages have been updated already. Thanks!

The new course will use fastai version 2, so we’ll focus on that documentation.

Hi @sgugger, is this still ongoing? I’m not sure if this is an appropriate place to make a suggestion, but I’m doing the online course and in lesson 7 Jeremy highlights that in tfms = ([*rand_pad(padding=3, size=28, mode='zeros')], []) the asterisk operator is used because the rand_pad function returns 2 transforms. But when I checked the documentation here to understand what he meant, I didn’t see anything describing what rand_pad actually returns. It wasn’t until I checked the actual code in the fast.ai library that it became clear to me. I’m wondering whether adding a description of what each transform actually returns to the documentation would be helpful to readers. If so, I’d be happy to contribute.
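
For readers who hit the same question, here is a minimal sketch of what the asterisk does in that expression. `make_two_transforms` is a hypothetical stand-in that returns a list of two placeholder strings, mirroring the description above that rand_pad returns 2 transforms; it is not the fastai function.

```python
from typing import List

def make_two_transforms() -> List[str]:
    # Hypothetical stand-in: returns two placeholder "transforms" in a list.
    return ["tfm_a", "tfm_b"]

# Without the asterisk, the returned list is nested inside the outer list.
nested = [make_two_transforms()]

# With the asterisk, the returned items are unpacked into the outer list.
flat = [*make_two_transforms()]

# Same shape as the lesson's tuple: (training transforms, validation transforms).
tfms = ([*make_two_transforms()], [])
print(nested, flat, tfms)
```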

Like it’s said up there, we are focusing on the documentation of v2 right now. Still happy to take any PR that makes something better in v1 :wink: