Yes, that’s better. 007b by itself takes 4-5 hours to run on a p3!
Pushed a few commits here and there to refactor a lot of the NLP stuff.
The idea is to have the data loaded and a learner in just a few lines of code, like in CV.
Merged the docstrings branch and just added another PR here.
Preview of core.py (this example will not be checked in):
Summary:
- Reformatted function/class/enum definitions
- Trying to provide links where possible - inside docstrings, subclasses
- Show global variables in documentation notebooks (e.g. FileLike = Union[str, Path])
Next: work on making sure links go to the correct places and formatting the HTML
Fixed a bug in yesterday’s implementation of separating batchnorm layers for weight decay in this commit.
There is now a flag bn_wd in Learner which, if set to False, will prevent weight decay from being applied to batchnorm layers during training.
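A minimal usage sketch (the data/model arguments here are placeholders, not from the commit):
learn = Learner(data, model, bn_wd=False)  # no weight decay on batchnorm layers
learn.fit(1, lr=1e-3, wd=1e-2)             # wd still applies to the other params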
In this commit, created an ImageBBox object to get data augmentation working with bounding boxes.
Under the hood, it’s just a square mask, and when we need to pull the data at the end, we take the min/max of the coordinates of the non-zero elements.
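Roughly, the recovery step looks like this (an illustrative sketch, not the actual ImageBBox code):
import torch
mask = torch.zeros(8, 8)
mask[2:5, 3:7] = 1                # square mask covering the bbox
idxs = mask.nonzero()             # (N, 2) tensor of (y, x) coordinates
y0, x0 = idxs.min(0)[0].tolist()  # top-left corner from the min coordinates
y1, x1 = idxs.max(0)[0].tolist()  # bottom-right corner from the max coordinates
print(y0, x0, y1, x1)             # 2 3 4 6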
Got a bit behind on updates. Here we go:
- Add titles to show_image
- Split a Collection randomly
- Add test sets
- data_from_imagefolder creates a DataBunch for you from a folder of folders of folders!
- Changed transform defaults to be more useful
- TTA
- Only include image extensions in get_image_files - you can override this with check_ext=False (see the usage sketch after this list)
- Various metrics and loss functions now flatten their params
- Figure out how many CPUs to use
- Added a model meta dictionary so that models know how to cut/split themselves
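A quick usage sketch for the get_image_files item above (path_img is a placeholder):
files = get_image_files(path_img)                       # image extensions only
all_files = get_image_files(path_img, check_ext=False)  # no extension filtering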
In 002_images.ipynb there is a very complex chain of and's, or's and not's (#1):
def get_image_files(c:Path, check_ext:bool=True)->FilePathList:
[...]
return [o for o in list(c.iterdir())
if not o.name.startswith('.') and not o.is_dir()
and (not check_ext or (o.suffix in image_extensions))]
I practically had smoke coming out of my head parsing that last line.
Wouldn’t this be more readable (#2):
if not o.name.startswith('.') and not o.is_dir()
and not (check_ext and o.suffix not in image_extensions)
And then it allows us to drop 2 not's (#3), but the above is fine too - it’s consistent in negating everything and there are fewer parentheses:
if not (o.name.startswith('.') or o.is_dir()
or (check_ext and o.suffix not in image_extensions))
Too bad Python doesn’t have unless.
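For what it’s worth, a quick brute-force check (my own sketch) that the three variants are logically equivalent:
from itertools import product
# hidden = o.name.startswith('.'), is_dir = o.is_dir(),
# ok_ext = o.suffix in image_extensions
for hidden, is_dir, check_ext, ok_ext in product([False, True], repeat=4):
    v1 = not hidden and not is_dir and (not check_ext or ok_ext)       # #1
    v2 = not hidden and not is_dir and not (check_ext and not ok_ext)  # #2
    v3 = not (hidden or is_dir or (check_ext and not ok_ext))          # #3
    assert v1 == v2 == v3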
It’s type-annotation Friday!
- annotated nb 009 in this commit
- annotated nb 008 in this commit
- annotated the 007 nbs in this commit
- made a few corrections to early notebooks in this commit
Continuing the clean-up with:
- moving 009a to x_009
- solving a bug with flat_master and the latest no-wd-for-bn-layers development in 004b
Yup that looks better to me.
Didn’t have time to post a message here yesterday, but the modules have been added in this commit.
I made a few changes this morning in this commit then corrected bugs and added the all_ for each module that needs it in this commit.
Finally, in this commit I added five example notebooks to check everything was working well (dogs and cats, cifar10, imdb classification, movie lens and rossmann).
As Jeremy explained, you shouldn’t touch the dev_nb anymore (except to add prose). Bug fixes should be done in the modules directly! You should also use a pip install -e of the new library to test those notebooks, to easily have the latest version installed.
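For example, from the root of your fastai_v1 checkout:
pip install -e .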
One last commit about module developments for a while. Just added mixup that allows us to get very fast results on cifar10 (6 minutes for 94% accuracy).
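For reference, generic mixup looks like this (the standard technique, not necessarily the exact fastai implementation):
import torch

def mixup_batch(x, y, alpha=0.4):
    lam = torch.distributions.Beta(alpha, alpha).sample()  # blending coefficient
    perm = torch.randperm(x.size(0))          # pair each example with a shuffled one
    x_mixed = lam * x + (1 - lam) * x[perm]   # blend the inputs
    # train on x_mixed with loss = lam * L(pred, y) + (1 - lam) * L(pred, y[perm])
    return x_mixed, y, y[perm], lam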
After the recent rewrite of history you may get this error when running git pull (or doing a direct merge):
fatal: refusing to merge unrelated histories
The easiest way to fix this, if you have forked fastai/fastai_v1 and don’t have any branches you want to keep, is to nuke your fork by following the delete option at the end of:
https://github.com/<yourusername>/fastai_v1/settings
and then forking again.
A potentially much more complex way (assuming you use ssh; adjust for https urls if need be) is:
git clone git@github.com:YOURUSER/fastai_v1.git
cd fastai_v1
git remote add upstream git@github.com:fastai/fastai_v1.git
git fetch upstream
git checkout master
git merge upstream/master --allow-unrelated-histories
which, depending on when you last synced your fork with fastai/fastai_v1, may or may not create a gazillion conflicts. In my case it did, so I decided re-doing the fork was the easiest option.
Once you’ve resolved the conflicts, push back to sync it:
git push --set-upstream origin master
If you don’t use a fork, but a direct fastai/fastai_v1 checkout, you can use --allow-unrelated-histories with git pull, or simply make a new checkout and copy your work files over to the fresh repo checkout.
The initial documentation website commit has been done - we’ve simply imported the standard jekyll documentation template at this point, and are now working on filling it in with our docs. So everything inside docs/ now is from the template. Once we’ve figured out what we need, we can remove some of the redundant stuff.
I originally checked in the vendor/ directory for jekyll, but changed my mind after feedback from @stas, which is why I had to rewrite history - see the previous message from @stas if that causes any problems for you (should only have a problem if you have a fork).
Yay, it looks like this doc template may just work nicely! I just manually popped in one page for testing, and it’s looking pretty good without even customizing anything much at all.
Hmm, are you able to run dev_nb/002_images.ipynb?
I have to change the first cell to even find gen_doc:
- import sys
- sys.path.append('../docs')
to:
+ import pathlib, sys
+ path = str((pathlib.Path(".")/".."/"fastai").resolve())
+ if path not in sys.path: sys.path.insert(0, path)
There are no python libs under fastai_v1/docs, so I’m not sure how it worked…
and then once the path has been fixed it fails internally:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-2-2e0a20ad4bf5> in <module>()
7 if path not in sys.path: sys.path.insert(0, str(path))
8 sys.path
----> 9 from gen_doc.nbdoc import show_doc as sd
/mnt/disc1/fast.ai-1/br/fastai/master/fastai/gen_doc/nbdoc.py in <module>()
3 from typing import Dict, Any, AnyStr, List, Sequence, TypeVar, Tuple, Optional, Union
4 from .docstrings import *
----> 5 from .core import *
6
7 __all__ = ['get_class_toc', 'get_fn_link', 'get_module_toc', 'show_doc', 'show_doc_from_name',
/mnt/disc1/fast.ai-1/br/fastai/master/fastai/gen_doc/core.py in <module>()
----> 1 from ..core import *
2 import re
3
4 def strip_fastai(s): return re.sub(r'^fastai\.', '', s)
5
ValueError: attempted relative import beyond top-level package
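My reading of that ValueError (an assumption, not stated in the thread): adding .../fastai itself to sys.path makes gen_doc importable as a top-level package, so from ..core import * has no parent package to climb to. Putting the repo root on the path instead keeps gen_doc under fastai:
import pathlib, sys
path = str(pathlib.Path("..").resolve())  # repo root, which contains fastai/
if path not in sys.path: sys.path.insert(0, path)
from fastai.gen_doc.nbdoc import show_doc  # relative imports now resolve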
Made some little moves in this commit:
- It made more sense to have DatasetBase and LabelDataset in data
- I renamed all the data_from_* functions to something more consistent, {type}_data_from_*, so for instance image_data_from_folder, text_data_from_tokens or tabular_data_from_df
- Then I changed all the references to those functions in the example notebooks
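Illustratively, calls change along these lines (the arguments here are placeholders, not the real signatures):
data = image_data_from_folder(path)    # was data_from_imagefolder
data = text_data_from_tokens(path)     # {type}_data_from_* naming
data = tabular_data_from_df(path, df)  # placeholder arguments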
Hmm, are you able to run dev_nb/002_images.ipynb?
Fixed now.
ModuleNotFoundError: No module named 'fastai'
Are notebooks now supposed to rely on a pre-installed fastai, as is being discussed in the other thread?
Otherwise the following would remove that requirement:
import pathlib, sys
path = str((pathlib.Path(".")/"..").resolve())
if path not in sys.path: sys.path.insert(0, path)