Developer chat

Fixed the problem in this commit, then cleaned up a bit in this one.

tools/sync-outputs-version is now good to go. Please let me know whether it works on Windows.

It can now execute notebooks from the CLI, in addition to checking/copying successful ones. See the top of the script for examples, or run it with -h.

Any suggestions for a better name for this tool? Its current name is not intuitive at all, but at the moment I’m not feeling creative so nothing comes to mind.

We use it to copy successfully executed Jupyter notebooks to dev_nb/run/ and optionally execute them from the CLI.
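For context, the “check whether a notebook executed successfully” part can be done at the JSON level, since .ipynb files are plain JSON. This is only an illustrative sketch (the function name `fully_executed` is hypothetical and the actual script’s logic may differ):

```python
import json

def fully_executed(path):
    """True if every non-empty code cell was run and produced no error output."""
    with open(path) as f:
        nb = json.load(f)
    for cell in nb.get('cells', []):
        if cell.get('cell_type') != 'code':
            continue
        if not ''.join(cell.get('source') or []).strip():
            continue                          # skip empty code cells
        if cell.get('execution_count') is None:
            return False                      # cell was never run
        if any(o.get('output_type') == 'error' for o in cell.get('outputs', [])):
            return False                      # cell raised an exception
    return True
```

Only notebooks passing a check like this would be worth copying to dev_nb/run/.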

Thank you.

copy-to-run? render-notebook? run-notebook?

Thank you for the suggestions, Jeremy.

Frankly, I find the run directory to be unintuitive to start with.

Perhaps a better name would be snapshot? As we are taking a snapshot of the notebook’s outputs.

Then the script could be take-snapshot? take-nb-snapshot?

Yes, snapshot is much better. I’ll rename it now. And I’ll call your script take-snapshot. Very nice! :slight_smile:

@313V is visiting us this week! :slight_smile:

Lots in this commit, including:

  • DataBunch now has a path attribute, which is copied by default to Learner, and is where stuff like models will be saved to. There’s also a new data_from_imagefolder function that creates a DataBunch for you
  • You can create a transform now with is_random=False to have it not do any randomization
  • Used this feature to create ‘semi-random TTA’, which does 8 TTA images, one for each corner of the image, for each of flip and non-flip. These are combined with whatever augmentation you have for lighting, affine, etc. This approach gives dogs v cats results up to 99.7% accuracy with rn34 224 px! (Previously around 99.3-99.4%.)
  • You can call DataBunch.holdout(is_test) to get either test set or validation set. Most prediction methods now take an is_test param
  • loss_batch now moves losses and metrics to the CPU
  • Learner now saves models inside path/'models'
  • get_transforms now provides reasonable defaults for side-on photos
  • Added Learner.pred_batch for one batch and Learner.get_preds for a full set of predictions
  • show_image_batch now has an optional denorm function argument
  • Added a pytorch fbeta function
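On the last bullet: F-beta combines precision and recall, weighting recall more heavily when beta > 1. A pure-Python sketch of the formula (the actual fastai function operates on PyTorch tensors and its signature may differ):

```python
def fbeta(y_pred, y_true, beta=2, eps=1e-9):
    """F-beta over binary lists: beta > 1 favours recall, beta < 1 favours precision."""
    tp = sum(p and t for p, t in zip(y_pred, y_true))          # true positives
    fp = sum(p and not t for p, t in zip(y_pred, y_true))      # false positives
    fn = sum(not p and t for p, t in zip(y_pred, y_true))      # false negatives
    prec = tp / (tp + fp + eps)
    rec  = tp / (tp + fn + eps)
    b2 = beta ** 2
    return (1 + b2) * prec * rec / (b2 * prec + rec + eps)
```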

Awesome! Thank you!

I made some more improvements, including an important change: execution no longer overwrites the original .ipynb, so it doesn’t interfere with git or with notebooks open in Jupyter. Everything happens in a tmp file.
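The copy-then-execute-in-tmp pattern looks roughly like this (a sketch only; `execute_in_tmp` is a hypothetical name, and the executor is left as a placeholder for whatever actually runs the notebook):

```python
import shutil
import tempfile
from pathlib import Path

def execute_in_tmp(nb_path, executor):
    """Execute a copy of the notebook in a temp dir so the original file is untouched."""
    nb_path = Path(nb_path)
    with tempfile.TemporaryDirectory() as tmp:
        tmp_nb = Path(tmp) / nb_path.name
        shutil.copy2(nb_path, tmp_nb)   # work on a copy, never the original
        executor(tmp_nb)                # run the notebook in place in the tmp dir
        return tmp_nb.read_text()       # executed contents; original is unchanged
```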

If you have your notebooks’ data set up, and plenty of resources, you can now run:

tools/take-snapshot -v -e dev_nb/0*ipynb

and then make a big fat commit with many snapshots that aren’t under git yet.

Also, I disabled the execute-all-notebooks-by-default option:

$ tools/take-snapshot -e 
When using -e/--execute, pass the *.ipynb files to execute explicitly

reasoning that it would take too many resources, and perhaps it’s better to specify the files to run explicitly. Nothing stops you from passing dev_nb/0*ipynb though. But of course, if you believe it should work unimpeded, let me know and I’ll remove that sanity check.
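The sanity check behaves roughly like this argparse sketch (illustrative only, not the script’s actual code):

```python
import argparse

def parse_args(argv):
    p = argparse.ArgumentParser(prog='take-snapshot')
    p.add_argument('-e', '--execute', action='store_true', help='execute notebooks')
    p.add_argument('-v', '--verbose', action='store_true')
    p.add_argument('files', nargs='*', help='notebooks to process')
    args = p.parse_args(argv)
    if args.execute and not args.files:
        # refuse to silently execute every notebook in the repo
        p.error('When using -e/--execute, pass the *.ipynb files to execute explicitly')
    return args
```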

Yes, that’s better. 007b by itself takes 4-5 hours to run on a p3!

Pushed a few commits here and there to refactor a lot of the NLP stuff.
The idea is to have the data loaded and a learner in just a few lines of code, like in CV.

Merged the docstrings branch and just added another PR here.
Preview of core.py - (this example will not be checked in)

Summary:

  • Reformatted function/class/enum definitions
  • Trying to provide links where possible: inside docstrings, to subclasses
  • Show global variables in documentation notebooks, e.g. FileLike = Union[str, Path]

Next: work on making sure links go to the correct places, and on formatting the HTML.

Fixed a bug in yesterday’s implementation of separating batchnorm layers for weight decay in this commit.
There is now a bn_wd flag in Learner which, if set to False, prevents weight decay from being applied to batchnorm layers during training.
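In effect, bn_wd=False amounts to putting the batchnorm parameters into a separate optimizer group with weight_decay=0. A framework-agnostic sketch of that split (the name `make_param_groups` and the name-based batchnorm test are hypothetical; the actual Learner code differs):

```python
def make_param_groups(named_params, wd, bn_wd=False):
    """Split (name, param) pairs into two optimizer groups.

    Batchnorm params only get weight decay when bn_wd is True.
    """
    is_bn = lambda name: 'bn' in name or 'batchnorm' in name
    decay    = [p for n, p in named_params if bn_wd or not is_bn(n)]
    no_decay = [p for n, p in named_params if not bn_wd and is_bn(n)]
    return [{'params': decay,    'weight_decay': wd},
            {'params': no_decay, 'weight_decay': 0.0}]
```

The resulting list is the shape PyTorch optimizers accept as per-parameter option groups.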

In this commit, created an ImageBBox object to get data augmentation working with bounding boxes.
Under the hood, it’s just a square mask, and when we need to pull the data out at the end, we take the min/max of the coordinates of the non-zero elements.
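The min/max-of-non-zero trick can be illustrated on a plain 2-D list (a simplified stand-in for the mask tensor ImageBBox actually uses):

```python
def bbox_from_mask(mask):
    """Recover (top, left, bottom, right) from a binary 2-D mask after augmentation."""
    rows = [i for i, row in enumerate(mask) if any(row)]
    cols = [j for j in range(len(mask[0])) if any(row[j] for row in mask)]
    if not rows:
        return None                 # box was transformed entirely out of the image
    return min(rows), min(cols), max(rows), max(cols)
```

Because the mask is warped by the same affine transforms as the image, the recovered box stays aligned with the augmented image.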

Got a bit behind on updates. Here we go:

In 002_images.ipynb there is a very complex chain of ands, ors and nots (#1):

def get_image_files(c:Path, check_ext:bool=True)->FilePathList:
    [...]
    return [o for o in list(c.iterdir())
            if not o.name.startswith('.') and not o.is_dir()
            and (not check_ext or (o.suffix in image_extensions))]

A bit of smoke came out of my head parsing that last line.

Wouldn’t this be more readable (#2):

        if not o.name.startswith('.') and not o.is_dir()
        and not (check_ext and o.suffix not in image_extensions)

And then it allows us to drop 2 nots (#3), but the above is fine too: it’s consistent in negating everything, and there are fewer parentheses:

        if not (o.name.startswith('.') or o.is_dir()
        or (check_ext and o.suffix not in image_extensions))

Too bad python doesn’t have unless :slight_smile:
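For what it’s worth, the three variants really are equivalent (it’s just De Morgan); here they are abstracted over the four booleans involved, with an exhaustive check:

```python
from itertools import product

# (#1) the original: double negation plus an or
def v1(hidden, is_dir, check_ext, good_ext):
    return not hidden and not is_dir and (not check_ext or good_ext)

# (#2) push the or inside a single negated conjunction
def v2(hidden, is_dir, check_ext, good_ext):
    return not hidden and not is_dir and not (check_ext and not good_ext)

# (#3) one outer not over a chain of ors
def v3(hidden, is_dir, check_ext, good_ext):
    return not (hidden or is_dir or (check_ext and not good_ext))

# every combination of the four booleans gives the same answer
assert all(v1(*c) == v2(*c) == v3(*c) for c in product([False, True], repeat=4))
```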

It’s type-annotation Friday!

Continuing to clean up with

Yup that looks better to me.

Also @313V has been adding type annotations and docstrings to the earlier notebooks.

Didn’t have time to post a message here yesterday, but the modules have been added in this commit.
I made a few changes this morning in this commit, then corrected bugs and added the __all__ for each module that needs it in this commit.

Finally, in this commit I added five example notebooks to check everything was working well (dogs and cats, cifar10, imdb classification, movie lens and rossmann).

As Jeremy explained, you shouldn’t touch the dev_nb notebooks anymore (except to add prose). Bug fixes should be made in the modules directly! You should also test those notebooks against a pip install -e of the new library, so you easily have the latest version installed.

One last commit about module development for a while. Just added mixup, which lets us get very fast results on cifar10 (94% accuracy in 6 minutes).
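For reference, mixup trains on convex combinations of pairs of examples. A minimal numeric sketch of mixing one pair (the actual implementation works on whole tensor batches; `mixup_pair` is just an illustration):

```python
import random

def mixup_pair(x1, y1, x2, y2, alpha=0.4):
    """Mix two (input, one-hot target) pairs with lambda ~ Beta(alpha, alpha)."""
    lam = random.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam
```

Small alpha values push lambda toward 0 or 1, so most mixed examples stay close to one of the two originals.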
