Fixed the problem in this commit, then cleaned up a bit in this one.
tools/sync-outputs-version is now good to go. Please let me know whether it works on Windows.
It can now execute notebooks from the CLI, besides checking/copying successful ones. See the top of the script for examples, or run it with -h.
Any suggestions for a better name for this tool? Currently its name is not intuitive at all, but at the moment I'm not feeling creative so nothing comes to mind.
We use it to copy successfully executed notebooks in jupyter to dev_nb/run/ and optionally execute those from the CLI.
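For context, here is roughly what the "successfully executed" check could look like - just an illustrative sketch using nbformat, not the actual code in the script:

```python
# Illustrative only: treat a notebook as "successfully executed" when every
# non-empty code cell has an execution count recorded.
import nbformat

def fully_executed(nb_path):
    nb = nbformat.read(nb_path, as_version=4)
    code_cells = [c for c in nb.cells if c.cell_type == 'code' and c.source.strip()]
    return bool(code_cells) and all(c.get('execution_count') is not None for c in code_cells)
```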
Thank you.
copy-to-run? render-notebook? run-notebook?
Thank you for the suggestions, Jeremy.
Frankly, I find the run directory to be unintuitive to start with. Perhaps a better name would be snapshot, as we are taking a snapshot of the notebook's outputs? Then the script could be take-snapshot? Or take-nb-snapshot?
Yes, snapshot is much better. I'll rename it now. And I'll call your script take-snapshot. Very nice!
@313V is visiting us this week!
Lots in this commit, including:
- DataBunch now has a path attribute, which is copied by default to Learner, and is where stuff like models will be saved to. There's also a new data_from_imagefolder function that creates a DataBunch for you
- You can create a transform now with is_random=False to have it not do any randomization
- Used this feature to create 'semi-random TTA', which does 8 TTA images, one for each corner of the image, for each of flip and non-flip. These are combined with whatever augmentation you have for lighting, affine, etc. This approach gives dogs v cats results up to 99.7% accuracy with rn34 224 px! (Previously around 99.3-99.4%.)
- You can call DataBunch.holdout(is_test) to get either the test set or the validation set. Most prediction methods now take an is_test param
- loss_batch now moves losses and metrics to the CPU
- Learner now saves models inside path/'models'
- get_transforms now defaults to reasonable defaults for side-on photos
- Added Learner.pred_batch for one batch and Learner.get_preds for a full set of predictions
- show_image_batch now has an optional denorm function argument
- Added a pytorch fbeta function
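On that last item, here is a sketch of what an fbeta metric computes for thresholded multi-label predictions - just an illustration of the formula, not the function that was added:

```python
import torch

def fbeta_sketch(preds, targs, beta=2.0, thresh=0.5, eps=1e-9):
    # preds: sigmoid outputs in [0,1]; targs: 0/1 targets; both shaped (batch, n_classes)
    preds = (preds > thresh).float()
    targs = targs.float()
    tp   = (preds * targs).sum(dim=1)          # true positives per sample
    prec = tp / (preds.sum(dim=1) + eps)       # precision per sample
    rec  = tp / (targs.sum(dim=1) + eps)       # recall per sample
    beta2 = beta ** 2
    return ((1 + beta2) * prec * rec / (beta2 * prec + rec + eps)).mean()
```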
Awesome! thank you!
I did some more improvements, including an important change: now the execution doesn't overwrite the original .ipynb, so it doesn't interfere with git or with notebooks open in jupyter. Everything happens in a tmp file.
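Roughly, the idea is something like this (just a sketch, assuming jupyter nbconvert is available; the real script may differ):

```python
# Sketch: execute a notebook in a temp copy so the original .ipynb (and git) stays untouched.
import shutil, subprocess, tempfile
from pathlib import Path

def execute_in_tmp(nb_path):
    nb_path = Path(nb_path)
    with tempfile.TemporaryDirectory() as tmp:
        tmp_nb = Path(tmp) / nb_path.name
        shutil.copy(nb_path, tmp_nb)
        subprocess.run(['jupyter', 'nbconvert', '--to', 'notebook',
                        '--execute', '--inplace', str(tmp_nb)], check=True)
        return tmp_nb.read_text()  # executed notebook json; original file untouched
```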
If you have your notebooks’ data setup, and lots of resources, you can now run:
tools/take-snapshot -v -e dev_nb/0*ipynb
and then make a big fat commit with many snapshots that aren’t under git yet.
Also, I disabled the execute-all-nbs by default option:
$ tools/take-snapshot -e
When using -e/--execute, pass the *.ipynb files to execute explicitly
reasoning that it'll take too many resources, and perhaps it's better to specify the files to run explicitly. Nothing stops you from running dev_nb/0*ipynb though. But of course, if you believe it should work unimpeded, let me know and I will remove that sanity check.
Yes, that's better. 007b by itself takes 4-5 hours to run on a p3!
Pushed a few commits here and there to refactor a lot of the NLP stuff.
The idea is to have the data loaded and a learner in just a few lines of code, like in CV.
Merged the docstrings branch and just added another PR here.
Preview of core.py (this example will not be checked in).
Summary:
- Reformatted function/class/enum definitions
- Trying to provide links where possible - inside docstrings, subclasses
- Show global variables in documentation notebooks, e.g. FileLike = Union[str, Path]
Next: work on making sure links go to the correct places and formatting the HTML.
Fixed a bug in yesterday’s implementation of separating batchnorm layers for weight decay in this commit.
There is now a flag bn_wd in Learner which, if set to False, will prevent weight decay from being applied to batchnorm layers during training.
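If you're not familiar with the technique, here's a minimal plain-PyTorch sketch of what excluding batchnorm layers from weight decay looks like (not the fastai code, just the general idea):

```python
import torch
import torch.nn as nn

def split_bn_params(model):
    # separate batchnorm parameters from all other parameters
    bn_types = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)
    bn_params, other_params = [], []
    for m in model.modules():
        params = list(m.parameters(recurse=False))
        (bn_params if isinstance(m, bn_types) else other_params).extend(params)
    return bn_params, other_params

model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.BatchNorm2d(16), nn.ReLU())
bn_params, other_params = split_bn_params(model)
opt = torch.optim.SGD([
    {'params': other_params, 'weight_decay': 1e-2},
    {'params': bn_params,    'weight_decay': 0.0},  # bn_wd=False: no wd on batchnorm
], lr=0.1, momentum=0.9)
```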
In this commit, created an ImageBBox object to get data augmentation working with bounding boxes.
Under the hood, it's just a square mask, and when we need to pull the data at the end, we take the min/max of the coordinates of the non-zero elements.
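In pseudocode, the round-trip looks roughly like this (a simplified numpy sketch, not the actual ImageBBox code):

```python
import numpy as np

def bbox_to_mask(bbox, h, w):
    # bbox = (top, left, bottom, right) in pixel coordinates
    t, l, b, r = bbox
    mask = np.zeros((h, w), dtype=np.uint8)
    mask[t:b, l:r] = 1
    return mask  # this mask goes through the same transforms as the image

def mask_to_bbox(mask):
    # recover the box as the min/max coordinates of the non-zero elements
    ys, xs = np.nonzero(mask)
    return ys.min(), xs.min(), ys.max(), xs.max()
```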
Got a bit behind on updates. Here we go:
- Add titles to show_image
- Split a Collection randomly
- Add test sets
- data_from_imagefolder creates a DataBunch for you from a folder of folders of folders!
- Changed transform defaults to be more useful
- TTA
- Only include image extensions in get_image_files - you can override this with check_ext=False
- Various metrics and loss functions now flatten their params
- Figure out how many CPUs to use
- Added a model meta dictionary so that models know how to cut/split themselves
In 002_images.ipynb there is a very complex chain of ands, ors and nots (#1):
def get_image_files(c:Path, check_ext:bool=True)->FilePathList:
[...]
return [o for o in list(c.iterdir())
if not o.name.startswith('.') and not o.is_dir()
and (not check_ext or (o.suffix in image_extensions))]
I had a bit of smoke coming out of my head while parsing that last line.
Wouldn't this be more readable (#2):
if not o.name.startswith('.') and not o.is_dir()
and not (check_ext and o.suffix not in image_extensions)
And then it allows us to drop 2 nots (#3), but the above is fine too - it's consistent in negating everything and there are fewer parentheses:
if not (o.name.startswith('.') or o.is_dir()
or (check_ext and o.suffix not in image_extensions))
Too bad python doesn't have unless.
It’s type-annotation Friday!
- annotated nb 009 in this commit
- annotated nb 008 in this commit
- annotated the 007 nbs in this commit
- made a few corrections to early notebooks in this commit
Continued the clean-up with:
- moving 009a to x_009
- solving a bug with flat_master and the latest no-wd-for-bn-layers development in 004b
Yup that looks better to me.
Didn’t have time to post a message here yesterday, but the modules have been added in this commit
I made a few changes this morning in this commit, then corrected bugs and added the all_ for each module that needs it in this commit.
Finally, in this commit I added five example notebooks to check everything was working well (dogs and cats, cifar10, imdb classification, movie lens and rossmann).
As Jeremy explained, you shouldn’t touch the dev_nb anymore (except to add prose). Bug fixes should be done in the modules directly! You should also use a pip install -e of the new library to test those notebooks, to easily have the latest version installed.
One last commit about module developments for a while. Just added mixup that allows us to get very fast results on cifar10 (6 minutes for 94% accuracy).
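For reference, generic mixup (Zhang et al.) boils down to something like this - a sketch of the technique, not the fastai implementation:

```python
import numpy as np
import torch

def mixup_batch(x, y, alpha=0.4):
    # blend each example with a randomly chosen other example from the batch
    lam = np.random.beta(alpha, alpha)
    idx = torch.randperm(x.size(0))
    return lam * x + (1 - lam) * x[idx], y, y[idx], lam

# the loss is then mixed the same way:
# loss = lam * criterion(pred, y_a) + (1 - lam) * criterion(pred, y_b)
```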