This is a chat thread for fastai v1 developers. Use it like a Slack or IRC channel. You can keep it open in a separate tab or window and see comments in real time, or check it from time to time to see what’s going on. If you’re working on fastai v1 and get stuck and need help, or want to discuss possible ways to solve a design issue, etc, feel free to add a reply here.
Using these Discourse forums for real-time chat works much better if you know the keyboard shortcuts. Hit shift-/ to bring up the list of shortcuts. Most important to know are: shift-R to reply; ctrl-enter to send. If you’re discussing a line of code or a particular commit, please include a link to it. If you’re discussing some chart or image, paste it in your post.
It would be very helpful if people contributing code on a regular basis could try to use this chat thread to mention:
What you’re going to start working on, if you’re starting on a new feature/bug/issue
When you commit code, tell us some context about what you did and why, and how it might impact other devs.
009a is copied from the previous lesson3_rossmann notebook and contains all the steps needed to feature-engineer the data.
009 contains a first attempt at creating a TabularDataset that is consistent with the way other APIs (ImageDataset and TextDataset) are built in the library.
This one is a wrapper that looks like the previous training phase API, to make it easy to experiment with various schedules.
At @jeremy’s request, I have been working on a new build tool which will copy a fully completed notebook w/ outputs to dev_nb/run/ so that it can be shown to users.
It’s difficult to come up with perfect logic for programmatically validating that a notebook is complete and ready to be shown to users, so at the moment it uses two checks:
check that the last code cell has outputs (it’s possible that a cell was run yet produced no outputs, so this isn’t a solid check on its own)
check that the execution_count numbers are contiguous, i.e. if after completing the run of the notebook you went back up and re-ran some cells, it’ll reject such a notebook, since chances are it won’t be “perfect”.
Bottom line: run the notebook from beginning to end without any errors and it’ll accept it.
It also pushes a disclaimer cell to the very top of the notebook to indicate that it is not to be modified/PRed/bug-reported, and that the source notebook should be used instead.
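For illustration, the two checks could be sketched in plain Python over a parsed .ipynb JSON dict (the names here, like looks_complete, are hypothetical, not the tool’s actual internals):

```python
def looks_complete(nb):
    """Heuristic check that a notebook was run top-to-bottom.
    `nb` is a parsed .ipynb dict with a 'cells' list."""
    code_cells = [c for c in nb['cells']
                  if c['cell_type'] == 'code' and c.get('source')]
    if not code_cells:
        return False
    # check 1: the last code cell should have outputs (not airtight,
    # since a cell can legitimately produce no output)
    if not code_cells[-1].get('outputs'):
        return False
    # check 2: execution counts should be contiguous 1..N; re-running
    # cells out of order after a full run breaks this
    counts = [c.get('execution_count') for c in code_cells]
    return counts == list(range(1, len(counts) + 1))
```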
It’s still a work in progress, but give it a try. You can run it on all notebooks:
Just to clarify: notebooks in this directory do not have their outputs stripped automatically. I’m not sure I remembered to post anything about that when I added it!
tools/sync-outputs-version is now good to go. Please let me know whether it works on Windows.
It can now execute notebooks from the CLI, besides checking/copying successful ones. See the top of the script for examples, or run it with -h.
Any suggestions for a better name for this tool? Its current name is not intuitive at all, but at the moment I’m not feeling creative, so nothing comes to mind.
We use it to copy notebooks that executed successfully in Jupyter to dev_nb/run/, and optionally to execute them from the CLI.
DataBunch now has a path attribute, which is copied by default to Learner and is where stuff like models will be saved. There’s also a new data_from_imagefolder function that creates a DataBunch for you.
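Here’s a minimal mock of the path convention, assuming the behavior described above (these stripped-down classes and the model_dir helper are illustrative, not fastai’s real implementations):

```python
from pathlib import Path

class DataBunch:
    # minimal mock illustrating the new `path` attribute
    def __init__(self, path='.'):
        self.path = Path(path)

class Learner:
    # by default the Learner picks up the DataBunch's path;
    # models end up under path/'models'
    def __init__(self, data, path=None):
        self.data = data
        self.path = Path(path) if path is not None else data.path

    @property
    def model_dir(self):
        # hypothetical helper showing where saved models would land
        return self.path/'models'
```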
You can now create a transform with is_random=False to have it skip any randomization.
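A toy illustration of the flag (not fastai’s actual transform code): with is_random=False, the transform skips its coin flip and becomes deterministic.

```python
import random

def flip_tfm(img, is_random=True):
    # toy stand-in for a transform: when is_random is False,
    # skip the coin flip entirely and return the input unchanged
    if is_random and random.random() < 0.5:
        return img[::-1]  # "flip" a sequence
    return img
```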
Used this feature to create ‘semi-random TTA’, which makes 8 TTA images: one for each corner of the image, for each of flip and non-flip. These are combined with whatever augmentation you have for lighting, affine, etc. This approach gives dogs v cats results up to 99.7% accuracy with rn34 at 224 px! (Previously around 99.3-99.4%.)
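The eight variants come from crossing the four corners with flip/no-flip; a quick sketch of that combination logic (the tta_params helper is hypothetical, not the library code):

```python
def tta_params():
    # four corners of the image crossed with flip / no-flip -> 8 variants;
    # lighting/affine augmentations would still be applied randomly on top
    corners = [(0, 0), (0, 1), (1, 0), (1, 1)]  # (row, col) corner of the crop
    flips = (False, True)
    return [(corner, flip) for corner in corners for flip in flips]
```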
You can call DataBunch.holdout(is_test) to get either the test set or the validation set. Most prediction methods now take an is_test param.
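A minimal mock of the holdout idea, assuming the behavior described above (not the actual fastai implementation):

```python
class DataBunch:
    # minimal mock: holdout() returns the validation set by default,
    # or the test set when is_test=True
    def __init__(self, train_dl, valid_dl, test_dl=None):
        self.train_dl, self.valid_dl, self.test_dl = train_dl, valid_dl, test_dl

    def holdout(self, is_test=False):
        return self.test_dl if is_test else self.valid_dl
```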
loss_batch now moves losses and metrics to the CPU
Learner now saves models inside path/‘models’
get_transforms now comes with sensible defaults for side-on photos
Added Learner.pred_batch for one batch and Learner.get_preds for a full set of predictions
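The relationship between the two could be sketched like this (mock functions with made-up signatures, just to show that get_preds is effectively pred_batch applied over every batch):

```python
def pred_batch(model, batch):
    # mock: run the model on one batch of inputs
    return [model(x) for x in batch]

def get_preds(model, batches):
    # mock: predictions for the full set are the per-batch
    # predictions concatenated in order
    preds = []
    for batch in batches:
        preds.extend(pred_batch(model, batch))
    return preds
```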
show_image_batch now has an optional denorm function argument
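A denorm function typically just inverts the normalization applied to the batch; a single-channel sketch with hypothetical scalar mean/std values:

```python
def make_denorm(mean, std):
    # returns a denorm function that inverts x -> (x - mean) / std,
    # so images can be displayed with their original pixel values
    def denorm(batch):
        return [x * std + mean for x in batch]
    return denorm
```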
I made some more improvements, including an important change: execution no longer overwrites the original .ipynb, so it doesn’t interfere with git or with notebooks open in Jupyter. Everything happens in a tmp file.
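The no-clobber behavior could be sketched like this (copy_to_tmp is an illustrative helper name, not the tool’s actual code):

```python
import shutil
import tempfile
from pathlib import Path

def copy_to_tmp(nb_path):
    # work on a temporary copy so the original .ipynb on disk (and its
    # git status / open Jupyter session) is never touched by execution
    tmp = Path(tempfile.mkdtemp()) / Path(nb_path).name
    shutil.copyfile(nb_path, tmp)
    return tmp
```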
If you have your notebooks’ data set up, and lots of resources, you can now run:
tools/take-snapshot -v -e dev_nb/0*ipynb
and then make a big fat commit with many snapshots that aren’t under git yet.
Also, I disabled the execute-all-notebooks-by-default option:
$ tools/take-snapshot -e
When using -e/--execute, pass the *.ipynb files to execute explicitly
reasoning that it would take too many resources, and that it’s probably better to specify the files to run explicitly. Nothing stops you from passing dev_nb/0*ipynb though. But of course, if you believe it should work unimpeded, let me know and I’ll remove that sanity check.