I’ve seen a lot of threads in the forum about adding new features to the fastai library (i.e. new models, new functionality, etc.), but I personally think it would be better and more convenient for everyone to have a single master thread, preferably a wiki so everyone can edit.
The wiki would track all ongoing feature requests for the fastai library, including the status of each request and who it is assigned to (some of us students could potentially take on the requests). Having a list of open/completed requests would also mean far fewer duplicate threads asking for the same things. Since all of us come from different backgrounds and skill levels, the wiki could also help in cases where a feature is already in the library but some of us just didn’t realize it or know how to use it.
What do you guys think? cc @jeremy
Here are a few feature requests I pulled together from other threads… feel free to add more or edit as needed!
New Models
NASNet - *completed*
SENet - *completed*
ResNet152 and VGG19 - *completed*
VGG16 - *completed*
New data engineering
Before building any model, data needs to be prepared, cleaned up, labeled, and so on. That takes a considerable amount of time, and no proper solution currently exists for going from messy data to something fast.ai can ingest.
Proposal: bake Snorkel’s programmatic data engineering into fast.ai to provide an end-to-end solution from idea, to data engineering, to model building & deployment (a minimal sketch follows the source list below).
Sources & resources:
- Snorkel
- Intro to Snorkel
- Blog: Emergence of Training Data Engineering
- How Google uses Snorkel to speed up data engineering
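For concreteness, here is a minimal sketch of what programmatic labeling with Snorkel looks like, assuming Snorkel’s labeling-function API (v0.9 style); the labeling functions and data below are illustrative only:

```python
# Illustrative only: toy labeling functions over a DataFrame of text,
# combined by Snorkel's LabelModel into probabilistic training labels.
import pandas as pd
from snorkel.labeling import labeling_function, PandasLFApplier
from snorkel.labeling.model import LabelModel

ABSTAIN, NEG, POS = -1, 0, 1

@labeling_function()
def lf_positive_words(x):
    return POS if "great" in x.text.lower() else ABSTAIN

@labeling_function()
def lf_negative_words(x):
    return NEG if "terrible" in x.text.lower() else ABSTAIN

df_train = pd.DataFrame({"text": ["great product", "terrible service", "ok I guess"]})

# Apply all labeling functions to every row, giving a (rows x LFs) label matrix.
applier = PandasLFApplier(lfs=[lf_positive_words, lf_negative_words])
L_train = applier.apply(df=df_train)

# Denoise the overlapping, conflicting votes into probabilistic labels
# that could then feed a fastai dataset.
label_model = LabelModel(cardinality=2)
label_model.fit(L_train=L_train)
probs = label_model.predict_proba(L=L_train)
```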
New Functionality
Allow specifying the location where models are saved
Enable tqdm progress meter for `learn.predict` & `learn.TTA` - *completed*
Add `from_df`, which will accept a pandas DataFrame (`from_csv` only accepts a path to a CSV) - see the workaround sketch after this list
Add a `texts_from_df` function to load text from inside a pandas DataFrame
Allow duplicate files in `from_csv` (e.g. upsampling minority classes)
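Until a real `from_df` lands, one workaround is to round-trip the DataFrame through a temporary CSV and reuse the existing `from_csv` path. The helper below is hypothetical, not part of fastai, and assumes the fastai 0.7-era `ImageClassifierData.from_csv` signature:

```python
# Hypothetical helper, not a fastai API: write the DataFrame to a temp CSV
# so the existing from_csv constructor can consume it.
import os
import tempfile

def data_from_df(path, folder, df, tfms, **kwargs):
    from fastai.dataset import ImageClassifierData  # fastai 0.7-era import
    fd, csv_path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        df.to_csv(csv_path, index=False)
        # from_csv reads the file eagerly, so deleting it afterwards is safe.
        return ImageClassifierData.from_csv(path, folder, csv_path, tfms=tfms, **kwargs)
    finally:
        os.remove(csv_path)
```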
Performance
Weld End-to-End optimization
Compatibility
Add a scikit-learn wrapper for fastai, as done for Keras, XGBoost, or PyTorch with skorch
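Such a wrapper would mainly need to satisfy the scikit-learn estimator contract (`fit`/`predict`, plus `get_params`/`set_params` inherited from `BaseEstimator`). A minimal skeleton, with an entirely hypothetical interface:

```python
# Skeleton only: the constructor arguments and internals are hypothetical.
from sklearn.base import BaseEstimator, ClassifierMixin

class FastaiClassifier(BaseEstimator, ClassifierMixin):
    """Hypothetical sklearn-compatible wrapper around a fastai learner."""

    def __init__(self, arch="resnet34", epochs=3, lr=1e-2):
        self.arch = arch
        self.epochs = epochs
        self.lr = lr

    def fit(self, X, y):
        # Build a fastai model/learner from (X, y) and train it here,
        # storing the trained learner on self.learner_.
        raise NotImplementedError("sketch only")

    def predict(self, X):
        # Run self.learner_ on X and return hard class predictions.
        raise NotImplementedError("sketch only")

# This would make fastai models usable in sklearn tooling, e.g.:
# from sklearn.model_selection import cross_val_score
# cross_val_score(FastaiClassifier(epochs=1), X, y, cv=3)
```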
Recent Updates
2018-11-17: Returned to OpenCV
2018-11-08: Removed OpenCV dependency
Data Processing
Currently the size of an image is specified as a single int value. Add an option to specify Height x Width.
(Ramesh) - I looked into this; it’s not straightforward because of how we use size in our training process. It’s entirely possible, and the library doesn’t constrain us; I just haven’t found any easy way to do it. So far, for what I needed, I just increased the background on the tall image to make it square (see the sketch below).
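For reference, here is a minimal sketch of that square-padding workaround using PIL (the white background fill is an assumption; adjust as needed):

```python
# Pad a non-square image onto a square canvas so a single int size works.
from PIL import Image

def pad_to_square(img, fill=(255, 255, 255)):
    side = max(img.size)
    canvas = Image.new("RGB", (side, side), fill)
    # Center the original image on the square background.
    canvas.paste(img, ((side - img.width) // 2, (side - img.height) // 2))
    return canvas

# square = pad_to_square(Image.open("tall_image.jpg"))
```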
Unfreeze Options
Currently `unfreeze` automatically unfreezes down to layer 0. This causes long runs when working with larger images (see Lesson 3 In-Class Discussion). It would be good to have options for how many layers / sub-layers to unfreeze.
Jeremy suggested: you can use `freeze_to()` for that.
But most pre-trained networks have only two layer groups above the fine-tune layer, and both are huge. Is it possible to `freeze_to` a sub-layer, or to break the pre-trained network down into more layer groups?
The caveat is that we then have to give more learning rates. Might it be better to offer an option to specify a dictionary of layer names we want to unfreeze and learning rates for them (sketched below)? Thoughts / suggestions?
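To make the dictionary idea concrete, here is a rough sketch in plain PyTorch; the dict-based spec is hypothetical, not an existing fastai interface, and the layer names/learning rates are illustrative:

```python
# Hypothetical per-layer unfreeze + learning-rate spec, shown in raw PyTorch.
import torch
from torchvision import models

model = models.resnet34(pretrained=True)

# Freeze everything first.
for p in model.parameters():
    p.requires_grad = False

# Unfreeze only the named sublayers, each with its own learning rate.
lr_per_layer = {"layer4": 1e-3, "fc": 1e-2}  # illustrative values
children = dict(model.named_children())
param_groups = []
for name, lr in lr_per_layer.items():
    for p in children[name].parameters():
        p.requires_grad = True
    param_groups.append({"params": children[name].parameters(), "lr": lr})

optimizer = torch.optim.SGD(param_groups, momentum=0.9)
```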
Learner Fit Options
- Return a history of train and validation loss / metrics from the `learner.fit` method
- Add early stopping with patience (similar to Keras; a minimal sketch follows this list)
- Add model checkpointing - *available in the fastai library, see `cycle_save_name`*
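A minimal, framework-agnostic sketch of early stopping with patience; the class name and interface are illustrative, modeled on Keras’s callback:

```python
# Stop training once validation loss has not improved for `patience` epochs.
class EarlyStopping:
    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.wait = 0

    def step(self, val_loss):
        """Return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.wait = 0
        else:
            self.wait += 1
        return self.wait >= self.patience

# Usage inside a training loop:
# stopper = EarlyStopping(patience=3)
# for epoch in range(epochs):
#     val_loss = validate(...)
#     if stopper.step(val_loss):
#         break
```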
TTA Enhancements
- Add `predict_array_TTA` (how do we use our model against a specific image? See the sketch below.)
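Pending a real `predict_array_TTA`, one can approximate single-image TTA by averaging predictions over augmented copies; the helper below is a hypothetical sketch, not a fastai function:

```python
# Hypothetical: average class probabilities over n random augmentations.
import numpy as np

def predict_array_tta(predict_fn, img, augment_fn, n_aug=4):
    """predict_fn: image -> class probabilities; augment_fn: random transform."""
    preds = [predict_fn(img)]                          # the original image
    preds += [predict_fn(augment_fn(img)) for _ in range(n_aug)]
    return np.mean(preds, axis=0)                      # averaged probabilities
```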
Environment
- Docker image for fastai - done (image, documentation)