I’ve seen a lot of threads in the forum about adding new features to the fastai library (e.g. new models, new functionality, etc.), but I think it would be better and more convenient for everyone to have a master thread, preferably a Wiki so everyone can edit.
The Wiki would list all the ongoing feature requests for the fastai library, including the status of each and who the task is assigned to (some of us students could potentially take on the requests). With a list of open/completed requests, we would also see far fewer duplicate threads asking the same things. Since all of us are coming from different backgrounds/skill levels, it would also help when a feature is already in the library but some of us just didn’t realize it or didn’t know how to use it; the wiki could address situations like that too.
What do you guys think? cc @jeremy
Here are a few Feature Requests I pulled together from other threads…anyone can feel free to add more or edit as needed!
New data engineering
Before building any model, data needs to be prepared, cleaned up, labeled, and so on. That takes a considerable amount of time, and no proper solution exists for going from messy data to something fast.ai can ingest.
Proposal: Bake Snorkel’s programmatic data engineering into fast.ai to provide an end-to-end solution from idea, to data-engineering, to model building & deployment.
Sources & resources:
- Intro to Snorkel
- Blog: Emergence of Training Data Engineering
- How Google uses Snorkel to speed up data engineering
Saving Model location
Enable tqdm progress meter for learn.predict & learn.TTA * completed
Add from_df which will accept a pandas df (from_csv only accepts path to csv)
texts_from_df function to load text from inside a pandas df
Allow duplicate files in from_csv (i.e. Upsampling minority classes)
Weld End-to-End optimization
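For the duplicate-files request above, here is a rough sketch of what upsampling minority classes could look like before the data ever reaches the library. It is plain Python, not fastai code: `upsample_minority` is a made-up helper name, and the `(filename, label)` row format is only an assumption about what a labels CSV holds.

```python
import random
from collections import Counter


def upsample_minority(rows, seed=0):
    """Duplicate rows of rarer labels until every label is as frequent as the most common one.

    `rows` is a list of (filename, label) pairs, e.g. read from a labels CSV.
    (Hypothetical helper, not part of fastai.)
    """
    if not rows:
        return []
    rng = random.Random(seed)
    counts = Counter(label for _, label in rows)
    target = max(counts.values())
    balanced = list(rows)
    for label, count in counts.items():
        pool = [row for row in rows if row[1] == label]
        # Sample with replacement to make up the shortfall for this label.
        balanced.extend(rng.choice(pool) for _ in range(target - count))
    return balanced


rows = [("cat1.jpg", "cat"), ("cat2.jpg", "cat"), ("cat3.jpg", "cat"), ("dog1.jpg", "dog")]
balanced = upsample_minority(rows)
print(Counter(label for _, label in balanced))
```

The output rows contain repeated filenames on purpose, which is exactly what the request asks from_csv to accept.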
Currently the size of an image is specified as an int value. Add an option to specify Height x Width.
(Ramesh) I looked into this; it’s not straightforward because of how we use size in our training process. It’s entirely possible and the library doesn’t constrain us, I just have not found an easy way to do it. So far, for what I needed, I just increase the background on the tall image to make it a square.
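The square-padding workaround can be sketched roughly like this. It is only an illustration on a 2-D list of pixel values; `pad_to_square` is a hypothetical helper, not a fastai function, and a real image would go through an image library instead.

```python
def pad_to_square(pixels, fill=0):
    """Return a square copy of a 2-D image (list of equal-length rows),
    centred on a background of `fill` values.

    (Hypothetical helper for illustration only.)
    """
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    side = max(height, width)
    top = (side - height) // 2   # rows of background above the image
    left = (side - width) // 2   # columns of background left of the image
    square = [[fill] * side for _ in range(side)]
    for r, row in enumerate(pixels):
        square[top + r][left:left + width] = row
    return square


tall = [[1, 2], [3, 4], [5, 6], [7, 8]]  # 4 rows x 2 columns
for row in pad_to_square(tall):
    print(row)
```

The padded image can then be passed with a single int size as usual, at the cost of spending some pixels on background.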
Currently unfreeze automatically unfreezes to 0. This causes long runs when working with larger images (see Lesson 3 In-Class Discussion). It would be good to have options for how many layers / sub-layers to unfreeze.
Jeremy suggested - you can use freeze_to() for that
But most pre-trained networks have only two layers above the finetune layer, and both are huge. Is it possible to freeze_to a sub-layer, or to break the pre-trained network down into more layers?
The caveat is that we would have to give more learning rates. It might be better to have an option to specify a dictionary of the layer names we want to unfreeze and the learning rates for them. Thoughts / suggestions?
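To make the dictionary idea a bit more concrete, here is one possible shape for it. This is a sketch, not an existing fastai option: `build_param_groups` and the layer names are made up, and it assumes the parameters come as (name, parameter) pairs like those from a PyTorch module's named_parameters(). The result is a list of dicts with "params" and "lr" keys, the per-group format PyTorch optimizers accept.

```python
def build_param_groups(named_params, lr_by_layer):
    """Group parameters by layer name, one group per learning-rate entry.

    named_params: iterable of (name, parameter) pairs.
    lr_by_layer:  dict mapping a layer name (or dotted prefix) to its learning rate.
                  Parameters matching no entry are left out of the result,
                  so an optimizer built from it would not update them.
    (Hypothetical helper, not part of fastai.)
    """
    groups = {layer: {"params": [], "lr": lr} for layer, lr in lr_by_layer.items()}
    for name, param in named_params:
        matches = [p for p in lr_by_layer if name == p or name.startswith(p + ".")]
        if matches:
            # Longest match wins, so "layer4.1" overrides "layer4".
            groups[max(matches, key=len)]["params"].append(param)
    return [g for g in groups.values() if g["params"]]


# Strings stand in for real parameter tensors here.
named = [
    ("layer3.0.weight", "w3"),
    ("layer4.0.weight", "w40"),
    ("layer4.1.weight", "w41"),
    ("fc.weight", "wfc"),
]
for group in build_param_groups(named, {"layer4": 1e-3, "layer4.1": 1e-2, "fc": 1e-1}):
    print(group)
```

Matching on dotted prefixes is just one choice; it lets a sub-layer get its own rate without listing every parameter.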
Learner Fit Options
- Return a History of Train and Validation loss / metrics from
- Add Early Stopping with Patience (similar to Keras)
- Add Model Checkpoint - *available in fastai library, see
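As a starting point for the history and early-stopping items above, the behaviour could look something like this. It is a framework-free sketch: the class and the `run` helper are hypothetical names, the losses are fed in by hand, and `patience` / `min_delta` simply mirror the Keras parameter names.

```python
class EarlyStopping:
    """Stop when validation loss has not improved for `patience` epochs in a row.

    (Illustrative sketch, not the fastai or Keras implementation.)
    """

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def run(val_losses, patience):
    """Replay a sequence of validation losses and return the history up to the stop."""
    stopper = EarlyStopping(patience)
    history = []
    for loss in val_losses:
        history.append(loss)
        if stopper.step(loss):
            break
    return history


print(run([1.0, 0.8, 0.9, 0.85, 0.7], patience=2))
```

In a real fit loop, the history list would be what gets returned to the caller, and the stop check would run once per epoch after validation.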
predict_array_TTA (How do we use our model against a specific image?)