Before you get up & running with fast.ai, you need to do data engineering.
In the past, I spent substantial time building training data and doing feature engineering with custom pipelines; relative to the actual ML model, these made up about 90% of the codebase.
More recently, I was questioning my sanity in continuing to do so, and by sheer coincidence I found a marvelous solution:
Programmatic training data engineering.
Essentially, you now just program the three main tasks and plug the resulting data into your model:
- Write Labeling Functions to auto-label your data
- Write Transformation Functions for data augmentation if no existing one fits your data
- Plug resulting data into ML model and run it
https://www.snorkel.org/get-started/
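To make the three steps concrete, here is a minimal, library-free sketch of the idea in plain Python. It does not use Snorkel's actual API: the function names, the spam/ham task, and the sample texts are all made up for illustration, and the simple majority vote stands in for the learned label model that Snorkel uses to combine labeling function outputs.

```python
from collections import Counter

# Hypothetical label values for a toy spam/ham task.
ABSTAIN, HAM, SPAM = -1, 0, 1

# Step 1: labeling functions. Each one votes for a label or abstains.
def lf_contains_link(text):
    return SPAM if "http" in text.lower() else ABSTAIN

def lf_mentions_prize(text):
    return SPAM if "prize" in text.lower() else ABSTAIN

def lf_short_message(text):
    return HAM if len(text.split()) <= 4 else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_message]

def majority_vote(text):
    """Combine the votes; abstain when nobody votes or the top vote is tied."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN
    return counts[0][0]

# Step 2: a transformation function that returns an augmented copy,
# or None when it does not apply to the example.
def tf_swap_greeting(text):
    return text.replace("Hello", "Hi") if "Hello" in text else None

def build_training_set(texts):
    """Label each text, drop unlabeled ones, and add augmented copies."""
    rows = []
    for text in texts:
        label = majority_vote(text)
        if label == ABSTAIN:
            continue
        rows.append((text, label))
        augmented = tf_swap_greeting(text)
        if augmented is not None:
            rows.append((augmented, label))
    return rows

raw_texts = [
    "Hello, claim your prize at http://example.com now",
    "Hello there friend",
    "Meeting notes are attached for everyone on the team",
]
training_set = build_training_set(raw_texts)
```

Step 3 is then just handing `training_set` (text, label pairs) to whatever model you train, for example by loading it into your fast.ai data pipeline. The third sample text gets no label because every labeling function abstains on it, so it is left out of the training set.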
How Google is using it to speed up data engineering: