Training Data Engineering

Before you get up & running with fast.ai, you need to do data engineering.

In the past, I spent substantial time building training data and engineering features with custom pipelines, and relative to the actual ML model, those pipelines made up about 90% of the codebase.

More recently, I was questioning my sanity in continuing to do so, and by sheer coincidence I found a marvelous solution:

Programmatic training data engineering.

Essentially, you now just program the three main tasks and plug the resulting data into your model:

  1. Write Labeling Functions to auto-label your data
  2. Write Transformation Functions for data augmentation, if no existing one fits your data
  3. Plug resulting data into ML model and run it

https://www.snorkel.org/get-started/
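
To make the three steps concrete, here is a rough, dependency-free sketch of the idea in plain Python. It is not Snorkel's API: every name in it (the label constants, the `lf_*` rules, `majority_label`, `tf_swap_adjacent_words`, `build_training_set`) is made up for illustration, the spam/ham task is an assumed example, and the simple majority vote is a stand-in for whatever the library does to combine noisy labels.

```python
import random
from collections import Counter

# Hypothetical label constants; ABSTAIN means "this rule has no opinion".
ABSTAIN, HAM, SPAM = -1, 0, 1

# Step 1: labeling functions. Each is a small heuristic that votes or abstains.
def lf_contains_link(text):
    return SPAM if "http" in text.lower() else ABSTAIN

def lf_mentions_prize(text):
    return SPAM if "prize" in text.lower() else ABSTAIN

def lf_short_message(text):
    return HAM if len(text.split()) < 5 else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_message]

def majority_label(text):
    """Combine votes by simple majority; ties and all-abstain yield ABSTAIN."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN
    return counts[0][0]

# Step 2: a transformation function. It returns a variant of the example
# that is assumed (for this toy task) to keep the same label.
def tf_swap_adjacent_words(text, rng):
    words = text.split()
    if len(words) < 2:
        return text
    i = rng.randrange(len(words) - 1)
    words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)

# Step 3 input: (text, label) pairs, originals plus one augmented copy each.
def build_training_set(texts, seed=0):
    rng = random.Random(seed)
    rows = []
    for text in texts:
        label = majority_label(text)
        if label == ABSTAIN:
            continue  # drop examples that no rule could label
        rows.append((text, label))
        rows.append((tf_swap_adjacent_words(text, rng), label))
    return rows

texts = [
    "Claim your prize at http://example.com today",
    "See you soon",
    "The quarterly report is attached for your review",
]
training_set = build_training_set(texts)
```

The resulting `training_set` is a list of `(text, label)` pairs that you would then feed to whatever model you are training. The third example gets dropped because none of the rules fires on it, which is the kind of coverage gap you would fix by writing more labeling functions.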

How Google is using it to speed up data engineering:

Excellent thread about data engineering services!