Training Data Engineering

Before you can get a model up and running, you need to do data engineering.

In the past, I spent substantial time building training data and engineering features with custom pipelines. Relative to the actual ML model, those pipelines made up about 90% of the codebase.

More recently, I was questioning the sanity of continuing to do so, and by sheer coincidence I found a marvelous solution:

Programmatic training data engineering.

Essentially, you now just program the three main tasks and plug the resulting data into your model:

  1. Write Labeling Functions to auto-label your data
  2. Write Transformation Functions for data augmentation if no existing one matches your needs
  3. Plug the resulting data into your ML model and run it
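
To make the three steps concrete, here is a minimal sketch in plain Python. It does not use any particular library: the spam/ham task, the heuristics, the majority-vote combiner, and every name in it (`lf_contains_link`, `tf_swap_adjacent_words`, `build_training_set`, and so on) are assumptions made up for illustration, not the API of a specific tool.

```python
import random
from collections import Counter

# Hypothetical label values for an illustrative spam/ham task.
ABSTAIN, HAM, SPAM = -1, 0, 1

# Step 1: labeling functions. Each one is a cheap heuristic that
# either votes for a label or abstains.
def lf_contains_link(text):
    return SPAM if "http" in text.lower() else ABSTAIN

def lf_mentions_prize(text):
    return SPAM if "prize" in text.lower() else ABSTAIN

def lf_short_message(text):
    return HAM if len(text.split()) <= 4 else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_message]

def majority_label(text):
    """Combine votes by simple majority; abstain on no votes or a tie."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN
    return counts[0][0]

# Step 2: a transformation function that creates an augmented copy
# of an example by swapping two adjacent words.
def tf_swap_adjacent_words(text, rng):
    words = text.split()
    if len(words) < 2:
        return text
    i = rng.randrange(len(words) - 1)
    words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)

def build_training_set(texts, seed=0):
    """Auto-label each text, drop abstentions, and add one augmented copy."""
    rng = random.Random(seed)
    examples = []
    for text in texts:
        label = majority_label(text)
        if label == ABSTAIN:
            continue
        examples.append((text, label))
        examples.append((tf_swap_adjacent_words(text, rng), label))
    return examples

# Step 3: the resulting (text, label) pairs are what you would feed
# into the model's training routine.
raw_texts = [
    "Claim your prize at http://example.com now",
    "See you tomorrow",
    "The quarterly report is attached for your review",
]
training_set = build_training_set(raw_texts)
```

The third text matches no heuristic, so it is left unlabeled and dropped rather than guessed at. Majority voting is only the simplest possible way to combine labeling functions; a real setup would likely weight them by how accurate and how correlated they are.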

How Google is using it to speed up data engineering:

