Training Data Engineering

marvin · December 27, 2019, 1:11pm

Before you get up & running with fast.ai, you need to do data engineering.

In the past, I spent substantial time building training data and doing feature engineering with custom pipelines. Relative to the actual ML model, these pipelines made up about 90% of the codebase.

More recently, I was questioning whether it was sane to keep doing so, and by sheer coincidence I found a marvelous solution:

Programmatic training data engineering.

Essentially, you now just program the three main tasks and plug the resulting data into your model:

Write Labeling Functions to auto-label your data
Write Transformation Functions for Data Augmentation if there is no matching one
Plug resulting data into ML model and run it
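
To make the three steps concrete, here is a minimal plain-Python sketch of the idea. It does not use Snorkel's API: the label constants, the three labeling functions, the `tf_swap_greeting` transformation and the `build_training_set` helper are all made up for illustration, and a simple majority vote is swapped in where Snorkel would combine the votes with its learned label model.

```python
from collections import Counter

# Hypothetical label scheme for a toy spam task; ABSTAIN means "no opinion".
ABSTAIN, HAM, SPAM = -1, 0, 1

# Step 1: labeling functions. Each one encodes a single heuristic and
# either votes for a label or abstains.
def lf_contains_link(text):
    return SPAM if "http" in text.lower() else ABSTAIN

def lf_mentions_prize(text):
    return SPAM if "prize" in text.lower() else ABSTAIN

def lf_short_message(text):
    return HAM if len(text.split()) <= 4 else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_message]

def majority_label(text):
    """Combine the votes by simple majority (a stand-in for a label model)."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    # Abstain when the two most frequent labels are tied.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN
    return counts[0][0]

# Step 2: a transformation function for augmentation. It returns a
# variant of the input that should keep the same label.
def tf_swap_greeting(text):
    return text.replace("Hello", "Hi")

# Step 3: build (text, label) rows that can be fed to any model.
def build_training_set(texts):
    rows = []
    for text in texts:
        label = majority_label(text)
        if label == ABSTAIN:
            continue  # no usable label, so skip this example
        rows.append((text, label))
        augmented = tf_swap_greeting(text)
        if augmented != text:
            rows.append((augmented, label))
    return rows

texts = [
    "Hello, claim your prize at http://example.com now",
    "see you tomorrow",
    "The quarterly report is attached for your review today",
]
training_set = build_training_set(texts)
```

The third text gets no label because every labeling function abstains on it, so it is dropped. That is the usual trade-off with this approach: coverage depends on how many heuristics you write.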

https://www.snorkel.org/get-started/

How Google is using it to speed up data engineering:

gaurav31 · January 7, 2020, 11:21am

Excellent thread about data engineering services!