Article on Data Preparation (from theverge.com)

All I can say is amen and amen.

I see this topic is getting covered a bit in the ML course, and I hope there will be some discussion of it for us in the DL course as well.

I agree. Does anyone have a favorite data cleansing / wrangling tool? For textual data, I’ve always just coded something up, but I’d like to try a GUI such as OpenRefine (formerly Google Refine), provided what I do can be exported to a script so it’s repeatable.

For images, I guess the main functionality would be labeling. I’m unfamiliar with this area, but since deep learning has been around for a few years, I imagine there are some decent image labeling tools out there. Jeremy mentioned using the classifier to identify the images the system is “most unsure” about, and possibly removing those. Maybe there’s room for a tool in the fast.ai library that can present such images for review and possible removal or relabeling. Anything that saves time in this area would be a boon.
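To make the “most unsure” idea a bit more concrete, here is a rough sketch of how one might rank images for review. It assumes you already have the classifier’s predicted class probabilities as an array with one row per image, and it treats “most unsure” as “highest entropy of the predicted distribution”, which is my assumption and not necessarily what Jeremy had in mind. The names `most_uncertain` and `probs` are made up for illustration; this is not a fast.ai API.

```python
import numpy as np


def most_uncertain(probs, k=5):
    """Return (indices, entropies) of the k rows with the highest entropy.

    probs: array-like of shape (n_images, n_classes), each row a
    predicted probability distribution (hypothetical input format).
    """
    probs = np.asarray(probs, dtype=float)
    eps = 1e-12  # keeps log() finite when a probability is exactly 0
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)
    # Sort by descending entropy so the least confident images come first.
    order = np.argsort(-entropy, kind="stable")[:k]
    return order, entropy[order]


if __name__ == "__main__":
    # Toy predictions for four images from a two-class classifier.
    preds = [[0.99, 0.01], [0.50, 0.50], [0.80, 0.20], [0.55, 0.45]]
    idx, ent = most_uncertain(preds, k=2)
    print(idx, ent)
```

The returned indices could then drive whatever review step you like, for example showing those images in a notebook and deciding by hand whether to relabel or drop each one.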
