Standardizing your data science project structure

Hey all, I came across a really nice resource that auto-generates a project structure for data science projects. Check it out here:

https://drivendata.github.io/cookiecutter-data-science/

It’s built on the excellent Python cookiecutter utility. I often find myself wondering how to structure my projects, and as I work through a project I spend more time than I’d like restructuring and reorganizing my setup. I’ve been using the data science cookiecutter for a couple of weeks now, at work and for Kaggle projects at home, and I’m spending much less time worrying about how I’ve organized things.
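
For anyone who wants a feel for the layout before installing anything, here is a minimal sketch that creates a similar folder skeleton by hand. The folder names in `LAYOUT` are my assumption, loosely following the project's documentation, and `scaffold` is a made-up helper, not part of cookiecutter. The real template also generates files (README, Makefile, and so on) that this skips.

```python
from pathlib import Path

# Assumed folder names, loosely following the cookiecutter-data-science docs;
# the real template may differ between versions.
LAYOUT = [
    "data/raw",
    "data/interim",
    "data/processed",
    "data/external",
    "models",
    "notebooks",
    "references",
    "reports/figures",
    "src",
]


def scaffold(root):
    """Create the folder skeleton under `root` and return the created paths."""
    root = Path(root)
    created = []
    for rel in LAYOUT:
        path = root / rel
        # parents=True builds intermediate folders; exist_ok=True makes reruns safe.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `scaffold("my-project")` would create the folders under `my-project/`, and running it a second time leaves existing folders untouched.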

Read through the opinions section of the documentation; I found it especially illuminating! They reference a post from Mike Bostock about using make for reproducible workflows, which is something I’d never seen before.
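
The core idea, as I understand it, is that an output is rebuilt only when it is missing or older than the inputs it depends on. Here is a rough Python sketch of that rule. It is my own illustration, not how make is implemented, and `is_stale` and `build` are hypothetical helper names.

```python
import os


def is_stale(target, sources):
    """Return True if `target` is missing or older than any of `sources`."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_mtime for src in sources)


def build(target, sources, recipe):
    """Run `recipe()` only when `target` is out of date; report whether it ran."""
    if is_stale(target, sources):
        recipe()
        return True
    return False
```

With something like this, a cleaning step that turns a raw file into a processed one only reruns after the raw file changes, which is what keeps a multi-step pipeline cheap to re-execute.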

Hope this project is useful for others as well.


This is an excellent resource! They hit the nail on the head: data exploration is a messy, nonlinear process. It’s all too easy to create “snowflake” projects that will be indecipherable after a few months’ time away. Thanks for posting this.


This is amazing. I always wonder how to structure my project.
Thank you very much for finding and posting this.


Do people typically use a structure like this or is there something else that tends to be standard for these types of projects?