Lesson 5 wiki

Lesson resources

Notes from @melissa.fabros


Waterfall chart Python library by @cpcsiszar


Q: What should I be able to be doing at this point in the class?
That’s a great question! You should be able to replicate everything you’ve seen so far on a different structured dataset. We’ve looked at Blue Book for Bulldozers as our canonical example of a structured dataset. A structured dataset has rows and columns, where each column represents a different feature. Unstructured data, by contrast, is something like an image, where a single pixel doesn’t map to a specific feature or label.

You should be able to import data, apply a random forest regressor (RFR) to the dataset, and get a reasonable score.
You should be able to identify the important features in the dataset after the RFR analysis, and to say how confident you are about each prediction.
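
A minimal sketch of those three steps (score, feature importance, prediction confidence), assuming scikit-learn and a synthetic dataset standing in for real data. The feature names `f0`..`f3` and the random train/validation split are illustrative assumptions; for time-ordered data like the Bulldozers set, a random split may not be the right validation set.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a structured dataset (assumption, not course data):
# the target depends strongly on f0, weakly on f1, and not at all on f2/f3.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)
feature_names = ["f0", "f1", "f2", "f3"]

X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# 1. Fit the random forest regressor and get a validation score (R^2).
model = RandomForestRegressor(n_estimators=40, random_state=0)
model.fit(X_train, y_train)
valid_r2 = model.score(X_valid, y_valid)

# 2. Rank features by the forest's built-in importance measure.
importances = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

# 3. Confidence: spread of the individual trees' predictions per row.
#    A large standard deviation means the trees disagree on that row.
per_tree = np.stack([tree.predict(X_valid) for tree in model.estimators_])
pred_mean = per_tree.mean(axis=0)
pred_std = per_tree.std(axis=0)

print(f"validation R^2: {valid_r2:.3f}")
print("most important feature:", importances[0][0])
print("least confident row:", int(pred_std.argmax()))
```

Rows with a high `pred_std` are the ones where the forest is least sure, which is usually where you want to look more closely at the data.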

You should be able to identify the relationship between an independent variable and the dependent variable, and not in a messy univariate way. You’ll need to explain what is driving the outcome and how it drives it; in practice, that likely means showing that you can create a partial dependence plot.
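
One way to see what a partial dependence plot computes is to build the curve by hand: force one column to a fixed value for every row, predict, average, and repeat over a grid of values. The sketch below assumes scikit-learn and synthetic data; `partial_dependence_1d` is a hypothetical helper written for illustration, not a library function.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def partial_dependence_1d(model, X, feature_idx, grid):
    """Average prediction when column `feature_idx` is forced to each grid value."""
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature_idx] = value  # every row gets the same value
        averages.append(model.predict(X_mod).mean())
    return np.array(averages)


# Synthetic data (assumption): feature 1 is correlated with feature 0,
# but only feature 0 actually drives the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
X[:, 1] = X[:, 0] + rng.normal(scale=0.3, size=400)
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=400)

model = RandomForestRegressor(n_estimators=40, random_state=0).fit(X, y)

grid = np.linspace(-1.5, 1.5, 7)
pd_f0 = partial_dependence_1d(model, X, 0, grid)
pd_f1 = partial_dependence_1d(model, X, 1, grid)

print("feature 0 curve:", np.round(pd_f0, 2))
print("feature 1 curve:", np.round(pd_f1, 2))
```

A plain univariate plot of `y` against feature 1 would show a strong trend here because of the correlation; the partial dependence curve for feature 1 should come out much flatter than the one for feature 0, because all other columns are held at their observed values.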

More tools to help interpret a random forest model

Waterfall Plots

Waterfall plots are very useful. They are native to Excel and second nature to MBA students, but Python doesn’t yet have a good library for them. We hope that you pick up creating this library and become famous for doing so. If you’d like to contribute to this project, this forum might be the place for you.
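
Until such a library exists, a waterfall chart can be approximated with stacked bars: each bar starts where the previous running total ended. This is a rough sketch, assuming matplotlib for the drawing step; `waterfall_steps` and `plot_waterfall` are hypothetical helpers, and the starting value and contribution numbers are made up for illustration.

```python
def waterfall_steps(start, contributions):
    """Return ([(label, bottom, height), ...], final_total) for a waterfall chart.

    `contributions` is a list of (label, delta) pairs applied in order.
    """
    steps = []
    running = start
    for label, delta in contributions:
        # A bar always spans from the lower to the higher running total.
        bottom = min(running, running + delta)
        steps.append((label, bottom, abs(delta)))
        running += delta
    return steps, running


def plot_waterfall(start, contributions):
    """Draw the steps as floating bars (green = increase, red = decrease)."""
    import matplotlib.pyplot as plt  # imported here so the math works without it

    steps, _ = waterfall_steps(start, contributions)
    labels = [label for label, _, _ in steps]
    bottoms = [bottom for _, bottom, _ in steps]
    heights = [height for _, _, height in steps]
    colors = ["tab:green" if delta >= 0 else "tab:red" for _, delta in contributions]

    fig, ax = plt.subplots()
    ax.bar(labels, heights, bottom=bottoms, color=colors)
    ax.axhline(start, linestyle="--", linewidth=1)  # starting value for reference
    ax.set_ylabel("running total")
    return ax


# Illustrative numbers only (assumption): a starting value and three changes.
contribs = [("YearMade", 0.5), ("ProductSize", -0.25), ("Enclosure", 0.125)]
steps, final = waterfall_steps(10.0, contribs)
print(steps)
print("final total:", final)
```

Calling `plot_waterfall(10.0, contribs)` followed by `matplotlib.pyplot.show()` draws the chart; keeping the arithmetic in `waterfall_steps` separate from the plotting makes the positions easy to check on their own.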

Tree Interpreter

Creating a Validation Set is the Most Important Thing You Can Do When Building a Model

Introduction to Object Oriented Programming in Python

(Hat tip to @parrt and @timlee for sharing notes! Thanks!)

About the Intro to Machine Learning category
