This is a forum wiki thread, so you all can edit this post to add/change/organize info to help make it better! To edit, click on the little pencil icon at the bottom of this post.
- Lesson video
- Lesson notes from @hiromi
- Lesson notes thanks to @timlee
- Notebooks are
Notes from @melissa.fabros
Q: What should I be able to do at this point in the class?
That’s a great question! You should be able to replicate everything you’ve seen so far on a different structured dataset. We’ve looked at Blue Book for Bulldozers as our canonical example of a structured dataset. A structured dataset has rows and columns, where every column represents a different feature. Unstructured data, by contrast, is something like images, where a single pixel doesn’t map to a specific feature or label.
You should be able to import data, apply a random forest regressor (RFR) to the dataset, and get a reasonable score.
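As a rough sketch of that workflow (not the notebook code from the lesson), here is a random forest regressor fit with scikit-learn and scored on a held-out validation set. The data is synthetic and stands in for a real structured dataset, and the hyperparameters are illustrative assumptions rather than recommended values.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a structured dataset (assumption, not course data):
# each row is an observation, each column a feature.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 4))
y = 3.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(0, 0.5, size=500)

# Hold out a validation set so the score reflects unseen rows.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Illustrative hyperparameters only.
model = RandomForestRegressor(n_estimators=40, min_samples_leaf=3, random_state=0)
model.fit(X_train, y_train)

# .score() returns R^2 for regressors.
train_r2 = model.score(X_train, y_train)
valid_r2 = model.score(X_valid, y_valid)
print(f"train R^2: {train_r2:.3f}, validation R^2: {valid_r2:.3f}")
```

A large gap between the training and validation scores would be a hint that the forest is overfitting.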
You should be able to identify the important features in the dataset after the RFR analysis, as well as how confident you are in each prediction.
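One way to sketch both ideas with scikit-learn: rank features by the forest's impurity-based importances, and use the spread of the individual trees' predictions as a rough confidence signal. The feature names and data below are hypothetical, and the standard deviation across trees is a heuristic, not a formal confidence interval.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical feature names and synthetic data (assumptions for illustration).
rng = np.random.default_rng(1)
feature_names = ["age", "size", "noise_a", "noise_b"]
X = rng.uniform(0, 10, size=(400, 4))
y = 5.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(0, 0.5, size=400)

model = RandomForestRegressor(n_estimators=40, min_samples_leaf=3, random_state=1)
model.fit(X, y)

# Impurity-based importances sum to 1; sort from most to least important.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name:8s} {importance:.3f}")

# Collect each tree's prediction for a few rows: shape (n_trees, n_rows).
rows = X[:5]
per_tree = np.stack([tree.predict(rows) for tree in model.estimators_])
mean_pred = per_tree.mean(axis=0)  # equals the forest's own prediction
std_pred = per_tree.std(axis=0)    # larger spread suggests lower confidence
for m, s in zip(mean_pred, std_pred):
    print(f"prediction {m:.2f} +/- {s:.2f}")
```

Rows where the trees disagree a lot are the ones to be more cautious about.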
You should be able to identify the relationship between an independent variable and the dependent variable, and not in a messy univariate way. You’ll have to be able to explain what’s driving the outcome and how it’s driving it, which will likely mean showing that you can create a partial dependence plot.
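To make the idea concrete, here is a minimal hand-rolled sketch of a one-dimensional partial dependence curve: for each grid value, overwrite one column in every row, predict, and average. The helper name `partial_dependence_1d` and the synthetic data are assumptions for illustration. scikit-learn also ships its own implementation in `sklearn.inspection`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (assumption): the target rises with column 0.
rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=(400, 3))
y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(0, 0.5, size=400)

model = RandomForestRegressor(n_estimators=40, min_samples_leaf=3, random_state=2)
model.fit(X, y)

def partial_dependence_1d(model, X, column, grid):
    """Average prediction when `column` is forced to each value in `grid`.

    Every other column keeps its observed values, so the curve isolates the
    effect of one feature with all else held as it was in the data.
    """
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, column] = value  # same value for every row
        averages.append(model.predict(X_mod).mean())
    return np.array(averages)

grid = np.linspace(1, 9, 9)
pd_curve = partial_dependence_1d(model, X, column=0, grid=grid)
for value, avg in zip(grid, pd_curve):
    print(f"x0 = {value:.0f} -> average prediction {avg:.2f}")
```

Plotting `pd_curve` against `grid` gives the partial dependence plot for that feature.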
More tools to help interpret a random forest model
Waterfall plots are very useful, and while they’re native to Excel and second nature to MBA students, Python doesn’t yet have a good library for them. But we hope that you pick up creating this library and become famous for doing so. If you’d like to contribute to this project,
this forum might be the place for you
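For anyone picking this up, the core of a waterfall plot is just a running total: each bar starts where the previous one ended. Below is a small standard-library sketch of that bookkeeping. The function name `waterfall_bars`, the feature names, and the numbers are all hypothetical; the output could be handed to any plotting library to draw floating bars.

```python
from itertools import accumulate

def waterfall_bars(start, contributions):
    """Return (label, bottom, top) tuples for a waterfall chart.

    `start` is the baseline value and `contributions` is a list of
    (label, signed_delta) pairs. Each bar spans from the running total
    before its delta to the running total after it.
    """
    labels = [label for label, _ in contributions]
    deltas = [delta for _, delta in contributions]
    running = list(accumulate(deltas, initial=start))

    bars = [("start", 0.0, start)]
    for label, before, after in zip(labels, running, running[1:]):
        bars.append((label, before, after))
    bars.append(("total", 0.0, running[-1]))
    return bars

# Hypothetical baseline and per-feature contributions (made-up numbers).
bars = waterfall_bars(
    10.0,
    [("year_made", 0.5), ("size", -0.25), ("usage", 0.125)],
)
for label, bottom, top in bars:
    direction = "up" if top >= bottom else "down"
    print(f"{label:10s} from {bottom:7.3f} to {top:7.3f} ({direction})")
```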