Lesson 5 wiki

(Jeremy Howard (Admin)) #1

This is a forum wiki thread, so you can all edit this post to add, change, or organize info and help make it better! To edit, click the little pencil icon at the bottom of this post.

<<< Wiki: Lesson 4 | Wiki: Lesson 6 >>>

Lesson resources


Notes from @melissa.fabros

Resources:

waterfall chart python library by @cpcsiszar

Questions

Q: What should I be able to be doing at this point in the class?
That’s a great question! You should be able to replicate everything you’ve seen so far on a different structured dataset. We’ve used Blue Book for Bulldozers as our canonical example of a structured dataset. A structured dataset has rows and columns, where every column represents a different feature. Unstructured data, by contrast, is something like images, where a single pixel isn’t mapped to a specific feature or label.

You should be able to import data, apply a random forest regressor (RFR) to the dataset, and get a reasonable score.
You should be able to identify which features in the dataset are important after the RFR analysis, as well as how confident you are about each prediction.
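The three checks above (a reasonable score, important features, and prediction confidence) can be sketched roughly as follows. This is a minimal illustration and not the lesson notebook: it assumes scikit-learn and NumPy are installed, uses synthetic data in place of a real structured dataset, and uses a random validation split purely for brevity. The spread of the individual trees' predictions is used as a rough confidence signal.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a structured dataset (assumption, not course data):
# column 0 drives the target strongly, column 1 weakly, column 2 not at all.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)

X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = RandomForestRegressor(n_estimators=40, min_samples_leaf=3, random_state=0)
model.fit(X_train, y_train)

# 1. A reasonable score: R^2 on the held-out validation set.
r2 = model.score(X_valid, y_valid)

# 2. Important features: impurity-based importances, one value per column.
importances = model.feature_importances_

# 3. Confidence: how much the individual trees disagree on each row.
per_tree = np.stack([tree.predict(X_valid) for tree in model.estimators_])
pred_mean = per_tree.mean(axis=0)
pred_std = per_tree.std(axis=0)

print(f"validation R^2: {r2:.3f}")
print("feature importances:", np.round(importances, 3))
print("largest tree disagreement (std):", round(float(pred_std.max()), 3))
```

Rows with a large `pred_std` are the ones where the trees disagree most, so those predictions deserve the least trust.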

You should also be able to identify the relationship between an independent variable and the dependent variable, and not in a messy univariate way. You’ll have to explain what’s driving the outcome and how it’s driving it, most likely by creating a partial dependence plot.
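To make the difference between a univariate view and partial dependence concrete, here is a small hand-rolled sketch. It assumes only NumPy; `partial_dependence` and `toy_predict` are hypothetical names written for this illustration, and `toy_predict` stands in for a fitted model's `predict` method. The idea is to force one column to each grid value, leave every other column as observed, and average the predictions.

```python
import numpy as np

def partial_dependence(predict, X, col, grid):
    """Average prediction when column `col` is forced to each value in `grid`."""
    X = np.asarray(X, dtype=float)
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, col] = value          # overwrite one feature, keep the rest
        averages.append(predict(X_mod).mean())
    return np.array(averages)

# Two strongly correlated features (synthetic, for illustration only).
rng = np.random.default_rng(0)
x0 = rng.normal(size=1000)
x1 = x0 + rng.normal(scale=0.1, size=1000)
X = np.column_stack([x0, x1])

def toy_predict(X):
    # Stand-in for model.predict: the direct effect of x0 has slope 2.
    return 2.0 * X[:, 0] + 5.0 * X[:, 1]

grid = np.linspace(-1.0, 1.0, 5)
pd_x0 = partial_dependence(toy_predict, X, col=0, grid=grid)

# Slope of the partial dependence curve versus the naive univariate slope.
pd_slope = np.polyfit(grid, pd_x0, 1)[0]
univariate_slope = np.polyfit(x0, toy_predict(X), 1)[0]

print(f"partial dependence slope: {pd_slope:.2f}")
print(f"univariate slope:         {univariate_slope:.2f}")
```

The univariate slope absorbs the effect of the correlated `x1` column, while the partial dependence curve recovers the direct effect of `x0` alone.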

More tools to help interpret a random forest model

Waterfall Plots

Waterfall plots are very useful, and while they’re native to Excel and second nature to MBA students, Python doesn’t yet have a good library for them. We hope that you pick up creating this library and become famous for doing so. If you’d like to contribute to this project, this forum might be the place for you.
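Until such a library exists, the bookkeeping behind a waterfall chart is simple enough to sketch by hand. The snippet below is only an illustration in plain Python: `waterfall_steps` is a hypothetical helper, and the starting value and per-feature contributions are made-up numbers. Each bar starts where the previous one ended, so the last floating bar ends at the final total.

```python
def waterfall_steps(bias, contributions):
    """Return (label, start, end) for each bar of a waterfall chart."""
    steps = [("bias", 0.0, bias)]
    running = bias
    for label, delta in contributions:
        # Each bar floats from the running total to the new running total.
        steps.append((label, running, running + delta))
        running += delta
    steps.append(("prediction", 0.0, running))
    return steps

# Made-up contributions, for illustration only.
contributions = [("YearMade", 0.5), ("ProductSize", -0.25), ("Enclosure", 0.125)]
steps = waterfall_steps(10.0, contributions)

for label, start, end in steps:
    print(f"{label:<12} {start:7.3f} -> {end:7.3f}")
```

The resulting `(label, start, end)` tuples can be handed to any bar-chart routine that accepts a bottom offset for each bar.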

Tree Interpreter

Creating a Validation Set is the Most Important Thing You Can Do When Building a Model

Introduction to Object Oriented Programming in Python

(Hat tip to @parrt and @timlee for sharing notes! Thanks! )

(melissa.fabros) #3

Hi @jeremy,

Is it still possible to edit the lesson 6 wiki? Or should I just update notes in the reply?

Thanks!
Melissa


(Jeremy Howard (Admin)) #4

Looks like I’d forgotten to make it a wiki @melissa.fabros! Fixed now.


(Tom Artiom Fiodrov) #5

@jeremy I have been fitting and interpreting RFs for 2 years and I absolutely love this course: it talks about problems I have encountered myself, and I love learning your approaches. One small thing: the treeinterpreter method has been shown to be inconsistent in the following paper: https://arxiv.org/abs/1802.03888. This means that features that are actually unimportant can end up looking important, and vice versa. I would recommend teaching the https://github.com/slundberg/shap library instead, which doesn’t have this problem.


(Jeremy Howard (Admin)) #6

Thanks @afiodorov. Yeah we have a lengthy diatribe about that issue:

However, it’s a bit more nuanced than the level I was looking to go into in this course, and in practice treeinterpreter works OK (and there are approaches to fixing it that I prefer over SHAP).
