Wiki / Lesson Thread: Lesson 6

This is a forum wiki thread, so you can edit this post to add, change, or organize info to help make it better! To edit, click the little pencil icon at the bottom of this post.


Lesson resources

Course Notes: under construction

Random forest interpretation techniques & review

  • Confidence based on tree variance
  • Feature importance
  • Removing redundant features
  • Partial dependence
  • Tree interpreter
  • Extrapolation

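The first two techniques above can be sketched in a few lines. This is a minimal illustration using scikit-learn on toy data, not the lesson's actual notebook code (which uses the Bulldozers dataset); the dataset and model sizes here are made up:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Toy tabular data standing in for a real dataset (hypothetical)
X, y = make_regression(n_samples=500, n_features=6, random_state=0)

rf = RandomForestRegressor(n_estimators=40, random_state=0).fit(X, y)

# Confidence based on tree variance: predict with every tree separately,
# then treat the std dev across trees as a per-row uncertainty estimate.
preds = np.stack([t.predict(X) for t in rf.estimators_])  # (n_trees, n_rows)
pred_mean, pred_std = preds.mean(axis=0), preds.std(axis=0)

# Feature importance: impurity-based importances come for free in sklearn
print(rf.feature_importances_)
```

Rows where `pred_std` is large are ones the trees disagree on, so the model is less confident there.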
I’ve just added the lesson video to the top post (currently uploading - will be available in ~30 mins).

@jeremy Could you please share the slides you showed us in this lecture, the ones on ML applications in different industries?

Thanks for the reminder. I’ve added it to git in the ‘ppt’ folder.

There is something in this lesson that I would like to clarify.

In the paper written by Jeremy back in 2012 (Designing great data products), at the end there is a link to a YouTube video: Jeremy Howard - From Predictive Modelling to Optimization: The Next Frontier. Around minute 12:03, Jeremy says:

“One of the big insights I want you to take away from this is … really what you want is data that tells you about causality not correlation. … Generally speaking, you do not have data about causality, you’ve got data about business as usual.”

Then he goes on to explain how he convinced his client to conduct randomized experiments in order to collect data about causality.

But in lesson 6 I believe Jeremy doesn’t talk about conducting randomized experiments in order to collect data about causality.

So my question is: is it the case that the invention of partial dependence plots has replaced the need for conducting randomized experiments?

Unfortunately not.


@jeremy I have a question about your experience balancing theoretical rigor with being practical when it comes to causal relationships.
In the lesson around 15:45 you outlined how one could use feature importance to identify actionable features, and PDPs to generate simulations based on those features.
While this approach is appealing in its simplicity, it seems it can only be used if we are sure that the feature-target relationship estimated by the PDP reflects the true causal relationship in the problem.
Since modeling causal relationships in complex business settings is notoriously difficult, my question is: how can we make sure that our simulator (built via PDP) rests on the right causal assumptions and thus leads to realistic scenarios for our data product?
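For readers following along, the PDP under discussion can be computed by hand, which makes the causality caveat concrete: it averages the model's predictions while forcing one feature to each grid value, so it reflects the model's learned association, not an intervention in the real world. A minimal sketch with scikit-learn and toy data (not the lesson's code; names here are illustrative):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=300, n_features=4, random_state=0)
rf = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y)

def partial_dependence(model, X, feature_idx, grid):
    """For each grid value, set the chosen feature to that value for ALL
    rows, then average the model's predictions over the dataset."""
    pdp = []
    for v in grid:
        Xv = X.copy()
        Xv[:, feature_idx] = v
        pdp.append(model.predict(Xv).mean())
    return np.array(pdp)

grid = np.linspace(X[:, 0].min(), X[:, 0].max(), 10)
pdp = partial_dependence(rf, X, feature_idx=0, grid=grid)
```

Because `Xv` holds all other features fixed at their observed values, the resulting curve answers "what does the model predict if this feature were different?", which only matches "what would happen if we changed it?" when the model has captured the true causal relationship.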

There isn’t really any way to be sure, I’m afraid. You might be interested in this book about causality analysis:

“There isn’t really any way to be sure, I’m afraid”

Thanks a lot for the link! A follow-up question on the above: If there is no theoretical guarantee about the causal link, what’s the approach you would use to build your simulator on PDP (to be reasonably confident in the simulation)? How did you approach this in your previous business projects?

Randomized controlled trials! 🙂
