This is a forum wiki thread, so you all can edit this post to add/change/organize info to help make it better! To edit, click on the little pencil icon at the bottom of this post.
“One of the big insights I want you to take away from this is … really what you want is data that tells you about causality not correlation. … Generally speaking, you do not have data about causality, you’ve got data about business as usual.”
Then he goes on to explain how he convinced his client to conduct randomized experiments in order to collect data about causality.
But in lesson 6, as far as I can tell, Jeremy doesn’t talk about conducting randomized experiments in order to collect data about causality.
So my question is: is it the case that the invention of partial dependence plots has replaced the need for conducting randomized experiments?
@jeremy I have a question on your experience with balancing theoretical rigor and being practical when it comes to causal relationships.
In the lesson around 15:45 you outlined how one could use feature importance to identify actionable features and PDP to generate simulations based on those features.
While this approach is appealing due to its simplicity, it seems that it can only be used if we are sure that the feature-target relationship estimated by the PDP reflects the true causal relationship of the problem.
Since modeling causal relationships in complex business settings is notoriously difficult, my question is: How can we make sure that our simulator (via PDP) is built on the right causal assumptions and thus will lead to realistic scenarios for our data product?
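To make the concern concrete, here is a minimal sketch of how a PDP can be off when the training data is "business as usual". Everything in it is an assumption made up for illustration: the data-generating process, the hidden confounder `z`, the coefficients, and the helper names (`make_data`, `fit_linear`, `partial_dependence`, `pdp_slope`). A plain least-squares model stands in for the random forest from the lesson, just to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)
TRUE_EFFECT = 1.0  # causal effect of x on y in this made-up simulation


def make_data(n, randomized):
    """Simulate y = TRUE_EFFECT*x + 2*w + 3*z + noise; z is never observed."""
    z = rng.normal(size=n)  # hidden confounder (hypothetical)
    w = rng.normal(size=n)  # observed covariate
    if randomized:
        x = rng.normal(size=n)      # experiment: x assigned independently of z
    else:
        x = z + rng.normal(size=n)  # business as usual: x tracks z
    y = TRUE_EFFECT * x + 2.0 * w + 3.0 * z + rng.normal(size=n)
    return np.column_stack([x, w]), y


def fit_linear(X, y):
    """Least-squares fit with an intercept; returns a predict function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef


def partial_dependence(predict, X, feature, grid):
    """Average prediction with column `feature` forced to each grid value."""
    pdp = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value
        pdp.append(predict(X_mod).mean())
    return np.array(pdp)


def pdp_slope(randomized, n=20_000):
    """Fit on simulated data and return the slope of the PDP for x."""
    X, y = make_data(n, randomized)
    predict = fit_linear(X, y)
    grid = np.linspace(-2, 2, 9)
    pdp = partial_dependence(predict, X, feature=0, grid=grid)
    return np.polyfit(grid, pdp, 1)[0]


slope_observational = pdp_slope(randomized=False)
slope_randomized = pdp_slope(randomized=True)
print(f"true effect:              {TRUE_EFFECT:.2f}")
print(f"PDP slope, observational: {slope_observational:.2f}")
print(f"PDP slope, randomized:    {slope_randomized:.2f}")
```

In this toy setup the PDP built from observational data should overstate the effect of `x` (a slope near 2.5 instead of 1.0), because `x` is also picking up the influence of the unobserved `z`. The PDP built from randomized data should land close to the true effect. The PDP itself is computed the same way in both cases; only the data collection differs.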
Thanks a lot for the link! A follow-up question on the above: If there is no theoretical guarantee about the causal link, what’s the approach you would use to build your simulator on PDP (to be reasonably confident in the simulation)? How did you approach this in your previous business projects?