Hello, everyone:
The Controversy
Judea Pearl and his critiques of current ML have been making the rounds lately due to his new book, The Book of Why. From an interview he gave recently:
There are circles of research that continue to work on diagnosis without worrying about the causal aspects of the problem. All they want is to predict well and to diagnose well… I felt like an apostate when I developed powerful tools for prediction and diagnosis knowing already that this is merely the tip of human intelligence. If we want machines to reason about interventions (“What if we ban cigarettes?”) and introspection (“What if I had finished high school?”), we must invoke causal models. Associations are not enough—and this is a mathematical fact, not opinion.
As much as I look into what’s being done with deep learning, I see they’re all stuck there on the level of associations. Curve fitting. That sounds like sacrilege, to say that all the impressive achievements of deep learning amount to just fitting a curve to data. From the point of view of the mathematical hierarchy, no matter how skillfully you manipulate the data and what you read into the data when you manipulate it, it’s still a curve-fitting exercise, albeit complex and nontrivial.
This is a very controversial accusation, especially coming from someone whom Wikipedia describes as:
2011 winner of the ACM Turing Award, the highest distinction in computer science, “for fundamental contributions to artificial intelligence through the development of a calculus for probabilistic and causal reasoning”
Resources expanding on the Controversy
The best articles I’ve gathered explaining his life’s work and how it relates to DL & ML are the following: this from Michael Nielsen (the same guy with the cool interactive Universal Approximation Theorem webpage) and this from Ferenc Huszár.
Yann LeCun responded to Mr. Pearl’s comments in a Bloomberg interview.
Other resources
Stanford’s Susan Athey has published a couple of papers using Machine Learning to aid her in understanding causality. In this paper, she outlines the way she thinks ML can help.
Resources in fast.ai
The one time I saw Jeremy discussing this was in the ML course, specifically when he talked about partial dependence plots using Random Forests. (If you remember any other instance, please tell me.)
What I think
I am an economist, so my perspective comes from econometrics. There, people are obsessed with causality, to the point of studying unimportant subjects at the expense of the most pressing issues, just because in the former we can say something about causality whereas in the latter it is much more difficult.
The fundamental problem of causal inference is the impossibility of observing two different states for a given system. For example, when examining the effect of education on income, comparing people who went to college and people who didn’t won’t yield a causal answer because those two groups probably differ along many other dimensions besides their level of education. The ideal situation would be to study the life of each of us, alter our education level, and see how our income changes. Obviously, this is impossible (you cannot see what my life would have been had I not gone to high school). This non-observable situation is called a counterfactual.
Thus, the fundamental problem of Causal Inference is to estimate the counterfactual. For groups, the gold standard is a randomized trial (A/B testing, in tech lingo). However, randomized trials are not always available, and thus counterfactuals have to be estimated like any other quantity. If ML is the best tool we have to predict almost anything, is it the best tool we have to predict counterfactuals and do causal inference?
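To make the education and income example concrete, here is a minimal simulation sketch. Everything in it is made up for illustration: the `ability` confounder, the coefficients, and the `TRUE_EFFECT` of 10 are assumptions, not estimates from any real data. It compares the naive gap between college and non-college groups when people self-select into college with the gap when college is assigned by coin flip, as in a randomized trial.

```python
import random
import statistics

# Hypothetical "true" causal effect of college on income (arbitrary units).
TRUE_EFFECT = 10.0


def income_gap(n, randomized, seed=0):
    """Mean income of the college group minus mean income of the rest."""
    rng = random.Random(seed)
    college_incomes, other_incomes = [], []
    for _ in range(n):
        # Hypothetical confounder: affects both college attendance and income.
        ability = rng.gauss(0.0, 1.0)
        if randomized:
            # Randomized trial: a coin flip decides, independent of ability.
            college = rng.random() < 0.5
        else:
            # Observational data: higher ability makes college more likely.
            college = ability + rng.gauss(0.0, 1.0) > 0.0
        income = 30.0 + 5.0 * ability + TRUE_EFFECT * college + rng.gauss(0.0, 1.0)
        (college_incomes if college else other_incomes).append(income)
    return statistics.mean(college_incomes) - statistics.mean(other_incomes)


naive_gap = income_gap(50_000, randomized=False)
rct_gap = income_gap(50_000, randomized=True)
print(f"observational gap: {naive_gap:.2f}")
print(f"randomized gap:    {rct_gap:.2f}")
print(f"true effect:       {TRUE_EFFECT:.2f}")
```

In this toy setup the observational gap should come out noticeably above the true effect, because the college group also has higher ability on average, while the randomized gap should land close to it. In real data we never get to see `ability` or `TRUE_EFFECT`, which is exactly why the counterfactual has to be estimated.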
I do not know. The problem of confounders seems unassailable. And yet…
What do you think?
I am very interested in trying to understand the topic, and there’s no better way to do so than discussing it with the fast.ai community. What’s your take on all this?