Partial dependence plot

umar · August 19, 2022, 9:44am

Does anyone know whether a PDP plot on the training data is better than a PDP plot on the test data?
Also, has anyone tried making a PDP plot using three variables instead of the single variable Jeremy has shown?

BobMcDear · August 19, 2022, 1:14pm

Hello,

Partial dependence plots can be computed over either the training or validation set, and examples of both can be found. For instance, fastbook uses the validation set, whereas Interpretable Machine Learning: A Guide for Making Black Box Models Explainable, in chapter 8.1, uses the training set. Frankly, in my experience, it ultimately does not matter much which one you choose, and there are only a couple of important considerations. First, if the training set is large, PDP may take excessively long, in which case you can use a random subset of it or resort to the validation set. Second, if the validation set is too small, PDP’s results would understandably not be very reliable, so the training set might be the wiser option.
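To make the two considerations concrete, here is a minimal, dependency-free sketch of a one-variable PDP computed once on a random subset of a large training set and once on a validation set. Everything in it is an assumption for illustration: `model` is a made-up stand-in for a fitted model, the data is synthetic, and `partial_dependence_1d` is a hypothetical helper, not a library function.

```python
import random

def model(row):
    # Hypothetical stand-in for a fitted model: any callable that maps
    # a feature row to a prediction would work here.
    x0, x1 = row
    return 2.0 * x0 + x1 * x1

def partial_dependence_1d(predict, rows, feature, grid):
    """Average prediction when `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature] = value  # overwrite the feature of interest
            total += predict(modified)
        curve.append(total / len(rows))
    return curve

rng = random.Random(0)
# Synthetic data: a large "training" set and a smaller "validation" set.
train = [(rng.uniform(0, 1), rng.gauss(0, 1)) for _ in range(20000)]
valid = [(rng.uniform(0, 1), rng.gauss(0, 1)) for _ in range(2000)]

grid = [i / 10 for i in range(11)]
train_sample = rng.sample(train, 1000)  # subsample to keep the cost down

pdp_train = partial_dependence_1d(model, train_sample, 0, grid)
pdp_valid = partial_dependence_1d(model, valid, 0, grid)

# Largest vertical gap between the two curves over the grid.
max_gap = max(abs(a - b) for a, b in zip(pdp_train, pdp_valid))
print(max_gap)
```

When both sets come from the same distribution and are reasonably large, the two curves should nearly coincide. Shrinking `valid` to a few dozen rows makes its curve noticeably noisier, which is the situation where the training set is the safer choice.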

PDP is generally limited to one or two variables because, with three or more, the visualization would need four or more dimensions (e.g., three axes for three input variables plus another axis for the target variable). Additionally, the computational cost grows quickly, since the model has to be evaluated at every combination of grid values for the chosen variables. The scikit-learn documentation contains instructions on PDP with two variables.
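As a rough illustration of both points, the sketch below computes a two-variable PDP by hand and counts how many model evaluations a PDP over k variables needs. It is not scikit-learn's API: `model`, `partial_dependence_nd`, and `n_predictions` are hypothetical names, and the rows are toy values.

```python
from itertools import product

def model(row):
    # Hypothetical stand-in for a fitted model with three features.
    x0, x1, x2 = row
    return x0 * x1 + x2

def partial_dependence_nd(predict, rows, features, grids):
    """PDP over several features: one averaged prediction per grid combination."""
    surface = {}
    for values in product(*grids):
        total = 0.0
        for row in rows:
            modified = list(row)
            for feature, value in zip(features, values):
                modified[feature] = value  # force all chosen features at once
            total += predict(modified)
        surface[values] = total / len(rows)
    return surface

def n_predictions(n_rows, grid_size, n_features):
    # Every row is re-scored at every combination of grid values.
    return n_rows * grid_size ** n_features

rows = [(0.1, 0.2, 1.0), (0.5, 0.9, 2.0), (0.7, 0.3, 3.0), (0.9, 0.6, 4.0)]
grid = [0.0, 0.5, 1.0]
surface = partial_dependence_nd(model, rows, features=(0, 1), grids=[grid, grid])

for k in (1, 2, 3):
    print(k, n_predictions(10_000, 20, k))
```

With 10,000 rows and a 20-point grid per variable, the count goes from 200,000 predictions for one variable to 4,000,000 for two and 80,000,000 for three, on top of the plotting problem.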

Please let me know if you have other questions.

umar · August 19, 2022, 4:00pm

Thanks for the reply.
I was also exploring ALE (accumulated local effects) plots, and I found that when I make an ALE plot and compare it with a PDP plot, they sometimes differ. Which one should we trust more?
Is there a way, or existing code, to make an ALE plot for two variables, whether they are categorical or continuous?

Jeremy covered PDP plots with two variables in the 2018 course.

Also, it makes sense that if your dataset is small, you can make the PDP plot with the training set only.

Regards

BobMcDear · August 19, 2022, 5:22pm

Hello,

You’re welcome. Overall, accumulated local effects (ALE) is considerably more robust than PDP if there are feature correlations, which is nearly always true in real-world datasets. PDP pairs each value of the feature of interest with every observed value of the other features, so for correlated features it evaluates the model on combinations that never occur. For example, if you have two features in an automotive dataset, one containing the vehicle’s model and the other indicating when it was built, PDP would include myriad data points that are utterly senseless, e.g., a Model S made in 1980 or a BMW E30 from 2015. Therefore, ALE should be preferred over PDP, unless you are confident that your features are not overly correlated and that unlikely or impossible cases, such as the car example, are not frequent.
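Here is a small, self-contained sketch of why the two can disagree. It is a toy setup under stated assumptions, not how any particular package does it: `model` is a made-up function that behaves sensibly only where the two features are close to each other (the only region in the synthetic data), and `ale_x1` is a simplified first-order ALE with equal-width bins, whereas real implementations typically use quantile-based bins.

```python
import random
from bisect import bisect_right

def model(x1, x2):
    # Hypothetical model: equals x1 wherever x1 and x2 are close (the only
    # region present in the data) and returns an arbitrary 5.0 elsewhere.
    return x1 if abs(x1 - x2) < 0.3 else 5.0

rng = random.Random(0)
rows = []
for _ in range(2000):
    x1 = rng.uniform(0.0, 1.0)
    rows.append((x1, x1 + rng.uniform(-0.05, 0.05)))  # x2 tracks x1 closely

n_bins = 10
edges = [k / n_bins for k in range(n_bins + 1)]

def pdp_x1(rows, grid):
    # PDP: force x1 to each grid value for every row, keeping that row's x2,
    # which creates (x1, x2) pairs that never occur in the data.
    return [sum(model(v, x2) for _, x2 in rows) / len(rows) for v in grid]

def ale_x1(rows, edges):
    bins = [[] for _ in range(len(edges) - 1)]
    for x1, x2 in rows:
        k = min(max(bisect_right(edges, x1) - 1, 0), len(bins) - 1)
        bins[k].append(x2)
    # Local effect per bin: move x1 only between the edges of the bin the row
    # already falls in, then accumulate the averaged differences.
    accumulated = [0.0]
    for k, members in enumerate(bins):
        lo, hi = edges[k], edges[k + 1]
        effect = 0.0
        if members:
            effect = sum(model(hi, x2) - model(lo, x2) for x2 in members) / len(members)
        accumulated.append(accumulated[-1] + effect)
    # Center so that the data-weighted mean effect is zero.
    n = sum(len(m) for m in bins)
    mean = sum(len(m) * (accumulated[k] + accumulated[k + 1]) / 2
               for k, m in enumerate(bins)) / n
    return [a - mean for a in accumulated]

pdp_vals = pdp_x1(rows, edges)
ale_vals = ale_x1(rows, edges)
print([round(v, 2) for v in pdp_vals])
print([round(v, 2) for v in ale_vals])
```

On realistic rows this model never predicts more than about 1, yet every PDP value sits well above that because most of the forced combinations land in the region where the model returns 5.0. The ALE curve only perturbs `x1` within a row's own bin, so it recovers the slope of roughly 1 that the model actually has on the data.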

However, ALE does have its shortcomings. In particular, it is less intuitive and knottier than PDP, so it is easy to misinterpret if you do not have a firm understanding of its workings. Furthermore, since it requires an ordering of some sort for the feature of interest, it can be somewhat inaccurate for nominal variables, although there are techniques that mostly solve that challenge.

Personally, I have only worked with ALE in R, but this Python package, as well as this one, seems sufficient for your goals. They support ALE for one or two continuous variables and for one categorical variable, but not for two categorical variables. For the reasons mentioned in my previous post, ALE with more than two variables is rare.

In short, ALE is usually the more prudent choice.

Is that helpful?