# Random Forests and Collinearity/Correlation

I was listening to a podcast the other day about ensemble techniques, and the host said that random forests can sometimes underperform when the weak learners (the individual trees) are correlated or the features exhibit collinearity. The reasoning is that, in a voting scheme, correlated models skew the outcome in their favor.

I was wondering if, in practice, that is often a concern that needs to be addressed. If so, what are some good approaches to mitigate it? And if not, why not? Using the tools we've learned previously, I was thinking that performing Ridge Regression (or elastic net weighted toward Ridge) would be a decent start.
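As a toy illustration of why Ridge helps with collinearity (on made-up data, not anything from class): with two nearly identical columns, plain least squares can split the weight between them almost arbitrarily, while the L2 penalty pulls the coefficients toward a shared, stable value.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # nearly a copy of x1
y = 3 * x1 + rng.normal(scale=0.1, size=n)
X = np.column_stack([x1, x2])

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# OLS can assign the weight to either near-duplicate column; Ridge
# shrinks both coefficients toward roughly half the true effect each.
print("OLS:  ", ols.coef_)
print("Ridge:", ridge.coef_)
```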

Also, if Iâve messed up any terminology, please correct me!

@fryanpan I was reading about parameter tuning in random forests, and the collinearity problem can be somewhat reduced by controlling the `max_features` parameter. This controls how many features are considered at each split when building a tree. If every split can consider all the features, you lose diversity across the individual trees, which somewhat defeats the purpose of a random forest.
Please correct me if I'm wrong.
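A quick sketch of what tuning `max_features` looks like in scikit-learn, on synthetic data with correlated columns (the dataset and the candidate values are illustrative, not the Bulldozer data):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# effective_rank < n_features makes the columns correlated.
X, y = make_regression(n_samples=500, n_features=30, n_informative=10,
                       effective_rank=8, noise=5.0, random_state=0)

# max_features=1.0 lets every split consider every feature (less tree
# diversity); "sqrt" and 0.3 restrict each split to a random subset.
scores = {}
for mf in [1.0, "sqrt", 0.3]:
    rf = RandomForestRegressor(n_estimators=100, max_features=mf,
                               random_state=0, n_jobs=-1)
    scores[mf] = cross_val_score(rf, X, y, cv=3).mean()
    print(f"max_features={mf!r}: mean CV R^2 = {scores[mf]:.3f}")
```

Which value wins depends on the dataset, so it's worth cross-validating rather than assuming a smaller subset is always better.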


Not sure if it's technically sound, but I was curious to see what would happen by running elastic net before random forests on the Bulldozer example from class. I set alpha to 0.65 (~2/3, so leaning toward L1/LASSO) and it dropped the number of columns from ~52 to 21. I upped the number of estimators in the Random Forest and the score met or exceeded (maybe by only a few thousandths) what we did in class. I know it's only a single data point, but I was surprised to see similar results from only 2/5 of the number of columns in this simple example.
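One naming note: in scikit-learn the L1/L2 mixing parameter is called `l1_ratio` (glmnet calls it alpha), while sklearn's `alpha` is the overall penalty strength. A rough sketch of that two-stage idea on synthetic stand-in data (column counts and parameter values are illustrative, not the actual Bulldozer setup):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=1000, n_features=52, n_informative=15,
                       noise=10.0, random_state=0)

# l1_ratio=0.65 leans toward the L1 (lasso) end, which zeroes out
# coefficients on uninformative columns; keep only the nonzero ones.
Xs = StandardScaler().fit_transform(X)
enet = ElasticNet(alpha=1.0, l1_ratio=0.65, random_state=0).fit(Xs, y)
keep = np.flatnonzero(enet.coef_ != 0)
print(f"kept {len(keep)} of {X.shape[1]} columns")

# Then fit the random forest on the surviving columns only.
rf = RandomForestRegressor(n_estimators=200, random_state=0, n_jobs=-1)
rf.fit(X[:, keep], y)
```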

Take a look at the feature importances returned from RF (`feature_importances_` in scikit-learn). You'll see which variables are most predictive and which are irrelevant.
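A minimal example of inspecting those importances, on made-up data where the first three columns carry the signal by construction; note that correlated columns tend to split their importance between them, which ties back to the original question.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# With shuffle=False, the informative features are the first columns.
X, y = make_regression(n_samples=500, n_features=10, n_informative=3,
                       shuffle=False, random_state=0)
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Importances sum to 1; the informative columns should dominate.
order = np.argsort(rf.feature_importances_)[::-1]
print("most important columns:", order[:3])
print(rf.feature_importances_.round(3))
```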


We'll be tackling this in the next 2 lessons. But basically: what @shik1470 said