How does decreasing the Random Forest sample size increase correlation?

In Machine Learning 1, lesson 4, Jeremy says: “By decreasing the set_rf_samples number, we are actually decreasing the power of the estimator and increasing the correlation.”

However, I would think that it decreases correlation, since there is less chance of overlap between samples of different decision trees.
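To make the overlap intuition concrete, here is a rough simulation. It is only a sketch: it assumes each tree draws its rows uniformly at random with replacement, and `mean_overlap`, `n_rows` and `sample_size` are names I made up for illustration, not anything from the fastai library.

```python
import random

def mean_overlap(n_rows, sample_size, n_pairs=50, seed=0):
    """Average fraction of one tree's distinct rows that also appear in another tree's sample."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        # Assumption: each "tree" draws sample_size row indices with replacement.
        a = {rng.randrange(n_rows) for _ in range(sample_size)}
        b = {rng.randrange(n_rows) for _ in range(sample_size)}
        # Share of tree A's distinct rows that tree B also saw.
        total += len(a & b) / len(a)
    return total / n_pairs

n_rows = 10_000  # hypothetical dataset size
for sample_size in (500, 2_000, 10_000):
    print(sample_size, round(mean_overlap(n_rows, sample_size), 3))
```

Under these assumptions the shared fraction grows with the sample size, which is why I would expect smaller samples to make the trees less alike, not more.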


I had the exact same question and haven’t found an answer online. I also think this was a slip and he actually meant to say “decreasing the correlation” (Lesson 4 of Machine Learning for Coders, at 6:21). cc @jeremy

@mlnoob: I think this is correctly stated in the wiki of the lesson:

“If you use a smaller sample, say set_rf_samples(), you’ll overfit less […] Therefore, although you have less accuracy per tree/estimator, the correlation between the trees will also be less, and your RF model can generalize (make a prediction on new data) better”
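For what it's worth, the standard variance argument for averaged trees points the same way as the wiki. As a sketch, assume B trees whose predictions at a point x each have variance σ² and pairwise correlation ρ (these symbols are my own notation, not from the lesson):

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right)
  = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
```

The second term shrinks as you add trees, but the first does not. So a smaller sample can make each tree weaker (larger σ²) and still reduce the variance of the averaged prediction, provided it lowers ρ enough.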