[ML video] How are OOB_score and n_estimators related

(satish) #1

My understanding is that if n_estimators is more than 1, then OOB_score is an average over all the trees.
Or is there a better or more correct way to explain this?

(ecdrid) #2

I will give it a shot.
The OOB error is the average error for each z_i, calculated using predictions from only those trees that do not contain z_i in their respective bootstrap sample…
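
To make that definition concrete, here is a minimal sketch in plain Python. All the numbers are made up for illustration (4 rows, 4 trees), and the in-bag sets are simplified: a real bootstrap sample draws as many rows as the dataset has, with replacement. The R^2 at the end is the score sklearn reports for a regressor.

```python
# Hypothetical actual sale prices for 4 rows z_0..z_3.
actual = [8400.0, 9100.0, 7600.0, 8800.0]

# in_bag[t] = rows that tree t saw during training (simplified, made up).
in_bag = [{0, 1}, {1, 2}, {2, 3}, {0, 3}]

# tree_preds[t][i] = prediction of tree t for row i (made up).
tree_preds = [
    [8400.0, 9100.0, 7500.0, 8900.0],
    [8200.0, 9100.0, 7600.0, 8800.0],
    [8500.0, 9000.0, 7600.0, 8800.0],
    [8400.0, 9300.0, 7800.0, 8800.0],
]

# For each row, average only the trees that did NOT train on that row.
oob_pred = []
for i in range(len(actual)):
    votes = [tree_preds[t][i] for t in range(len(in_bag)) if i not in in_bag[t]]
    oob_pred.append(sum(votes) / len(votes))

# Score the OOB predictions against the actual values with R^2.
mean_actual = sum(actual) / len(actual)
ss_res = sum((a - p) ** 2 for a, p in zip(actual, oob_pred))
ss_tot = sum((a - mean_actual) ** 2 for a in actual)
oob_r2 = 1.0 - ss_res / ss_tot

print(oob_pred)
print(oob_r2)
```

Note that each row is averaged over a different subset of trees, so it is not simply the mean of all trees.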

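The probability that a given z_i ends up in a tree's bootstrap sample is about 2/3 (more precisely 1 - 1/e ≈ 0.632 for large datasets), so roughly a third of the rows are out-of-bag for each tree.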
It’s straight from the wonderful curated sklearn docs…
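
If you want to check the roughly 2/3 figure yourself, a quick simulation with the standard library does it. This is only a sketch: the dataset size and seed are arbitrary choices of mine.

```python
import math
import random

random.seed(0)
n_rows = 10_000  # arbitrary dataset size for the experiment

# One bootstrap sample: draw n_rows row indices with replacement.
sample = [random.randrange(n_rows) for _ in range(n_rows)]

# Fraction of distinct rows that made it into the sample (in-bag).
in_bag_fraction = len(set(sample)) / n_rows
oob_fraction = 1.0 - in_bag_fraction

# Large-n limit of 1 - (1 - 1/n)^n.
limit = 1.0 - 1.0 / math.e

print(in_bag_fraction, oob_fraction, limit)
```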

(satish) #3

OK, to restate what you have written and relate it to the Bulldozer problem:

Say the predictions of the unit sale value from 10 trees are [9000, 8000, …] and so forth, their average is 8500, and the actual value is 8400. Then the error will be 100 and the OOB score will be 0%.

Is my understanding correct?

(Kieran) #4

This is a super late response, but I was recently studying the ML course and had some questions about OOB as well. I'll try my best to explain how I understand it.

When we run a random forest we fit single trees, each on a random sample of rows from our dataset (a bootstrap sample, drawn with replacement), and we do that the number of times specified by n_estimators. So if n_estimators is 10, the model fits 10 trees, each on its own random sample of rows, and takes the mean of their predictions. Each single tree is a fairly poor predictor, but the mean of them together is a good predictor. (Columns can also be subsampled at each split, but OOB is about the rows.)

After the random sampling for each tree there will always be a set of leftover rows that weren't used to train that tree. We can run these left-behind data points through the tree and get a prediction. For each row we average the predictions from only the trees that never saw it, and scoring those averaged predictions against the actual values gives the OOB_score.

In brief, OOB_score measures how well each row is predicted by the trees that did not train on it. With more trees (a larger n_estimators), each row is left out of more trees, so its OOB prediction is an average over more of them.
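
Here is a small sketch of how this looks in sklearn, assuming a synthetic dataset from make_regression rather than the course data; the tree counts are arbitrary. For RandomForestRegressor, oob_score_ is the R^2 between the actual targets and the per-row OOB predictions stored in oob_prediction_.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

# Synthetic regression data standing in for a real dataset (an assumption).
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

oob_scores = {}
for n_trees in (25, 100, 400):
    # oob_score=True asks the forest to score each row using only
    # the trees whose bootstrap sample did not contain that row.
    model = RandomForestRegressor(n_estimators=n_trees, oob_score=True, random_state=0)
    model.fit(X, y)
    oob_scores[n_trees] = model.oob_score_
    print(n_trees, round(model.oob_score_, 3))

# Recompute the score by hand from the per-row OOB predictions
# of the last (largest) forest.
manual_r2 = r2_score(y, model.oob_prediction_)
print(manual_r2)
```

With very few trees, some rows may end up in every bootstrap sample and have no OOB prediction at all, in which case sklearn emits a warning; that is one practical reason the OOB estimate gets more trustworthy as n_estimators grows.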