While going through this discussion forum, I came across a few discussions on the `bootstrap` argument of `RandomForestRegressor()` and also about `set_rf_samples()`. I misunderstood it at first too, and reading the conversations only confused me more. So I decided to dig a bit deeper into the fast.ai and sklearn source code, and came up with the following conclusions:
n = no_of_rows_in_dataframe
if (bootstrap == false) {
    all `n` rows are considered exactly once per tree for training
}
else if (set_rf_samples(k) is used) {
    `k` rows are sampled with replacement per tree for training, so some rows may repeat
}
else {
    `n` rows are sampled with replacement per tree for training, so some rows may repeat
}
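The three branches above can be simulated directly with numpy's sampling routines. This is a minimal sketch, not sklearn's actual code: I'm standing in for the forest's per-tree row selection with `rng.integers` (sampling with replacement), and `k` here is a hypothetical value passed to `set_rf_samples(k)`:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000   # rows in the dataframe
k = 200    # hypothetical subsample size passed to set_rf_samples(k)

# bootstrap=False: each tree sees all n rows exactly once
no_bootstrap = np.arange(n)
assert len(np.unique(no_bootstrap)) == n

# default (bootstrap=True): n draws *with replacement*, so rows repeat
bootstrap_rows = rng.integers(0, n, size=n)
print("unique rows with bootstrap:", len(np.unique(bootstrap_rows)))

# after set_rf_samples(k): only k draws with replacement per tree
subsampled_rows = rng.integers(0, n, size=k)
print("unique rows with set_rf_samples:", len(np.unique(subsampled_rows)))
```

With sampling-with-replacement, you'd expect roughly 63.2% of the rows to be unique in each bootstrap sample, which is why the remaining rows are available as out-of-bag samples.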
Also, there was some ambiguity around the oob_score calculation. So, after exploring a bit, here's what I concluded:
/**************************************************************************************
for simplicity, assume the output corresponding to each input is a single number,
so y.shape = (n, 1)
y = actual outputs
n = no_of_rows_in_dataframe
For cases with an output vector, the oob_score can be calculated by simply averaging
the oob_score of each column of the vector.
****************************************************************************************/
total_prediction = zero_matrix of dimension (n x 1) /* used to accumulate total predictions for each row (by different trees in the forest) which will later be averaged */
no_of_predictions = zero_matrix of dimension (n x 1) /* total number of predictions for each row (which also represents total number of trees in which each row is Out-Of-Bag), used for averaging later */
for (tree in forest) {
    out_of_bag_samples = all_rows - set(rows used by `tree` for training)
    total_prediction[row] += tree.predict(row) for each row in out_of_bag_samples
    no_of_predictions[row] += 1 for each row in out_of_bag_samples
}
predictions = total_prediction / no_of_predictions /* only for rows with no_of_predictions > 0 */
oob_score = r2_score(y, predictions)
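The pseudocode above can be turned into a runnable numpy-only sketch. This is not sklearn's implementation: the "tree" is replaced by a trivial least-squares slope fit on the in-bag rows, since the point here is the out-of-bag bookkeeping, not the model itself. R² is computed by hand rather than via `sklearn.metrics.r2_score`:

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_trees = 200, 50
X = rng.normal(size=(n, 1))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=n)

total_prediction = np.zeros(n)   # sum of OOB predictions per row
no_of_predictions = np.zeros(n)  # number of trees for which each row was OOB

for _ in range(n_trees):
    in_bag = rng.integers(0, n, size=n)   # bootstrap sample for this "tree"
    oob = np.ones(n, dtype=bool)
    oob[in_bag] = False                   # rows never drawn are out-of-bag
    # stand-in for a trained tree: least-squares slope fit on in-bag rows only
    slope = (X[in_bag, 0] @ y[in_bag]) / (X[in_bag, 0] @ X[in_bag, 0])
    total_prediction[oob] += slope * X[oob, 0]
    no_of_predictions[oob] += 1

seen = no_of_predictions > 0              # guard against rows in-bag for every tree
predictions = total_prediction[seen] / no_of_predictions[seen]
ss_res = np.sum((y[seen] - predictions) ** 2)
ss_tot = np.sum((y[seen] - np.mean(y[seen])) ** 2)
oob_score = 1.0 - ss_res / ss_tot
print("oob r2:", oob_score)
```

Note the `seen` guard: a row that ends up in-bag for every tree gets zero OOB predictions and must be excluded from the average, which is also why sklearn warns when there are too few trees for a reliable oob_score.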
For the exact code of the oob_score calculation, refer here.