Red Wine Quality: good validation r^2, terrible oob_score


After training a first simple model without any feature engineering on the Red Wine Dataset, I get a pretty good validation r^2, but a terrible oob_score.

df_trn, y, nas = proc_df(df_raw, 'quality')

def split_vals(a,b): return a[:b].copy(), a[b:].copy()

n_valid = 30
n_trn = len(df_trn)-n_valid
X_train, X_valid = split_vals(df_trn, n_trn)
y_train, y_valid = split_vals(y, n_trn)
raw_train, raw_valid = split_vals(df_raw, n_trn)

X_train.shape, X_valid.shape, y_train.shape

m = RandomForestRegressor(n_estimators=20, n_jobs=-1, oob_score=True), y)
print_score(m)  # [rmse(train), rmse(valid), r^2(train), r^2(valid), oob_score]
[0.22952135192153775, 0.1786523626114882, 0.9200842412403407, 0.8700226244343892, 0.4272249428368584]

What does this huge difference between r^2 and oob_score mean?

What you did here is train on the complete set. The OOB score takes the left-over rows (the ones a tree did not see) and builds its own subset to compute a score on.

There are probably only a few rows left over after you trained the RF, so the OOB score can vary a lot: if there happen to be outliers in that subset, it cannot average them out, since the subset is so small.

Correct me if I'm wrong, of course; we are all learning.