Hello,
After training a first simple model, without any feature engineering, on the Red Wine Quality dataset, I get a pretty good validation R² but a terrible oob_score.
```python
df_trn, y, nas = proc_df(df_raw, 'quality')

def split_vals(a, b): return a[:b].copy(), a[b:].copy()

# hold out the last 30 rows as a validation set
n_valid = 30
n_trn = len(df_trn) - n_valid
X_train, X_valid = split_vals(df_trn, n_trn)
y_train, y_valid = split_vals(y, n_trn)
raw_train, raw_valid = split_vals(df_raw, n_trn)
X_train.shape, X_valid.shape, y_train.shape

m = RandomForestRegressor(n_estimators=20, n_jobs=-1, oob_score=True)
m.fit(df_trn, y)
print_score(m)
# output:
# [0.22952135192153775, 0.1786523626114882, 0.9200842412403407,
#  0.8700226244343892, 0.4272249428368584]
```
What does this huge gap between the validation R² and the oob_score mean?
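For reference, here is a minimal standalone sketch of the comparison I'm trying to make, using only scikit-learn on synthetic data (not the course's `proc_df`/`print_score` helpers): it fits a forest with `oob_score=True` on the training rows only and compares `oob_score_` against an R² computed on a held-out validation set.

```python
# Sketch: OOB R^2 vs. held-out validation R^2 on synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

rng = np.random.RandomState(0)
X = rng.rand(1000, 5)                      # 1000 rows, 5 features
y = X @ rng.rand(5) + 0.1 * rng.randn(1000)  # smooth target + noise

# hold out the last 200 rows as a validation set
n_valid = 200
X_train, X_valid = X[:-n_valid], X[-n_valid:]
y_train, y_valid = y[:-n_valid], y[-n_valid:]

m = RandomForestRegressor(n_estimators=20, n_jobs=-1,
                          oob_score=True, random_state=0)
m.fit(X_train, y_train)                    # fit on training rows only

valid_r2 = r2_score(y_valid, m.predict(X_valid))
print(valid_r2, m.oob_score_)
```

When the forest is fit only on the training rows, the two numbers land in the same ballpark, since both estimate generalization error on data the trees did not see.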