I get a really low OOB score of 0.4716 and an RMSE of 0.027 on the red wine quality dataset. Why is my OOB score so low? Does it mean that I am overfitting?
I have trouble knowing how to handle small datasets; this one has only 1599 records. As soon as I hold out a validation set of 30 rows, my R² drops from 0.90 to 0.11! So I don't use a validation set and rely on oob_score_ instead. Now R² is 0.90, but oob_score_ is 0.47.
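The gap between training R² and oob_score_, and the effect of a 30-row validation set, can be reproduced in a minimal sketch. It assumes scikit-learn and uses a synthetic, deliberately noisy stand-in generated with `make_regression` (1599 rows, 11 features, to mirror the size of the wine data). It is not the red wine dataset, so its numbers will differ from mine. It compares R² on the training rows, the OOB R², and the spread of R² across repeated random 30-row validation sets.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

# Synthetic stand-in (an assumption, NOT the wine data): 1599 rows, 11 features, heavy noise.
X, y = make_regression(n_samples=1599, n_features=11, n_informative=5,
                       noise=100.0, random_state=0)

m = RandomForestRegressor(n_estimators=80, max_features=0.5, n_jobs=-1,
                          oob_score=True, random_state=0)
m.fit(X, y)

# R^2 on the rows the forest was trained on: each tree has seen most of these rows.
train_r2 = m.score(X, y)
# OOB R^2: each row is predicted only by trees whose bootstrap sample left it out.
oob_r2 = m.oob_score_
print(f"training R^2: {train_r2:.3f}  OOB R^2: {oob_r2:.3f}")

# Repeat random 30-row hold-outs to see how much validation R^2 varies between splits.
rng = np.random.default_rng(0)
valid_r2 = []
for _ in range(20):
    idx = rng.permutation(len(y))
    valid, train = idx[:30], idx[30:]
    mv = RandomForestRegressor(n_estimators=80, max_features=0.5, n_jobs=-1,
                               random_state=0)
    mv.fit(X[train], y[train])
    valid_r2.append(r2_score(y[valid], mv.predict(X[valid])))
print(f"30-row validation R^2: min {min(valid_r2):.3f}, max {max(valid_r2):.3f}")
```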
This is my model without a validation set:
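```python
from sklearn.ensemble import RandomForestRegressor

m = RandomForestRegressor(n_estimators=80, max_features=0.5, n_jobs=-1, oob_score=True)
m.fit(df_trn, y)
print_score(m)  # print_score is a helper defined elsewhere (not shown)
```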
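This is my model with a validation set:

```python
from sklearn.ensemble import RandomForestRegressor

m = RandomForestRegressor(n_estimators=80, max_features=0.5, n_jobs=-1, oob_score=True)
m.fit(X_train, y_train)
print_score(m)  # print_score is a helper defined elsewhere (not shown)
```

Output:

```
[0.0393104889825399, 0.08282278268007248, 0.9278235796906971, 0.11669103577122453, 0.4711981193537419]
```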
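The definition of `print_score` isn't shown. If it is the helper from the fast.ai Machine Learning course notebooks (an assumption), the five numbers are most likely training RMSE, validation RMSE, training R², validation R², and `oob_score_`. The sketch below is a hypothetical re-creation, renamed `score_list` and taking the data as explicit arguments, run on synthetic `make_regression` data with a 30-row hold-out rather than the wine dataset.

```python
import math

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split


def rmse(x, y):
    """Root mean squared error between predictions x and targets y."""
    return math.sqrt(((x - y) ** 2).mean())


def score_list(m, X_train, y_train, X_valid, y_valid):
    """Hypothetical print_score-style helper.

    Returns [train RMSE, valid RMSE, train R^2, valid R^2], plus oob_score_
    when the forest was fitted with oob_score=True.
    """
    res = [rmse(m.predict(X_train), y_train), rmse(m.predict(X_valid), y_valid),
           m.score(X_train, y_train), m.score(X_valid, y_valid)]
    if hasattr(m, "oob_score_"):  # only set after fit() when oob_score=True
        res.append(m.oob_score_)
    return res


# Synthetic stand-in data (assumption), split with a 30-row validation set.
X, y = make_regression(n_samples=1599, n_features=11, noise=50.0, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=30, random_state=0)

m = RandomForestRegressor(n_estimators=80, max_features=0.5, n_jobs=-1,
                          oob_score=True, random_state=0)
m.fit(X_train, y_train)
res = score_list(m, X_train, y_train, X_valid, y_valid)
print(res)
```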
Could someone explain how to interpret these large differences in the results, and tell me what I am doing wrong?