Hyperparameter tuning - Marginal increase in the R2 scores - Does it matter in practice?

Hello there,

Could someone take a look at the results I got with hyperparameter tuning and explain why one model is better than the other? The difference in the R2 scores, even with tuning, is very small, so I cannot tell whether the hyperparameter tuning has really helped. These are my results on Kaggle, following Lesson 2 of the ML course.

No tuning:

m = RandomForestRegressor(n_estimators=40, n_jobs=-1, oob_score=True)
%time m.fit(X_train, y_train)

CPU times: user 8min 24s, sys: 2.07 s, total: 8min 26s
Wall time: 2min 13s
[0.07812224343400345, 0.2514636970900517, 0.9873097774037343, 0.8840833341425308, 0.9089150448711641]
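(For anyone reading along: the five numbers come from the course's print_score helper, which as I understand it reports [train RMSE, valid RMSE, train R2, valid R2, OOB R2]. A rough sketch of what it computes; the actual fast.ai version may differ slightly:)

```python
import math

def print_score(m, X_train, y_train, X_valid, y_valid):
    """Sketch of the course's print_score helper: returns
    [train RMSE, valid RMSE, train R^2, valid R^2,
     OOB R^2 (only if the model was fitted with oob_score=True)]."""
    def rmse(pred, actual):
        # Root mean squared error over paired predictions/targets.
        return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))
    res = [rmse(m.predict(X_train), y_train),
           rmse(m.predict(X_valid), y_valid),
           m.score(X_train, y_train),   # R^2 on the training set
           m.score(X_valid, y_valid)]   # R^2 on the validation set
    if hasattr(m, 'oob_score_'):
        res.append(m.oob_score_)
    print(res)
    return res
```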

min_samples_leaf parameter set:

m = RandomForestRegressor(n_estimators=40, min_samples_leaf=3,
n_jobs=-1, oob_score=True)
%time m.fit(X_train, y_train)

CPU times: user 7min 26s, sys: 1.44 s, total: 7min 27s
Wall time: 1min 58s
[0.1147329579771278, 0.2464413627495862, 0.9726286506279375, 0.8886673641140697, 0.9095824525848047]

As you can see, when min_samples_leaf was set to 3, the R2 of the validation set increased only marginally (from 0.8840 to 0.8886). Does this really matter?

min_samples_leaf and max_features set:

m = RandomForestRegressor(n_estimators=40, min_samples_leaf=3, max_features=0.5, n_jobs=-1, oob_score=True)
m.fit(X_train, y_train)

[0.11896196553917984, 0.24168965436442894, 0.970573670779912, 0.8929192487524767, 0.9123473646361677]
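One thing worth checking before reading too much into a ~0.004-0.005 R2 gap: random forests are stochastic, so refitting the exact same configuration with different seeds already gives slightly different validation scores. A rough sketch on synthetic data (not the Kaggle set; assumes scikit-learn is installed):

```python
# Sketch: is a small R^2 gap bigger than run-to-run noise?
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=2000, n_features=20, noise=10.0, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

def valid_r2(seed, **params):
    # Fit one forest with a given seed and return its validation R^2.
    m = RandomForestRegressor(n_estimators=40, n_jobs=-1,
                              random_state=seed, **params)
    m.fit(X_train, y_train)
    return m.score(X_valid, y_valid)

# Refit the same untuned model with several seeds to estimate the noise floor.
scores = [valid_r2(seed) for seed in range(5)]
spread = max(scores) - min(scores)
print(f"validation R^2 across seeds: {scores}, spread = {spread:.4f}")
# If the gain from tuning is smaller than this spread, it may just be noise.
```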


  • As I see it, the increase in the R2 scores is very marginal even with tuning. My question is: does it really matter in practice?
    -To assess whether hyperparameter tuning is having any effect, do we just look at the validation set scores?
    -Does an increase in R2 on the validation set necessarily mean that the RMSE for that set has come down?
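On the last bullet, my understanding is that for scores computed on the same validation set, R2 and RMSE are tied together by R2 = 1 - MSE / Var(y), so a higher validation R2 does mean a lower validation RMSE. A quick check with made-up numbers (not the Kaggle data):

```python
import math

def r2_and_rmse(actual, pred):
    # R^2 = 1 - MSE / Var(y); RMSE = sqrt(MSE); both depend only on MSE
    # once the evaluation set (and hence Var(y)) is fixed.
    n = len(actual)
    mean_y = sum(actual) / n
    mse = sum((a - p) ** 2 for a, p in zip(actual, pred)) / n
    var_y = sum((a - mean_y) ** 2 for a in actual) / n
    return 1 - mse / var_y, math.sqrt(mse)

actual = [3.0, 5.0, 7.0, 9.0]
close = [3.1, 4.9, 7.2, 8.8]   # better predictions
rough = [4.0, 4.0, 8.0, 8.0]   # worse predictions

r2_c, rmse_c = r2_and_rmse(actual, close)
r2_r, rmse_r = r2_and_rmse(actual, rough)
print(r2_c, rmse_c, r2_r, rmse_r)
# The closer predictions get the higher R^2 and the lower RMSE,
# because both metrics are monotone in the same MSE.
```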

Kindly elaborate.

Kiran Hegde