Question about Grid search


#1

I am having a strange issue with “Grid Search”.

Jeremy touched upon it briefly during Class 2. I am trying to find the best “number of trees” for the restaurant dataset. Here’s the code snippet:

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

n_param_list = {'n_estimators': [10, 20, 30, 40, 50, 60]}
m = RandomForestRegressor(n_jobs=-1, oob_score=True)
grid_search = GridSearchCV(estimator=m, param_grid=n_param_list)
grid_search.fit(X_train, y_train)
print(grid_search.best_params_)

But strangely, every time I run it, the output from best_params_ is different. One run says 20, the next says 30, then the next says 20 again. So, is there a better way to use grid search?


(kanishkd4@gmail.com) #2

Both the GridSearchCV and RandomForestRegressor objects have a random_state parameter. You can set it to any arbitrary integer and should get consistent results when you rerun with the same random_state.


#3

Great!

Now the results are consistent. Though I noticed that GridSearchCV doesn't have a random_state parameter, so passing it to GridSearchCV fails with this error:

__init__() got an unexpected keyword argument 'random_state'

So, the code has to be:

n_param_list = {'n_estimators': [10, 20, 30, 40, 50, 60]}
# random_state goes on the estimator only; GridSearchCV does not accept it
m = RandomForestRegressor(n_jobs=-1, oob_score=True, random_state=1)
grid_search = GridSearchCV(estimator=m, param_grid=n_param_list)
grid_search.fit(X_train, y_train)
print(grid_search.best_params_)

This raises the question: what does random_state actually do? In particular, what is its significance when it comes to grid search?


(Masaki Kozuki) #4

random_state seeds the random number generator, so it affects the training of each model, e.g. the choice of columns. I don't know how random_state affects grid search, sorry.

This is copied from the scikit-learn documentation:

If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random
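
To make the quoted description concrete, here is a minimal sketch of what fixing the seed does for a single forest. The synthetic data from make_regression and the helper name fit_predict are assumptions for illustration only; they stand in for the restaurant dataset, which is not shown in this thread.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (an assumption; not the restaurant dataset).
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

def fit_predict(seed):
    """Hypothetical helper: train a forest with the given seed and predict on X."""
    m = RandomForestRegressor(n_estimators=20, random_state=seed)
    return m.fit(X, y).predict(X)

# Same integer seed: the random number generator starts from the same state,
# so both forests are built identically and the predictions match.
same_seed_match = np.allclose(fit_predict(1), fit_predict(1))

# Different seeds: the forests are built from different random draws,
# so the predictions will generally differ.
diff_seed_match = np.allclose(fit_predict(1), fit_predict(2))

print(same_seed_match)
print(diff_seed_match)
```

This would also explain the original symptom: with an integer or default cv, the folds GridSearchCV uses for a regressor are not shuffled, so the run-to-run variation most likely comes from the unseeded forest itself.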


(kanishkd4@gmail.com) #5

You're right about GridSearchCV: it shouldn't need a random_state (my bad), so it doesn't have any significance there.

For cross-validation, it does have significance for RandomizedSearchCV, which randomly samples the hyperparameter space.
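
As a rough illustration of that last point, the sketch below runs a randomized search twice with the same random_state and compares which candidates were sampled. The synthetic data, the max_depth values, and the helper name sampled_candidates are assumptions added for the example, not something from the original question.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data (an assumption; not the restaurant dataset).
X, y = make_regression(n_samples=150, n_features=6, noise=5.0, random_state=0)

# Candidate values to sample from; max_depth is added only so that the
# space is larger than the number of sampled candidates.
param_distributions = {
    "n_estimators": [10, 20, 30, 40, 50, 60],
    "max_depth": [2, 4, 6, None],
}

def sampled_candidates(seed):
    """Hypothetical helper: run a small randomized search, return the tried params."""
    search = RandomizedSearchCV(
        estimator=RandomForestRegressor(random_state=1),  # seeds the forest
        param_distributions=param_distributions,
        n_iter=4,           # sample 4 of the 24 possible combinations
        cv=3,
        random_state=seed,  # seeds which combinations get sampled
    )
    search.fit(X, y)
    return search.cv_results_["params"]

run_a = sampled_candidates(42)
run_b = sampled_candidates(42)

# With the same search seed, the same candidates are drawn in the same order.
print(run_a == run_b)
```

Note that there are two separate seeds here: the one on the estimator controls how each forest is trained, while the one on RandomizedSearchCV controls which hyperparameter combinations are tried.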