# Random forest hyperparameter tuning
```
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

# Convert the pandas splits to NumPy arrays
x_training = x_train.to_numpy()
y_training = y_train.to_numpy().ravel()
x_validation = x_valid.to_numpy()
y_validation = y_valid.to_numpy().ravel()
m_samples = x_training.shape[0]  # training set size; max_samples must not exceed this

# 3 * 3 * 4 * 3 = 108 hyperparameter combinations
param_grid = [
    {'bootstrap': [True], 'n_estimators': [50, 100, 150], 'max_depth': [3, 4, 5],
     'min_samples_split': [2, 4, 6, 8], 'max_samples': [1000, 1500, 2000]}
]
forest_clf = RandomForestClassifier()
grid_search = GridSearchCV(forest_clf, param_grid, cv=5,
                           scoring='f1',
                           return_train_score=True)
grid_search.fit(x_training, y_training)
print(grid_search.best_params_)
print('train', grid_search.best_score_)
# Evaluate the tuned model on the held-out validation set
best_model = grid_search.best_estimator_
print('validation', f1_score(y_validation, best_model.predict(x_validation)))
```
Since our own random forest implementation was not optimized, building the trees took a long time.
sklearn's RandomForestClassifier is optimized for this: each tree is built independently of the others, so the trees in a forest can be fit in parallel.
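A minimal sketch of this, reusing the `x_training` and `y_training` arrays from the code above: setting `n_jobs=-1` asks sklearn to build the trees on all available CPU cores.
```
# Sketch: the same classifier with parallel tree building enabled.
# n_jobs=-1 uses all available CPU cores; each tree is fit independently.
from sklearn.ensemble import RandomForestClassifier

parallel_forest = RandomForestClassifier(n_estimators=100, n_jobs=-1)
parallel_forest.fit(x_training, y_training)
```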
We use sklearn's GridSearchCV to select the model with the best F1 score. GridSearchCV tries every combination in param_grid; in this example that is 3 × 3 × 4 × 3 = 108 combinations (n_estimators × max_depth × min_samples_split × max_samples), as checked in the sketch below. GridSearchCV also uses k-fold cross-validation, so it does not waste data on a separate hold-out validation set.
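As a quick check of that count, a small sketch reusing `param_grid` from above:
```
# Sketch: count how many hyperparameter combinations GridSearchCV will try.
from sklearn.model_selection import ParameterGrid

print(len(ParameterGrid(param_grid)))  # 3 * 3 * 4 * 3 = 108; with cv=5 that is 540 model fits
```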
These are the best hyperparameters found by the search:
