Tuning A CART's Hyperparameters: Elie Kawerk

The document discusses tuning the hyperparameters of classification and regression tree (CART) and random forest models in Python. It defines hyperparameters as parameters that are set prior to training rather than learned from data, such as max_depth and min_samples_leaf for CART. It describes grid search and cross-validation as common approaches for tuning hyperparameters to find an optimal model with the best performance on held-out data based on a score such as accuracy or R2. An example tunes CART hyperparameters using grid search cross-validation in scikit-learn.


Tuning a CART's hyperparameters

Elie Kawerk
Data Scientist
Hyperparameters
Machine learning model:

parameters: learned from data


CART example: split-point of a node, split-feature of a node, ...

hyperparameters: not learned from data, set prior to training


CART example: max_depth, min_samples_leaf, splitting criterion, ...
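
To make the distinction concrete, the sketch below contrasts the two: hyperparameters are fixed at instantiation, while parameters such as the root node's split-feature and split-point appear in the fitted tree_ attribute. This is a minimal sketch; the breast-cancer toy dataset and the chosen hyperparameter values are assumptions, not part of the original example.

# Minimal sketch: hyperparameters are set before training,
# parameters are learned from data (dataset choice is an assumption)
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Hyperparameters: chosen by us at instantiation
dt = DecisionTreeClassifier(max_depth=2, min_samples_leaf=0.1, random_state=1)

# Parameters: learned during fitting and stored in the tree_ attribute
dt.fit(X, y)
print(dt.tree_.feature[0])    # split-feature index of the root node
print(dt.tree_.threshold[0])  # split-point of the root node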

What is hyperparameter tuning?
Problem: search for a set of optimal hyperparameters for a learning algorithm.

Solution: find a set of optimal hyperparameters that results in an optimal model.

Optimal model: yields an optimal score.

Score: in sklearn, defaults to accuracy (classification) and R2 (regression).

Cross validation is used to estimate the generalization performance.
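
A minimal sketch of this estimate, assuming the breast-cancer toy dataset and a 10-fold split:

# Estimate generalization performance with 10-fold CV
# (dataset and fold count are assumptions)
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
dt = DecisionTreeClassifier(max_depth=3, random_state=1)

# The mean CV score approximates performance on unseen data
print(cross_val_score(dt, X, y, cv=10, scoring='accuracy').mean())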

Why tune hyperparameters?
In sklearn, a model's default hyperparameters are not optimal for all problems.

Hyperparameters should be tuned to obtain the best model performance.

Approaches to hyperparameter tuning
Grid Search

Random Search (sketched after this list)

Bayesian Optimization

Genetic Algorithms

....
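
For contrast with the grid search workflow used in the rest of this chapter, here is a minimal sketch of random search with RandomizedSearchCV; the dataset, candidate values, and n_iter are assumptions:

# Random search: sample n_iter combinations instead of trying all of them
# (dataset and candidate values are assumptions)
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
dt = DecisionTreeClassifier(random_state=1)

random_dt = RandomizedSearchCV(estimator=dt,
                               param_distributions={
                                   'max_depth': [2, 3, 4, 5, 6],
                                   'min_samples_leaf': [0.04, 0.06, 0.08, 0.1]},
                               n_iter=5,
                               scoring='accuracy',
                               cv=10,
                               random_state=1)
random_dt.fit(X, y)
print(random_dt.best_params_)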

Grid search cross validation
Manually set a grid of discrete hyperparameter values.

Set a metric for scoring model performance.

Search exhaustively through the grid.

For each set of hyperparameters, evaluate each model's CV score.

The optimal hyperparameters are those of the model achieving the best CV score.

Grid search cross validation: example
Hyperparameter grids:

max_depth = {2, 3, 4}

min_samples_leaf = {0.05, 0.1}

hyperparameter space = {(2, 0.05), (2, 0.1), (3, 0.05), ...}

CV scores = {score(2, 0.05), ...}

optimal hyperparameters = set of hyperparameters corresponding to the best CV score.
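
Before reaching for GridSearchCV, the same procedure can be written by hand; a minimal sketch with ParameterGrid, assuming the breast-cancer toy dataset:

# Manual grid search over the space above; GridSearchCV automates this loop
# (dataset choice is an assumption)
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import ParameterGrid, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
dt = DecisionTreeClassifier(random_state=1)

grid = {'max_depth': [2, 3, 4], 'min_samples_leaf': [0.05, 0.1]}

best_score, best_params = -1.0, None
for params in ParameterGrid(grid):
    # each 'params' is one point of the space, e.g.
    # {'max_depth': 2, 'min_samples_leaf': 0.05}
    dt.set_params(**params)
    score = cross_val_score(dt, X, y, cv=10, scoring='accuracy').mean()
    if score > best_score:
        best_score, best_params = score, params

print(best_params, best_score)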

Inspecting the hyperparameters of a CART in sklearn
# Import DecisionTreeClassifier
from sklearn.tree import DecisionTreeClassifier

# Set seed to 1 for reproducibility
SEED = 1

# Instantiate a DecisionTreeClassifier 'dt'
dt = DecisionTreeClassifier(random_state=SEED)

# Print out 'dt's hyperparameters
print(dt.get_params())

{'class_weight': None,
 'criterion': 'gini',
 'max_depth': None,
 'max_features': None,
 'max_leaf_nodes': None,
 'min_impurity_decrease': 0.0,
 'min_impurity_split': None,
 'min_samples_leaf': 1,
 'min_samples_split': 2,
 'min_weight_fraction_leaf': 0.0,
 'presort': False,
 'random_state': 1,
 'splitter': 'best'}

# Import GridSearchCV
from sklearn.model_selection import GridSearchCV

# Define the grid of hyperparameters 'params_dt'
params_dt = {
    'max_depth': [3, 4, 5, 6],
    'min_samples_leaf': [0.04, 0.06, 0.08],
    'max_features': [0.2, 0.4, 0.6, 0.8]
}

# Instantiate a 10-fold CV grid search object 'grid_dt'
grid_dt = GridSearchCV(estimator=dt,
                       param_grid=params_dt,
                       scoring='accuracy',
                       cv=10,
                       n_jobs=-1)

# Fit 'grid_dt' to the training data
grid_dt.fit(X_train, y_train)

Extracting the best hyperparameters
# Extract best hyperparameters from 'grid_dt'
best_hyperparams = grid_dt.best_params_
print('Best hyperparameters:\n', best_hyperparams)

Best hyperparameters:
{'max_depth': 3, 'max_features': 0.4, 'min_samples_leaf': 0.06}

# Extract best CV score from 'grid_dt'
best_CV_score = grid_dt.best_score_
print('Best CV accuracy: {:.3f}'.format(best_CV_score))

Best CV accuracy: 0.938
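
Beyond the single best combination, the full table of CV results can also be inspected; a minimal sketch, assuming pandas is installed:

# Inspect every evaluated combination and its mean CV score
# (pandas is an assumption)
import pandas as pd

results_df = pd.DataFrame(grid_dt.cv_results_)
print(results_df[['params', 'mean_test_score']]
      .sort_values('mean_test_score', ascending=False)
      .head())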

Extracting the best estimator
# Extract best model from 'grid_dt'
best_model = grid_dt.best_estimator_

# Evaluate test set accuracy
test_acc = best_model.score(X_test, y_test)

# Print test set accuracy
print("Test set accuracy of best model: {:.3f}".format(test_acc))

Test set accuracy of best model: 0.947

Let's practice!
Tuning an RF's Hyperparameters

Elie Kawerk
Data Scientist
Random Forest Hyperparameters
CART hyperparameters

number of estimators

bootstrap

....

Tuning is expensive
Hyperparameter tuning:

computationally expensive,

sometimes leads to only a slight improvement.

Weigh the impact of tuning on the whole project.

Inspecting RF Hyperparameters in sklearn
# Import RandomForestRegressor
from sklearn.ensemble import RandomForestRegressor

# Set seed for reproducibility
SEED = 1

# Instantiate a random forests regressor 'rf'
rf = RandomForestRegressor(random_state=SEED)

# Inspect 'rf's hyperparameters
rf.get_params()

{'bootstrap': True,
 'criterion': 'mse',
 'max_depth': None,
 'max_features': 'auto',
 'max_leaf_nodes': None,
 'min_impurity_decrease': 0.0,
 'min_impurity_split': None,
 'min_samples_leaf': 1,
 'min_samples_split': 2,
 'min_weight_fraction_leaf': 0.0,
 'n_estimators': 10,
 'n_jobs': -1,
 'oob_score': False,
 'random_state': 1,
 'verbose': 0,
 'warm_start': False}

# Basic imports
from sklearn.metrics import mean_squared_error as MSE
from sklearn.model_selection import GridSearchCV

# Define a grid of hyperparameters 'params_rf'
params_rf = {
    'n_estimators': [300, 400, 500],
    'max_depth': [4, 6, 8],
    'min_samples_leaf': [0.1, 0.2],
    'max_features': ['log2', 'sqrt']
}

# Instantiate 'grid_rf'
grid_rf = GridSearchCV(estimator=rf,
param_grid=params_rf,
cv=3,
scoring='neg_mean_squared_error',
verbose=1,
n_jobs=-1)

Searching for the best hyperparameters
# Fit 'grid_rf' to the training set
grid_rf.fit(X_train, y_train)

Fitting 3 folds for each of 36 candidates, totalling 108 fits


[Parallel(n_jobs=-1)]: Done 42 tasks | elapsed: 10.0s
[Parallel(n_jobs=-1)]: Done 108 out of 108 | elapsed: 24.3s finished
RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=4,
max_features='log2', max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=0.1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=400, n_jobs=1,
oob_score=False, random_state=1, verbose=0, warm_start=False)

Extracting the best hyperparameters
# Extract best hyperparameters from 'grid_rf'
best_hyperparams = grid_rf.best_params_

print('Best hyperparameters:\n', best_hyperparams)

Best hyperparameters:
{'max_depth': 4,
'max_features': 'log2',
'min_samples_leaf': 0.1,
'n_estimators': 400}

Evaluating the best model performance
# Extract best model from 'grid_rf'
best_model = grid_rf.best_estimator_
# Predict the test set labels
y_pred = best_model.predict(X_test)
# Evaluate the test set RMSE
rmse_test = MSE(y_test, y_pred)**(1/2)
# Print the test set RMSE
print('Test set RMSE of rf: {:.2f}'.format(rmse_test))

Test set RMSE of rf: 3.89
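
As a side note, the manual square root can be avoided; a minimal alternative sketch, assuming scikit-learn >= 0.22, where mean_squared_error accepts squared=False:

# Equivalent RMSE computation using the MSE alias imported earlier
# (assumes scikit-learn >= 0.22)
rmse_test = MSE(y_test, y_pred, squared=False)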

Let's practice!
Congratulations!

Elie Kawerk
Data Scientist
How far you have come
Chapter 1: Decision-Tree Learning

Chapter 2: Generalization Error, Cross-Validation, Ensembling

Chapter 3: Bagging and Random Forests

Chapter 4: AdaBoost and Gradient-Boosting

Chapter 5: Model Tuning

Thank you!