
Module 9

Classification and Regression Trees


CART

Prof. Pedram Jahangiry



Class Modules
• Module 1- Introduction to Machine Learning
• Module 2- Setting up Machine Learning Environment
• Module 3- Linear Regression (Econometrics approach)
• Module 4- Machine Learning Fundamentals
• Module 5- Linear Regression (Machine Learning approach)
• Module 6- Penalized Regression (Ridge, LASSO, Elastic Net)
• Module 7- Logistic Regression
• Module 8- K-Nearest Neighbors (KNN)
• Module 9- Classification and Regression Trees (CART)
• Module 10- Bagging and Boosting
• Module 11- Dimensionality Reduction (PCA)
• Module 12- Clustering (KMeans – Hierarchical)



Road map: ML Algorithms

• Supervised
  • Regression: Linear / Polynomial regression, Penalized regression, KNN, SVR, Tree-based regression models
  • Classification: Logistic regression, KNN, SVM / SVC, Tree-based classification models
• Unsupervised
  • Dimensionality Reduction: Principal Component Analysis (PCA)
  • Clustering: K-Means, Hierarchical

Tree-based models:
1. Decision Trees (DTs)
2. Bagging, Random Forest
3. Boosting


Topics

Part I
1. Decision Trees definitions
2. Decision Tree criteria
   • MSE
   • Error rate
   • Gini index
   • Entropy

Part II
1. Regression Trees
2. Classification Trees

Part III
1. Pruning a tree
2. Hyperparameters

Part IV
1. Pros and Cons
2. Applications in Finance


Part I
Decision Trees definitions and criteria



Decision Trees Definitions
• DTs are ML algorithms that progressively divide data sets into smaller data groups based on a
descriptive feature, until they reach sets that are small enough to be described by some label.
• DTs apply a top-down approach to data, trying to group and label observations that are similar.
[Figure: a toy example. The Height–Weight plane is partitioned at Height = 6 and Weight = 180 into regions R1, R2, R3; the corresponding tree first asks Height > 6, then Weight > 180, and labels each leaf Male or Female.]
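To make the example concrete, here is a minimal scikit-learn sketch of such a classifier. The height/weight numbers below are made up for illustration and are not the lecture's data.

```python
# Minimal sketch: a small decision tree classifier on made-up height/weight data.
# The numbers are hypothetical and only mirror the style of the slide's example.
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[5.4, 130], [5.6, 150], [5.9, 160], [6.1, 170],   # [height (ft), weight (lb)]
     [6.2, 190], [6.4, 210], [5.5, 120], [6.0, 185]]
y = ["Female", "Female", "Male", "Male", "Male", "Male", "Female", "Male"]

clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(X, y)

# Print the learned rules: root node, decision nodes, and leaf (terminal) nodes.
print(export_text(clf, feature_names=["Height", "Weight"]))
```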
Decision Trees Definitions
• When the target variable consists of real numbers: regression trees
• When the target variable is categorical: classification trees
• Terminology:

✓ Root node
✓ Splitting
✓ Branch
✓ Decision node (internal node)
✓ Leaf node (terminal node)
✓ Sub-tree
✓ Depth (level)
✓ Pruning



Decision Trees Criteria
• Which split adds the most information gain (minimum impurity)? These criteria control how a decision tree decides to split the data.
• Regression trees: MSE
• Classification trees (they all measure impurity):
  1. Error rate
  2. Entropy
  3. Gini index


Decision Trees Criteria
• Entropy: measures the impurity or randomness (uncertainty) in the data points.
• Gini index: measures how often a randomly chosen element would be incorrectly labeled.
• For both entropy and Gini, a value of 0 means all elements belong to a single class (a pure node).
• Different decision tree algorithms use different impurity metrics.

$$\text{entropy} = -\sum_j p_j \log_2(p_j)$$

$$\text{Gini} = 1 - \sum_j p_j^2$$
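A minimal sketch of these two formulas in plain Python; the function names and the toy label lists are my own.

```python
# Minimal sketch: entropy and Gini impurity computed from a list of class labels.
import math
from collections import Counter

def entropy(labels):
    """sum over classes of -p_j * log2(p_j)."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """1 - sum over classes of p_j^2."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

pure  = ["A"] * 10              # a single class
mixed = ["A"] * 5 + ["B"] * 5   # 50/50 split

print(entropy(pure), gini(pure))    # 0.0 and 0.0 -> pure node
print(entropy(mixed), gini(mixed))  # 1.0 and 0.5 -> maximum impurity for two classes
```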


Part II
Regression / Classification Trees!
How does a decision tree work?



Regression Trees
• Baseball salary is color-coded from low (blue, green) to high (yellow, red).
• DTs apply a top-down approach to data, trying to group and label observations that are similar.
• The main questions in every decision-making process:
  1. Which feature to start with?
  2. Where to put the split (cut-off)?


Interpreting the results
• Based on color-coded salary, it seems that Years is the most important factor in determining salary.
• For less experienced players, the number of hits seems irrelevant.
• Among more experienced players, though, players with more hits tend to have higher salaries.
• As one can see, the model is very easy to display, interpret and explain.


Tree building process
• Divide the feature space into J distinct and non-overlapping regions.
• For every observation that falls into region $R_j$, we make the same prediction, which is simply the mean of the target values for the training observations in $R_j$.
• The goal is to find rectangles $R_1, R_2, \dots, R_J$ that minimize the RSS:

$$\sum_{j=1}^{J} \sum_{i \in R_j} (y_i - \hat{y}_{R_j})^2$$

• where $\hat{y}_{R_j}$ is the mean target for the training observations within the $j$th rectangle.
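The sketch below illustrates the "same prediction for every observation in a region" rule with scikit-learn on synthetic data (the data and the max_depth choice are arbitrary): each training point is mapped to its leaf, and the tree's prediction for that leaf equals the mean target of the training observations in it.

```python
# Minimal sketch: a regression tree predicts the mean target of its leaf (region).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))                 # two features
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)      # noisy nonlinear target

reg = DecisionTreeRegressor(max_depth=2).fit(X, y)    # at most 4 regions

leaves = reg.apply(X)                                 # leaf (region) id of each training point
for leaf in np.unique(leaves):
    region_mean = y[leaves == leaf].mean()            # mean target in region R_m
    tree_pred = reg.predict(X[leaves == leaf][:1])[0] # prediction for any point in that region
    print(f"leaf {leaf}: region mean = {region_mean:.3f}, tree prediction = {tree_pred:.3f}")
```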


Tree building process: Recursive Binary Splitting

• How does the algorithm select the feature $X_j$ and the split point $s$?
• $X_j$ and $s$ are selected such that splitting the feature space into the regions $\{X \mid X_j < s\}$ and $\{X \mid X_j \ge s\}$ leads to the largest possible reduction in RSS:

$$R_1(j, s) = \{X \mid X_j < s\} \quad \text{and} \quad R_2(j, s) = \{X \mid X_j \ge s\}$$

• We seek the values of $j$ and $s$ that minimize:

$$\sum_{i:\, x_i \in R_1(j,s)} (y_i - \hat{y}_{R_1})^2 + \sum_{i:\, x_i \in R_2(j,s)} (y_i - \hat{y}_{R_2})^2$$

• The best split is made at that particular step (a greedy approach), rather than looking ahead and picking a split that would lead to a better tree in some future step.
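A minimal numpy sketch of this greedy search, assuming every observed feature value is tried as a candidate cut point; the function and variable names are mine, not from the lecture.

```python
# Minimal sketch: exhaustive greedy search for the split (j, s) that minimizes the
# total RSS of the two resulting regions {X | X_j < s} and {X | X_j >= s}.
import numpy as np

def best_split(X, y):
    best = None  # (rss, feature index j, cut point s)
    for j in range(X.shape[1]):
        for s in np.unique(X[:, j]):
            left, right = y[X[:, j] < s], y[X[:, j] >= s]
            if len(left) == 0 or len(right) == 0:
                continue  # skip splits that leave one region empty
            rss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if best is None or rss < best[0]:
                best = (rss, j, s)
    return best

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(100, 3))
y = np.where(X[:, 1] > 4.0, 5.0, 1.0) + 0.1 * rng.normal(size=100)
print(best_split(X, y))   # should recover feature j = 1 with a cut point near 4.0
```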


Tree building process: Recursive Binary Splitting
• Next, the algorithm repeats the process, looking for the best feature and best split point in order to split the data further and minimize the RSS within each of the resulting regions.
• The process continues until a stopping criterion is reached; for instance, until no region contains more than a fixed number of observations.


A Five-Region Example of Recursive Binary Splitting

[Figure, three panels. Left: the output of recursive binary splitting on a two-dimensional example. Middle: a tree corresponding to the partition in the left panel. Right: a perspective plot of the prediction surface corresponding to that tree.]


Overfitting?



Classification Trees
• Classification trees are very similar to regression trees, except that they are used to predict a qualitative response rather than a quantitative one.
• The prediction of the algorithm at each terminal node is the category with the majority of the data points, i.e., the most commonly occurring class.


Classification Trees (details)
• Just as in the regression setting, recursive binary splitting is used to grow a classification tree. However, instead of RSS we use one of the following impurity criteria:

1. Classification error rate: $1 - \max_k(\hat{p}_{mk})$

2. Gini index: $1 - \sum_k \hat{p}_{mk}^2$

3. Cross entropy: $-\sum_k \hat{p}_{mk} \log(\hat{p}_{mk})$

• $\hat{p}_{mk}$ represents the proportion of training observations in the $m$th region that are from the $k$th class.
• Classification error rate is not sufficiently sensitive to node purity, so in practice either the Gini index or cross entropy is preferred.


Decision Tree Metrics (Simple Example)
Setup: 30 training observations (10 in one class, 20 in the other). Two candidate root-node splits are compared: X1 > s, which produces child nodes with class counts (9, 11) and (1, 9), and X2 > s, which produces child nodes with class counts (2, 13) and (8, 7). For each node, Gini $= 1 - \sum_j p_j^2$, cross entropy $= -\sum_j p_j \log p_j$ (natural logarithm here), and error rate $= 1 - \max_j p_j$; the criterion for a split is the weighted average of the child-node values (weights = node size / 30).

Entire training data before split (10, 20):
• Gini: $1 - \left[\left(\tfrac{10}{30}\right)^2 + \left(\tfrac{20}{30}\right)^2\right] = 0.44$
• Cross entropy: $-\left[\tfrac{10}{30}\log\tfrac{10}{30} + \tfrac{20}{30}\log\tfrac{20}{30}\right] = 0.64$
• Error rate: $1 - \max\left(\tfrac{10}{30}, \tfrac{20}{30}\right) = 1 - \tfrac{20}{30} = 0.333$

Root node split X1 > s (children of size 20 and 10, with class counts (9, 11) and (1, 9)):
• Gini: $\tfrac{20}{30}\left[1 - \left(\tfrac{9}{20}\right)^2 - \left(\tfrac{11}{20}\right)^2\right] + \tfrac{10}{30}\left[1 - \left(\tfrac{1}{10}\right)^2 - \left(\tfrac{9}{10}\right)^2\right] = \tfrac{20}{30}(0.495) + \tfrac{10}{30}(0.18) = 0.39$
• Cross entropy: $\tfrac{20}{30}\left[-\tfrac{9}{20}\log\tfrac{9}{20} - \tfrac{11}{20}\log\tfrac{11}{20}\right] + \tfrac{10}{30}\left[-\tfrac{1}{10}\log\tfrac{1}{10} - \tfrac{9}{10}\log\tfrac{9}{10}\right] = \tfrac{20}{30}(0.69) + \tfrac{10}{30}(0.325) = 0.57$
• Error rate: $\tfrac{20}{30}\left[1 - \max\left(\tfrac{9}{20}, \tfrac{11}{20}\right)\right] + \tfrac{10}{30}\left[1 - \max\left(\tfrac{1}{10}, \tfrac{9}{10}\right)\right] = \tfrac{9}{30} + \tfrac{1}{30} = 0.333$

Root node split X2 > s (children of size 15 and 15, with class counts (2, 13) and (8, 7)):
• Gini: $\tfrac{15}{30}\left[1 - \left(\tfrac{2}{15}\right)^2 - \left(\tfrac{13}{15}\right)^2\right] + \tfrac{15}{30}\left[1 - \left(\tfrac{8}{15}\right)^2 - \left(\tfrac{7}{15}\right)^2\right] = \tfrac{15}{30}(0.231) + \tfrac{15}{30}(0.497) = 0.37$
• Cross entropy: $\tfrac{15}{30}\left[-\tfrac{2}{15}\log\tfrac{2}{15} - \tfrac{13}{15}\log\tfrac{13}{15}\right] + \tfrac{15}{30}\left[-\tfrac{8}{15}\log\tfrac{8}{15} - \tfrac{7}{15}\log\tfrac{7}{15}\right] = \tfrac{15}{30}(0.39) + \tfrac{15}{30}(0.69) = 0.54$
• Error rate: $\tfrac{15}{30}\left[1 - \max\left(\tfrac{2}{15}, \tfrac{13}{15}\right)\right] + \tfrac{15}{30}\left[1 - \max\left(\tfrac{8}{15}, \tfrac{7}{15}\right)\right] = \tfrac{2}{30} + \tfrac{7}{30} = 0.3$
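As a quick sanity check on the arithmetic above, the sketch below recomputes the three weighted measures for the X1 > s split; note that the cross-entropy values in this example come out as shown only with the natural logarithm.

```python
# Minimal sketch: weighted impurity of the X1 > s split from the example above.
# Child node 1: 20 observations (9 vs 11); child node 2: 10 observations (1 vs 9).
import numpy as np

def node_metrics(counts):
    p = np.array(counts) / sum(counts)
    gini = 1 - (p ** 2).sum()
    entropy = -(p * np.log(p)).sum()   # natural log, matching the slide's numbers
    error = 1 - p.max()
    return gini, entropy, error

children = [(9, 11), (1, 9)]
n_total = sum(sum(c) for c in children)   # 30 observations in total

for name, idx in [("Gini", 0), ("Cross entropy", 1), ("Error rate", 2)]:
    weighted = sum(sum(c) / n_total * node_metrics(c)[idx] for c in children)
    print(f"{name}: {weighted:.3f}")
# Expected: Gini ~ 0.39, Cross entropy ~ 0.57, Error rate ~ 0.333.
# Note: the error rate (0.333) is unchanged from the unsplit node, while Gini and
# entropy both fall -- this is why error rate is less sensitive to node purity.
```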


Trees Versus Linear Models

[Figure: left column, a linear model; right column, a tree-based model. Top row: the true decision boundary is linear. Bottom row: the true decision boundary is non-linear.]


Part III
Pruning a tree
Tuning hyperparameters


Pruning a tree
• A smaller tree with fewer splits may lead to lower variance and better interpretation, but at the cost of higher bias.
• Growing a small tree by stopping early is too short-sighted: "a seemingly worthless split early on in the tree might be followed by a very good split, a split that leads to a large reduction in RSS/impurity index later on."
• A better strategy is therefore to grow a very large tree first. This may produce good predictions on the training set, but it is likely to overfit the data, leading to poor test set performance.
• So we need to prune the large tree back in order to obtain a subtree.
• Cost complexity pruning is used to do this.


Cost complexity pruning (weakest link pruning)

• Consider a sequence of trees indexed by a nonnegative tuning parameter $\alpha$.
• For each value of $\alpha$ there corresponds a subtree $T \subset T_0$ that minimizes the following objective function:

$$\sum_{m=1}^{|T|} \sum_{i:\, x_i \in R_m} (y_i - \hat{y}_{R_m})^2 + \alpha |T|$$

• $|T|$ indicates the number of terminal nodes of the tree $T$,
• $R_m$ is the rectangle corresponding to the $m$th terminal node, and
• $\hat{y}_{R_m}$ is the mean of the training observations in $R_m$.

• $\alpha$ controls the bias-variance trade-off and is determined by cross-validation.
• Lastly, we return to the full data set and obtain the subtree corresponding to the chosen $\alpha$.
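A scikit-learn sketch of this workflow on synthetic data: cost_complexity_pruning_path gives the sequence of effective alpha values, cross-validation picks one, and the final subtree is refit on the full data set. The data and settings below are illustrative only.

```python
# Minimal sketch: cost complexity (weakest link) pruning with scikit-learn.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.2 * rng.normal(size=300)

# 1. Compute the candidate alphas (one per subtree of the fully grown tree T0).
path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X, y)
alphas = path.ccp_alphas

# 2. Cross-validate each alpha to trade off subtree size against fit.
cv_scores = [cross_val_score(DecisionTreeRegressor(ccp_alpha=a, random_state=0),
                             X, y, cv=5).mean() for a in alphas]
best_alpha = alphas[int(np.argmax(cv_scores))]

# 3. Refit on the full data set with the chosen alpha to obtain the pruned subtree.
pruned = DecisionTreeRegressor(ccp_alpha=best_alpha, random_state=0).fit(X, y)
print(f"best alpha = {best_alpha:.4f}, leaves in pruned tree = {pruned.get_n_leaves()}")
```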


Salary example continued

[Figure: the unpruned tree that results from recursive binary splitting on the training data.]


Finding the optimal 𝛼 or T



The optimal (pruned) tree



Other hyperparameters
✓ To avoid overfitting, regularization parameters can be added to the model (see the tuning sketch after this list), such as:
• Maximum depth of the tree
• Minimum population at a node
• Maximum number of decision nodes
• Minimum impurity decrease (info gain)
• Alpha (complexity parameter)
✓ Other hyperparameters are:
• Criterion: gini, entropy
• Splitter: best, random
• Class weight: balanced, none
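In scikit-learn these knobs correspond to constructor arguments such as max_depth, min_samples_split, min_samples_leaf, max_leaf_nodes, min_impurity_decrease, ccp_alpha, criterion, splitter, and class_weight. Below is a minimal, illustrative tuning sketch with cross-validated grid search; the data and grid values are arbitrary.

```python
# Minimal sketch: tuning decision tree hyperparameters with cross-validated grid search.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)

param_grid = {
    "criterion": ["gini", "entropy"],   # impurity measure used for splits
    "max_depth": [2, 3, 5, None],       # maximum depth of the tree
    "min_samples_leaf": [1, 5, 20],     # minimum population at a leaf node
    "ccp_alpha": [0.0, 0.001, 0.01],    # cost complexity (pruning) parameter
}

search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```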



Part IV
Pros and Cons
Applications in finance



DTs’ Pros and Cons

Pros:
• Easy to interpret and visualize
• Can easily handle categorical data without the need to create dummy variables
• Can easily capture non-linear patterns
• Can handle data in its raw form (no preprocessing needed). Why?
• Makes no assumptions about the data distribution, because of the non-parametric nature of the algorithm

Cons:
• Relatively poor predictive accuracy
• Sensitive to noisy data: the tree can overfit noise, and small variations in the data can result in a very different decision tree*

*This can be reduced by bagging and boosting algorithms.


DTs’ Applications in finance

• Enhancing detection of fraud in financial statements
• Generating consistent decision processes in equity and fixed-income selection
• Simplifying communication of investment strategies to clients
• Portfolio allocation problems


Appendix A



Class Modules
✓ Module 1- Introduction to Machine Learning
✓ Module 2- Setting up Machine Learning Environment
✓ Module 3- Linear Regression (Econometrics approach)
✓ Module 4- Machine Learning Fundamentals
✓ Module 5- Linear Regression (Machine Learning approach)
✓ Module 6- Penalized Regression (Ridge, LASSO, Elastic Net)
✓ Module 7- Logistic Regression
✓ Module 8- K-Nearest Neighbors (KNN)
✓ Module 9- Classification and Regression Trees (CART)
• Module 10- Bagging and Boosting
• Module 11- Dimensionality Reduction (PCA)
• Module 12- Clustering (KMeans – Hierarchical)

