Additive Models and Trees
Clint P. George
1 Introduction
Tree Based Models
Example
2 Classification Trees
General Setup
Growing the Tree
Tree Pruning
Classification Tree: Example
3 Regression Trees
Overview
Growing the Tree
Tree Pruning
Regression Tree: Example
4 Conclusions
Overview
Some characteristics:
- No distribution assumptions on the variables
- The feature space can be fully represented by a single tree
- Interpretable
Overview
Observations: the iris data (150 samples of three species)
Node  Split       N_m (size)  Loss  Prediction   Class proportions (setosa, versicolor, virginica)
1     root        150         100   setosa       (0.33, 0.33, 0.33)
2     PL < 2.45    50           0   setosa       (1.00, 0.00, 0.00) *
3     PL ≥ 2.45   100          50   versicolor   (0.00, 0.50, 0.50)
6     PW < 1.75    54           5   versicolor   (0.00, 0.90, 0.09)
…
Table: Tree split path and node proportions (PL = petal length, PW = petal width; * marks a terminal node)
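The split path above can be reproduced, approximately, with standard CART software. As a hedged sketch (scikit-learn is an assumption of this example, not something used in the slides), a depth-two classification tree on the iris data recovers the same petal-length and petal-width thresholds:

```python
# Sketch: grow a small classification tree on the iris data and print its splits.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
# Limit the depth so the tree stays comparable to the table above.
clf = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# Text rendering of the split path (petal length 2.45 and petal width 1.75 thresholds).
print(export_text(clf, feature_names=iris.feature_names))
```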
Overview
In node m, representing a region R_m with N_m observations, the proportion of class k observations is
\[
\hat{p}_{mk} = \frac{1}{N_m} \sum_{x_i \in R_m} I(y_i = k), \qquad k = 1, 2, \ldots, K \tag{1}
\]
Classification Rule
Classify the observations in node m to the majority class
\[
k(m) = \arg\max_k \hat{p}_{mk}
\]
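As a small numerical sketch of equation (1) and the majority-class rule (the helper name and the toy node below are illustrative, not from the slides):

```python
import numpy as np

def node_class_proportions(y_node, n_classes):
    """p_hat_mk for one node: the fraction of the node's samples in each class k."""
    counts = np.bincount(y_node, minlength=n_classes)
    return counts / counts.sum()

# Toy node: 54 samples, 49 of class 1 (versicolor) and 5 of class 2 (virginica),
# matching node 6 in the table above.
y_node = np.array([1] * 49 + [2] * 5)
p_hat = node_class_proportions(y_node, n_classes=3)
k_m = int(np.argmax(p_hat))      # majority-class rule: k(m) = argmax_k p_hat_mk
print(p_hat, k_m)                # roughly [0.00, 0.91, 0.09], predicted class 1
```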
Impurity Functions
Our aim is to reduce the node misclassification cost, i.e., to make all the samples in a node belong to one class.
Assume a predictor x_j.
Let m_left and m_right be the left and right branches obtained by splitting node m on x_j:
- when x_j is continuous or ordinal, m_left and m_right are given by x_j < s and x_j ≥ s for a split point s
- when x_j is categorical, we may need an exhaustive subset search to find the best split
Let q_left and q_right be the proportions of the samples in node m assigned to m_left and m_right.
The quality of the split is measured by the decrease in impurity
\[
\Delta i_j(s, m) = i(m) - q_\text{left}\, i(m_\text{left}) - q_\text{right}\, i(m_\text{right}),
\]
where the node impurity (Gini index) is
\[
i(m) = \sum_{k=1}^{K} \hat{p}_{mk}\,(1 - \hat{p}_{mk}) = 1 - \sum_{k=1}^{K} \hat{p}_{mk}^{\,2}
\]
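A minimal sketch of the Gini impurity and the impurity decrease Δi_j(s, m) for one candidate split (NumPy assumed; the function names are hypothetical):

```python
import numpy as np

def gini(y, n_classes):
    """Gini impurity i(m) = 1 - sum_k p_hat_mk^2 for the class labels in a node."""
    p = np.bincount(y, minlength=n_classes) / len(y)
    return 1.0 - np.sum(p ** 2)

def impurity_decrease(y, x_j, s, n_classes):
    """Delta i_j(s, m) for splitting a node into x_j < s and x_j >= s."""
    left, right = y[x_j < s], y[x_j >= s]
    if len(left) == 0 or len(right) == 0:        # degenerate split: no decrease
        return 0.0
    q_left, q_right = len(left) / len(y), len(right) / len(y)
    return (gini(y, n_classes)
            - q_left * gini(left, n_classes)
            - q_right * gini(right, n_classes))

# Toy check: a perfectly separating split removes all of the parent's impurity.
y = np.array([0, 0, 0, 1, 1, 1])
x_j = np.array([1.0, 1.2, 1.4, 3.0, 3.2, 3.4])
print(impurity_decrease(y, x_j, s=2.0, n_classes=2))   # 0.5, the parent's Gini index
```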
Greedy Algorithm
Scan through all predictors x_j to find the best pair (j, s) with the largest decrease Δi_j(s, m).
Then repeat this splitting procedure recursively for m_left and m_right.
Define a stopping criterion:
- stop when some minimum node size N_m is reached
- split only when the decrease in cost exceeds a threshold
Tree size:
- a very large tree may overfit the data
- a small tree may not capture the important structure in the data
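These stopping rules map onto the hyperparameters of common CART implementations. As one hedged example (the parameter names below are scikit-learn's, an assumption of this sketch):

```python
from sklearn.tree import DecisionTreeClassifier

# Each argument mirrors one of the stopping rules listed above.
clf = DecisionTreeClassifier(
    criterion="gini",            # impurity function i(m)
    min_samples_split=20,        # do not split nodes smaller than this (minimum N_m)
    min_impurity_decrease=1e-3,  # split only if the weighted impurity decrease exceeds this
    max_depth=None,              # or cap the overall tree size directly
)
```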
Key differences from classification trees:
- The outcome variable is continuous
- The criterion for splitting and pruning is the squared error
- The predicted value is the average of the outcome values in a tree node
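A brief illustration of these differences (scikit-learn names assumed, not part of the original slides): a regression tree's prediction for a point is the mean response of the leaf it falls into.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

# Splits are chosen by squared error (the default criterion); each leaf predicts its mean y.
reg = DecisionTreeRegressor(max_depth=3).fit(X, y)
print(reg.predict([[2.0]]))   # the average response in the leaf containing x = 2.0
```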
Greedy Algorithm
Start with all of the data (x_i, y_i), i = 1, ..., N. Consider a splitting variable j and a split point s, and define the regions
\[
R_1(j, s) = \{x \mid x_j < s\} \quad \text{and} \quad R_2(j, s) = \{x \mid x_j \geq s\}
\]
Then the variables j and s can be found using the greedy criterion
\[
\min_{j, s} \Big[ \min_{c_1} \sum_{x_i \in R_1(j, s)} (y_i - c_1)^2 \;+\; \min_{c_2} \sum_{x_i \in R_2(j, s)} (y_i - c_2)^2 \Big],
\]
where the inner minima are attained at
\[
\hat{c}_1 = \mathrm{ave}(y_i \mid x_i \in R_1(j, s)) \quad \text{and} \quad \hat{c}_2 = \mathrm{ave}(y_i \mid x_i \in R_2(j, s))
\]
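A minimal sketch of this greedy search (NumPy assumed, helper name hypothetical), with the inner minima replaced by the region means ĉ_1 and ĉ_2 as above:

```python
import numpy as np

def best_split(X, y):
    """Exhaustive search for the pair (j, s) minimizing the squared-error criterion."""
    best_j, best_s, best_sse = None, None, np.inf
    for j in range(X.shape[1]):
        for s in np.unique(X[:, j])[1:]:              # candidate split points for x_j
            left, right = y[X[:, j] < s], y[X[:, j] >= s]
            # Inner minima: c1_hat and c2_hat are the means of each region.
            sse = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
            if sse < best_sse:
                best_j, best_s, best_sse = j, s, sse
    return best_j, best_s, best_sse

# Toy usage: one predictor whose response jumps between x = 3 and x = 6.
X = np.array([[1.0], [2.0], [3.0], [6.0], [7.0], [8.0]])
y = np.array([1.0, 1.1, 0.9, 5.0, 5.1, 4.9])
print(best_split(X, y))   # splits feature 0 at s = 6.0 (i.e. x < 6 vs. x >= 6)
```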
Tree Pruning
Tree pruning can be done by weakest-link pruning using the squared-error impurity function
\[
i(m) = \frac{1}{N_m} \sum_{x_i \in R_m} (y_i - \hat{c}_m)^2
\]
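Weakest-link (cost-complexity) pruning is available in common implementations; a hedged scikit-learn sketch (library and parameter names are assumptions of this example):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

# Grow a large tree first, then prune it back along the weakest-link path.
full_tree = DecisionTreeRegressor(random_state=0).fit(X, y)
path = full_tree.cost_complexity_pruning_path(X, y)          # one alpha per weakest link
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]           # pick a mid-path penalty
pruned = DecisionTreeRegressor(random_state=0, ccp_alpha=alpha).fit(X, y)
print(full_tree.get_n_leaves(), pruned.get_n_leaves())       # the pruned tree has fewer leaves
```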
Summary
Advantages:
CART makes no distribution assumptions on the variables and
supports both categorical and continuous variables
The binary tree structure offers excellent interpretability
Can be used for ranking the variables, by summing the impurity decreases over the splits made on each variable (see the sketch at the end of this section)
Disadvantages:
Since CART uses a binary tree grown by recursive splits, it suffers from instability: small changes in the data can change the top splits and hence the whole tree
Splits are aligned with the axes of the feature space, which may
be suboptimal
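The variable-ranking idea mentioned under the advantages can be sketched as follows (scikit-learn's impurity-based importances, an assumption of this example, are computed from the summed impurity decreases of each variable's splits):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# Each variable's summed impurity decrease across its splits, normalized over the tree.
for name, importance in zip(iris.feature_names, clf.feature_importances_):
    print(f"{name}: {importance:.3f}")
```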