This document provides an overview of decision trees and compares two popular algorithms: CART and C4.5. It discusses the structure of decision trees and considerations for growing trees, such as selecting features and determining when to stop splitting to avoid overfitting. CART uses the Gini index as its splitting criteria and performs binary recursive partitioning, while C4.5 uses the gain ratio and allows for multiway splits. Both algorithms can handle nominal and numeric data. CART performs cost-complexity pruning to reduce overfitting, while C4.5 uses pessimistic pruning or subtree raising. The document also provides examples to illustrate how CART and C4.5 make splitting decisions at each node.

Decision Tree: Classification and Regression Tree (CART) & C4.5
Dinesh Kumar
Indra Panwar
Arjan Singh
Decision tree
[Figure: two scatter plots of labelled data (x vs. y axes)]
Decision tree
[Diagram: input variables / predictors mapped to outcome / output variables]
Decision tree
• A tree-like model generated by learning decision rules inferred from the training data
• It effectively adds multiple simple (linear) decision boundaries one after another
• The key idea: select splits that decrease the impurity of the class distribution in the resulting subtree

Structure
• Consists of:
  - Node (attribute) – represents a feature
  - Link (branch) – represents a decision
  - Leaf (terminal node) – represents an outcome
Decision tree
Considerations when growing a tree
• Which features to choose, and the condition for splitting
  - The values of some attributes give more information than others (Information(attribute_1) < Information(attribute_2))
  - Quantifying the information provided by an attribute (see the sketch after this slide):
      Information content = −log P(X = x)
    The lower the probability, the more information it provides
  - Amount of uncertainty/unpredictability in the information:
      Entropy = −Σ P(X = x) log P(X = x)
    All data of the same class → 0; data evenly distributed among classes → 1 (highest)
• Knowing when to stop (to AVOID over-fitting)
  - Setting constraints on tree size (tree-defining parameters / pruning)
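A minimal Python sketch (not from the slides) of these two quantities, using log base 2 so that an even two-class split has entropy 1:

```python
import math

def information_content(p):
    """Self-information of an outcome with probability p: -log2(p)."""
    return -math.log2(p)

def entropy(probabilities):
    """Expected information of a class distribution: -sum p * log2(p)."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(information_content(0.5))    # 1.0 bit   -- likely outcome, little information
print(information_content(0.01))   # ~6.6 bits -- rare outcome, much more information
print(entropy([1.0]))              # 0.0 -- all data in one class
print(entropy([0.5, 0.5]))         # 1.0 -- data evenly split between two classes
```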
CART (Classification and Regression Tree)
Decision tree algorithms – ID3 / C4.5 / C5.0 / CART

C4.5 (to arrive at the best attribute):
  Gain ratio = Information gain / Split info
  Information gain = Entropy(target) − (weighted average) Entropy(children)
  Split info = −Σ (Si/S) log(Si/S)

CART:
  Gini gain = Gini index(parent) − (weighted average) Gini index(children)
  Gini index = 1 − Σ (Si/S)²
CART Example:
Gini gain = Gini index(parent) − (weighted average) Gini index(children)
Gini index = 1 − Σ (Si/S)²

  Index  a1  a2  a3    Target class (t)
  1      T   T   1.00  P
  2      T   T   6.00  P
  3      T   F   5.00  N
  4      F   F   4.00  P
  5      F   T   7.00  N
  6      F   T   3.00  N
  7      F   F   8.00  N
  8      T   F   7.00  P
  9      F   T   5.00  N

[Split diagram: target (9): 4 P / 5 N;  a1 (9): T (4): 3 P / 1 N,  F (5): 1 P / 4 N]

Gini index of the target (t)
  Gini(t) = 1 − [(4/9)² + (5/9)²] = 40/81

Gini index of attribute a1
  Gini(a1 = T) = 1 − [(3/4)² + (1/4)²] = 3/8
  Gini(a1 = F) = 1 − [(1/5)² + (4/5)²] = 8/25

Gini gain of a1 (reproduced in the sketch below)
  GiniGain(a1) = Gini(t) − [(4/9) Gini(a1=T) + (5/9) Gini(a1=F)] = 0.149

Gini gain of a2 = 0.005
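A minimal Python sketch (not from the slides) that reproduces GiniGain(a1) from the class counts in the split diagram above:

```python
def gini(counts):
    """Gini index of a class-count distribution: 1 - sum (Si/S)^2."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

parent = [4, 5]                             # target: 4 P / 5 N
children = {"a1=T": [3, 1], "a1=F": [1, 4]}

n = sum(parent)
weighted = sum(sum(ch) / n * gini(ch) for ch in children.values())
print(round(gini(parent) - weighted, 3))    # 0.149
```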
CART Example contd…

  Index  a3    Split position  Target class (t)
  1      1.00  0.50            P
  2      3.00  2.00            N
  3      4.00  3.50            P
  4      5.00  4.50            N
  5      5.00  4.50            N
  6      6.00  5.50            P
  7      7.00  6.50            N
  8      7.00  6.50            P
  9      8.00  7.50            N

[Split diagram: a3 (9): >= 4.50 (6): 2 P / 4 N,  < 4.50 (3): 2 P / 1 N]

Gini index of a3 at split 4.50
  Gini(a3 >= 4.50) = 1 − [(2/6)² + (4/6)²] = 16/36
  Gini(a3 < 4.50) = 1 − [(2/3)² + (1/3)²] = 4/9

Gini gain of a3 at split 4.50
  GiniGain(a3 = 4.50) = Gini(t) − [(6/9) Gini(a3 >= 4.50) + (3/9) Gini(a3 < 4.50)] = 0.049

(a sketch scanning all candidate splits of a3 follows)
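A minimal Python sketch (not from the slides) that scans the candidate split positions of a3 (midpoints between distinct sorted values, so the trivial 0.50 split is omitted) and reproduces the Gini gains tabulated on the next slide:

```python
a3 = [1.0, 6.0, 5.0, 4.0, 7.0, 3.0, 8.0, 7.0, 5.0]
t  = ["P", "P", "N", "P", "N", "N", "N", "P", "N"]

def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))

parent = gini(t)
values = sorted(set(a3))
for split in [(lo + hi) / 2 for lo, hi in zip(values, values[1:])]:
    left  = [c for v, c in zip(a3, t) if v < split]
    right = [c for v, c in zip(a3, t) if v >= split]
    gain = parent - len(left) / len(t) * gini(left) - len(right) / len(t) * gini(right)
    print(f"split {split}: Gini gain = {gain:.3f}")  # 2.0 -> 0.077, 4.5 -> 0.049, ...
```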
CART Example contd… (graph generated using scikit-learn; a sketch of such a fit follows this slide)

Gini gain of a1 = 0.149
Gini gain of a2 = 0.005
Gini gain of a3 = 0.077 (the maximum over its candidate splits, see below)

  Split value  0.50  2.00   3.50   4.50   5.50   6.50   7.50
  Gini gain    0.00  0.077  0.002  0.049  0.005  0.012  0.049
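The slide notes the graph was generated with scikit-learn; a minimal sketch (the 1/0 encoding of T/F and the feature names are assumptions, not from the slides) of how such a tree might be fitted and inspected:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# a1, a2 encoded as 1/0 for T/F; a3 is numeric; target class is P/N
X = [[1, 1, 1.0], [1, 1, 6.0], [1, 0, 5.0], [0, 0, 4.0], [0, 1, 7.0],
     [0, 1, 3.0], [0, 0, 8.0], [1, 0, 7.0], [0, 1, 5.0]]
y = ["P", "P", "N", "P", "N", "N", "N", "P", "N"]

clf = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, y)
print(export_text(clf, feature_names=["a1", "a2", "a3"]))
```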
C4.5 Example
Gain ratio = Information gain / Split info
Split info = −Σ (Si/S) log(Si/S)

• Attributes: Outlook, Humidity and Wind; Label: Decision

Decision (14): 5 No / 9 Yes
Entropy(target) = −(5/14) log2(5/14) − (9/14) log2(9/14) = 0.940

Wind (14): Weak (8): 2 No / 6 Yes; Strong (6): 3 No / 3 Yes
Wind gain ratio = 0.049

Outlook (14): Sunny (5): 3 No / 2 Yes; Overcast (4): 0 No / 4 Yes; Rain (5): 2 No / 3 Yes
Entropy
  Sunny: −(3/5) log2(3/5) − (2/5) log2(2/5) = 0.970
  Overcast: 0
  Rain: 0.970
Weighted average entropy = (5/14)(0.970) + (4/14)(0) + (5/14)(0.970) = 0.693
Gain = 0.940 − weighted average entropy = 0.247
SplitInfo = −(5/14) log2(5/14) − (4/14) log2(4/14) − (5/14) log2(5/14) = 1.577
Gain ratio = Gain / SplitInfo = 0.247 / 1.577 = 0.156
(this calculation is reproduced in the sketch below)

Humidity (numeric attribute):
• Arrange the data in ascending order
• Find the candidate splits
• Calculate the gain ratio for each split
• Take the maximum gain ratio = 0.11

  Humidity  85  90  78  96  80  70  65  95  70  80  70  90  75  80
  Decision  No  No  Yes Yes Yes No  Yes No  Yes Yes Yes Yes Yes No

Sorted, with candidate split positions:
  Humidity  65   70   70   70   75   78   80   80   80   85   90   90   95   96
  Decision  Yes  No   Yes  Yes  Yes  Yes  Yes  Yes  No   No   No   Yes  No   Yes
  Split     32.5 67.5 67.5 67.5 72.5 76.5 79   79   79   82.5 87.5 87.5 92.5 95.5
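A minimal Python sketch (not from the slides) reproducing the Outlook calculation from the counts above; the split info is simply the entropy of the partition sizes:

```python
import math

def entropy(counts):
    """Entropy (base 2) of a class-count or partition-size distribution."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

parent = [9, 5]                                      # Decision: 9 Yes / 5 No
outlook = [[2, 3], [4, 0], [3, 2]]                   # Sunny, Overcast, Rain (Yes, No)

n = sum(parent)
weighted = sum(sum(ch) / n * entropy(ch) for ch in outlook)
gain = entropy(parent) - weighted                    # ~0.247
split_info = entropy([sum(ch) for ch in outlook])    # ~1.577
print(round(gain / split_info, 3))                   # 0.156
```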
CART vs C4.5

                          CART                               C4.5
Key idea                  Recursive binary partitioning (greedy approach)
Splits                    Binary tree                        Binary/multiway tree
Data type                 Nominal and numeric attributes     Nominal and numeric attributes
Splitting criterion       Gini index                         Gain ratio
Pruning                   Cost-complexity pruning            Pessimistic pruning or sub-tree raising
Handling missing values   Surrogate tests                    Apportions instances probabilistically among outcomes
Overfitting
• Happens due to outliers and irregularities in the data
• The algorithm keeps splitting deeper and deeper to reduce the training error, but this ends up increasing the test-set error

• Avoiding overfitting
  - Setting constraints on tree size, i.e. on the tree-defining parameters (see the sketch after this list):
    - Minimum samples for a node split
      (too high a value may result in under-fitting; use cross-validation to tune)
    - Minimum samples for a terminal node
    - Maximum depth of the tree
    - Maximum number of terminal nodes
    - Maximum number of features to consider for a split
      (randomly selected; as a rule of thumb, the square root of the total number of features)
  - Pruning
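A minimal sketch (the parameter values are hypothetical and the Iris data is only a stand-in, not from the slides) of how these constraints map onto scikit-learn's DecisionTreeClassifier:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

clf = DecisionTreeClassifier(
    min_samples_split=10,   # minimum samples required to split a node
    min_samples_leaf=5,     # minimum samples required at a terminal node
    max_depth=4,            # maximum depth of the tree
    max_leaf_nodes=8,       # maximum number of terminal nodes
    max_features="sqrt",    # features considered per split (square-root rule of thumb)
    random_state=0,
).fit(X, y)

print(clf.get_depth(), clf.get_n_leaves())
```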
CART Pruning
• Cost-complexity pruning (post-pruning algorithm)
  Cost-complexity criterion: R_α(T) = R(T) + α · |leaves(T)|
  - R(T): training/learning misclassification error rate (the sum of misclassification errors over the leaves)
  - leaves(T): function returning the set of leaves of tree T
  - α: regularization parameter (set by cross-validation)

• Algorithm – choosing α (a scikit-learn sketch follows this slide)
  - Divide S into k subsets S0, …, Sk
  - In each fold:
    - Train a tree
    - For each α_k, prune the tree to that level and measure the error rate
  - Compute the average error rate over the k folds
  - Choose the α_k that minimizes the error rate; call it α*
  - Prune the original tree according to α*
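A minimal sketch (a simplified variant of the procedure above, using the Iris data as a stand-in) of choosing α with scikit-learn, where the ccp_alpha parameter plays the role of α:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Candidate alphas from the full tree's cost-complexity pruning path
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
alphas = path.ccp_alphas

# Cross-validated accuracy for each alpha; pick the alpha with the lowest error
scores = [cross_val_score(DecisionTreeClassifier(ccp_alpha=a, random_state=0),
                          X, y, cv=5).mean() for a in alphas]
best_alpha = alphas[int(np.argmax(scores))]

pruned = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=0).fit(X, y)
print(best_alpha, pruned.get_n_leaves())
```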
CART Pruning example

Pass 1        t1                                 t2                              t3
R(t)          8/16 × 16/16 = 8/16                4/12 × 12/16 = 4/16             2/6 × 6/16 = 2/16
              [1 − max(8/16, 8/16) = 8/16]       [1 − max(4/12, 8/12) = 4/12]    [1 − max(4/6, 2/6) = 2/6]
R(Tt)         0 (entire tree, all leaves pure)   0                               0
g(t)          (8/16 − 0)/(4 − 1) = 1/6           (4/16 − 0)/(3 − 1) = 1/8        (2/16 − 0)/(2 − 1) = 1/8

Pass 2        t1                                 t2
R(t)          8/16 × 16/16                       4/12 × 12/16
R(Tt)         2/16                               2/16
g(t)          (8/16 − 2/16)/(3 − 1) = 6/32       (4/16 − 2/16)/(2 − 1) = 1/8

Pass 3        t1
R(t)          8/16 × 16/16
R(Tt)         4/16
g(t)          (8/16 − 4/16)/(2 − 1) = 1/4

(a short sketch of the g(t) computation follows)
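A minimal sketch (not from the slides) of the g(t) statistic used above, g(t) = (R(t) − R(Tt)) / (|leaves(Tt)| − 1), checked against the Pass 1 values with exact fractions:

```python
from fractions import Fraction as F

def g(r_node, r_subtree, n_leaves):
    """Cost-complexity link strength of node t: (R(t) - R(Tt)) / (|leaves| - 1)."""
    return (r_node - r_subtree) / (n_leaves - 1)

print(g(F(8, 16), F(0), 4))   # t1: 1/6
print(g(F(4, 16), F(0), 3))   # t2: 1/8
print(g(F(2, 16), F(0), 2))   # t3: 1/8
```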


C4.5 Pruning
Pessimistic pruning
• Corrected misclassification for a node: n′(t) = e(t) + 1/2
• Corrected misclassification for a sub-tree with L leaves: n′(T) = Σ e(leaf) + L/2
• Standard error for the number of misclassifications: SE(n′(T)) = sqrt( n′(T) · (N − n′(T)) / N )

Number of corrected misclassifications at node 26: n′(T) = 15 + 1/2 = 15.5
For the sub-tree: n′(T) = (2 + 0 + 6 + 2) + 4/2 = 12
SE = sqrt((12 × (35 − 12)) / 35) = 2.8
Since 12 + 2.8 = 14.8 < 15.5, the sub-tree should be kept and not pruned.
(see the sketch below)
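A minimal Python sketch (not from the slides) reproducing the arithmetic above, with the 1/2-per-leaf continuity correction inferred from the worked numbers:

```python
import math

def corrected_errors(misclassified, n_leaves):
    """n'(T) = observed misclassifications + 1/2 per leaf."""
    return misclassified + 0.5 * n_leaves

N = 35                                          # training cases reaching the node
node = corrected_errors(15, 1)                  # collapsed node: 15 + 0.5 = 15.5
subtree = corrected_errors(2 + 0 + 6 + 2, 4)    # four leaves: 10 + 2 = 12
se = math.sqrt(subtree * (N - subtree) / N)     # ~2.8

# Prune only if the collapsed node does no worse than the subtree error + one SE
print(subtree + se, node)                       # 14.8 < 15.5 -> keep the subtree
```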
Comparison of CART and C4.5 with other algorithms
In this study, a dataset about Labor Relations from http://archive.ics.uci.edu/ml/datasets.html was classified using the Weka application.

Detailed information can be found at https://archive.ics.uci.edu/ml/datasets/Labor+Relations
Comparison of CART and C4.5 with other algorithms
Comparison of CART and C4.5 with other algorithms
IRIS data set – scikit learn
THANK YOU!
