2 - 4 Cart

The document discusses the Gini Index and entropy as splitting criteria for decision tree algorithms, noting that Gini is faster to compute while entropy may yield slightly better accuracy. It explains the process of constructing decision trees using the CART algorithm, which selects attributes based on Gini or standard deviation reduction. The document also covers regression trees, detailing the steps for splitting nodes and determining when to stop further branching based on the coefficient of variation.

Gini vs Entropy

GINI(t) = 1 − Σ_j [ p(j | t) ]²

• Entropy is more complex to compute since it makes use of logarithms; consequently, the Gini Index is faster to calculate.
• Accuracy using the entropy criterion is slightly better (though not always).

Node 1: C1 = 1, C2 = 5, so P(C1) = 1/6, P(C2) = 5/6
Gini = 1 − (1/6)² − (5/6)² = 0.278

Node 2: C1 = 2, C2 = 4, so P(C1) = 2/6, P(C2) = 4/6
Gini = 1 − (2/6)² − (4/6)² = 0.444
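These Gini values can be reproduced with a short script. A minimal sketch in Python; the gini() helper is illustrative and simply takes the class counts of a node:

def gini(counts):
    # Gini impurity: 1 minus the sum of squared class probabilities
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

print(round(gini([1, 5]), 3))   # node with C1 = 1, C2 = 5 -> 0.278
print(round(gini([2, 4]), 3))   # node with C1 = 2, C2 = 4 -> 0.444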
Example: Construct a decision tree using CART

Attribute A will be chosen to split the node, as Gini(A) < Gini(B).
Decision tree using the CART algorithm
Day Outlook Temp. Humidity Wind Play Tennis
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Weak Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Strong Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No
Selecting the Best Attribute for splitting using Gini

S = [9+, 5-]

Splitting on Outlook gives three branches:
Sunny: [2+, 3-]
Overcast: [4+, 0]
Rain: [3+, 2-]
Outlook will be placed at the root as it has the highest gain (lowest weighted Gini index).
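A minimal sketch of how the weighted Gini for the Outlook split could be computed from the branch class counts shown above (Sunny [2+,3-], Overcast [4+,0], Rain [3+,2-]); the helper names are illustrative, not from the slides:

def gini(counts):
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def weighted_gini(branches):
    # size-weighted average of the branch Gini values
    n = sum(sum(b) for b in branches)
    return sum(sum(b) / n * gini(b) for b in branches)

outlook_branches = [[2, 3], [4, 0], [3, 2]]      # Sunny, Overcast, Rain: [yes, no]
print(round(weighted_gini(outlook_branches), 3)) # about 0.343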
Regression trees
• Regression trees are trees whose leaves predict a real number rather than a class.
• Example: CART
• CART stands for Classification and Regression Trees. An important feature of CART is its ability to generate regression trees.
• CART looks for splits that minimize the prediction squared error (the least-squared deviation). The prediction in each leaf is based on the weighted mean of the node.
Algorithm
• Step 1: The standard deviation of the target is calculated.
• Step 2: The dataset is then split on the different attributes. The standard deviation for each branch is calculated and combined as a size-weighted average. This resulting standard deviation is subtracted from the standard deviation before the split; the result is the standard deviation reduction (SDR).
• Step 3: The attribute with the largest standard deviation reduction is chosen for the decision node.
• Step 4a: The dataset is divided based on the values of the selected attribute. This process
is run recursively on the non-leaf branches, until all data is processed.
• Repeat all the steps until the coefficient of variation (CV) for a branch becomes smaller than a certain threshold (e.g., 10%) and/or too few instances remain in the branch.
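A minimal sketch of Steps 1 to 3 (standard deviation reduction for one candidate attribute). The toy data below is hypothetical and only illustrates the computation; it is not the dataset from the example:

import statistics as st

def sdr(target, attribute):
    # SDR = SD(target) minus the size-weighted SD of each branch after the split
    # (population standard deviation, matching the slide's calculation)
    sd_before = st.pstdev(target)
    groups = {}
    for value, y in zip(attribute, target):
        groups.setdefault(value, []).append(y)
    n = len(target)
    sd_after = sum(len(ys) / n * st.pstdev(ys) for ys in groups.values())
    return sd_before - sd_after

hours = [25, 30, 35, 38, 48, 46, 43, 52]          # hypothetical target values
windy = ['T', 'T', 'F', 'F', 'T', 'F', 'F', 'T']  # hypothetical attribute values
print(round(sdr(hours, windy), 2))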
CART Example
• Hours Played is the target (a continuous outcome), so this is a regression problem.
a) Standard deviation of the target (Hours) = 9.32
• Compute the standard deviation within each Outlook branch:

Rainy: 25, 30, 35, 38, 48
Mean = 35.2
Std. Dev = √{ [(25 − 35.2)² + (30 − 35.2)² + (35 − 35.2)² + (38 − 35.2)² + (48 − 35.2)²] / 5 } = 7.78

The standard deviation reduction = Std. dev (Hours) − Std. dev (Hours, Outlook) = 9.32 − 7.66 = 1.66, where Std. dev (Hours, Outlook) is the size-weighted average of the standard deviations of the Outlook branches.
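The Rainy-branch figures above can be reproduced directly (population standard deviation, as in the slide):

import statistics as st

rainy = [25, 30, 35, 38, 48]
print(round(st.mean(rainy), 1))    # 35.2
print(round(st.pstdev(rainy), 2))  # 7.78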
• In practice, we need some termination criterion. For example, we stop when the coefficient of variation (CV) for a branch becomes smaller than a certain threshold (e.g., 10%) and/or when too few instances (n) remain in the branch (e.g., 3).

Mean(Overcast) = (46 + 43 + 52 + 44) / 4 = 46.25 ≈ 46.3
CV(Overcast) = 3.49 / 46.3 = 7.53% ≈ 8%

The "Overcast" subset does not need any further splitting because its CV (8%) is less than the threshold (10%). The related leaf node gets the average of the "Overcast" subset.
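The same check for the "Overcast" branch, using the four values given above:

import statistics as st

overcast = [46, 43, 52, 44]
cv = st.pstdev(overcast) / st.mean(overcast)
print(f'CV = {cv:.1%}')   # about 7.5%, i.e. roughly 8%, below the 10% threshold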
• The "Sunny" branch has an CV (28%) more than the threshold (10%) which needs
further splitting. We select "Temp" as the best best node after "Outlook" because
it has the largest SDR.
• Because the number of data points in both branches (FALSE and TRUE) is less than or equal to 3, we stop further branching and assign the average of each branch to the related leaf node.
• Moreover, the "Rainy" branch has a CV (22%) greater than the threshold (10%), so this branch also needs further splitting. We select "Temp" as the best node because it has the largest SDR.
• Because the number of data points in all three branches (Cool, Hot and Mild) is less than or equal to 3, we stop further branching and assign the average of each branch to the related leaf node.
