2 - 4 CART
Gini(t) = 1 - Σ_j [ p(j|t) ]²
Attribute A will be chosen to split the node if Gini(A) < Gini(B).
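A minimal Python sketch of this computation (the function names and the class-count representation are my own, not from the slides):

```python
def gini(class_counts):
    """Gini index of a node: 1 - sum_j p(j|t)^2."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in class_counts)

def gini_split(branches):
    """Weighted Gini of a split; `branches` is a list of class-count lists."""
    n = sum(sum(b) for b in branches)
    return sum(sum(b) / n * gini(b) for b in branches)
```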
Decision tree using the CART algorithm
Day Outlook Temp. Humidity Wind Play Tennis
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Weak Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Strong Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No
Selecting the Best Attribute for splitting using Gini
S = [9+, 5-]
Splitting S on Outlook: Sunny → [2+, 3-], Overcast → [4+, 0-], Rain → [3+, 2-]
Outlook will be placed at the root as it has the highest information gain (lowest Gini index).
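Using the counts above, the split can be checked numerically (a sketch reusing the gini/gini_split helpers defined earlier; the printed values are rounded):

```python
# Outlook branches: Sunny [2+,3-], Overcast [4+,0-], Rain [3+,2-]
branches = [[2, 3], [4, 0], [3, 2]]
print(gini([9, 5]))          # Gini of the root S = [9+,5-]: ~0.459
print(gini_split(branches))  # weighted Gini after splitting on Outlook: ~0.343
```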
Regression trees
• Regression trees are trees whose leaves predict a real number rather than a class.
• Example: CART
• CART stands for Classification and Regression Trees. An important
feature of CART is its ability to generate regression trees.
• CART looks for splits that minimize the squared prediction error (the
least-squared deviation). The prediction in each leaf is the weighted
mean of the node, as the sketch below illustrates.
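A quick numerical check that the mean minimizes the squared error of a leaf's prediction (the leaf values here are hypothetical, chosen only for illustration):

```python
import numpy as np

leaf = np.array([30.0, 35.0, 40.0, 55.0])      # hypothetical target values in a leaf
sse = lambda pred: ((leaf - pred) ** 2).sum()  # squared prediction error
print(sse(leaf.mean()), sse(np.median(leaf)))  # 350.0 < 375.0: the mean wins
```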
Algorithm
• Step 1: The standard deviation of the target is calculated.
• Step 2: The dataset is split on each candidate attribute, and the weighted standard
deviation across the resulting branches is calculated. Subtracting this from the
standard deviation before the split gives the standard deviation reduction (SDR).
• Step 3: The attribute with the largest standard deviation reduction is chosen for the
decision node.
• Step 4: The dataset is divided based on the values of the selected attribute. This process
is run recursively on the non-leaf branches, until all data is processed.
• Repeat these steps until the coefficient of variation (CV) for a branch becomes
smaller than a certain threshold (e.g., 10%) and/or too few instances remain in the
branch. A short code sketch of SDR and CV follows this list.
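A minimal sketch of the standard deviation reduction and the CV stopping criterion, assuming NumPy and population standard deviation (ddof = 0); the function and argument names are my own:

```python
import numpy as np

def sdr(target, attribute):
    """Standard deviation reduction of splitting `target` on `attribute`."""
    target = np.asarray(target, dtype=float)
    attribute = np.asarray(attribute)
    before = target.std()                      # SD before the split
    after = sum(                               # weighted SD across branches
        (attribute == v).mean() * target[attribute == v].std()
        for v in np.unique(attribute)
    )
    return before - after

def cv(values):
    """Coefficient of variation: SD / mean, used as a stopping criterion."""
    values = np.asarray(values, dtype=float)
    return values.std() / values.mean()
```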
CART Example
• Hours Played is the target (a continuous outcome), so this is a regression problem.
a) Standard deviation of the target: Std. dev (Hours) = 9.32.
b) Compute the weighted standard deviation after splitting on Outlook: Std. dev (Hours, Outlook) = 7.66.
The standard deviation reduction = Std. dev (Hours) - Std. dev (Hours, Outlook)
= 9.32 - 7.66 = 1.66
• In practice, we need a termination criterion. For example, stop when the coefficient of
variation (CV) for a branch becomes smaller than a certain threshold (e.g., 10%)
and/or when too few instances (n) remain in the branch (e.g., 3).
The "Overcast" subset does not need any further splitting because
its CV (8%) is less than the threshold (10%). The related leaf
node gets the average of the "Overcast" subset: (46 + 43 + 52 + 44) / 4 = 46.25 ≈ 46.3.
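These numbers can be verified directly (the four values come from the slide; cv is the helper sketched above):

```python
overcast = [46.0, 43.0, 52.0, 44.0]
print(sum(overcast) / len(overcast))  # leaf prediction: 46.25
print(cv(overcast))                   # ~0.08 (8%), below the 10% threshold -> stop splitting
```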
• The "Sunny" branch has an CV (28%) more than the threshold (10%) which needs
further splitting. We select "Temp" as the best best node after "Outlook" because
it has the largest SDR.
• Because the number of data points in both branches (FALSE and TRUE) is 3 or
fewer, we stop further branching and assign the average of each branch
to the related leaf node.
• Moreover, the "Rainy" branch has a CV (22%) above the threshold (10%), so it
also needs further splitting. We select "Temp" as the best attribute because it
has the largest SDR.
• Because the number of data points in all three branches (Cool, Hot, and Mild) is 3 or
fewer, we stop further branching and assign the average of each branch to the
related leaf node.