2 ML Ch3 Decision Trees Final
Lecture Slides 2
n In these slides:
¨ you will strengthen your understanding of the introductory
concepts of machine learning
¨ learn about the decision tree approach, which is one of the
fundamental approaches in machine learning
n Decision trees are also the basis of a new method called
“random forest” which we will see towards the end of the
course.
n Note that you can skip the slides marked Advanced; they contain
extra information
Decision Trees
n One of the most widely used and practical methods for
inductive inference
(Figure: example decision tree with a "Current Debt?" test node)
n Leaves
¨ Classification: Class labels, or proportions
¨ Regression: numeric output; e.g. the average of the outputs at the leaf, or a local fit
n AB + AC + AD + BC + BD + CD
Decision tree learning algorithm
n For a given training set, there are many trees that encode it
without any error
(Figure: the same node containing 25+ and 25- examples, split by two different candidate tests, A1 < …? and A2 < …?, each with true/false branches)
n Show the high school form example with gender field
Entropy of a Binary Random Variable
n Entropy measures the impurity of S:
Entropy(S) = − p × log2(p) − (1 − p) × log2(1 − p)
Entropy(X) = 0.1 × lg(1 / 0.1) + (1 − 0.1) × lg(1 / (1 − 0.1)) ≈ 0.47
Entropy – General Case
n When the random variable has multiple possible outcomes, its
entropy becomes:
Entropy(S) = − Σ_{i=1}^{c} p_i × log2(p_i), where p_i is the proportion of outcomes in class i
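n A minimal sketch in Python of the entropy computation above (the entropy function name and the list-of-proportions interface are illustrative assumptions, not from the slides):

```python
import math

def entropy(proportions):
    """Entropy in bits of a distribution given as class proportions p_i (summing to 1)."""
    return -sum(p * math.log2(p) for p in proportions if p > 0)

print(entropy([0.1, 0.9]))   # the binary example above, about 0.47 bits
print(entropy([0.5, 0.5]))   # maximum impurity for a binary variable: 1.0 bit
print(entropy([1.0]))        # a pure node: entropy 0
```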
We would select the Humidity attribute to split the root node, as it has the higher
information gain.
Selecting the Next Attribute
n Computing the information gain for each attribute, we selected the Outlook
attribute as the first test, resulting in the following partially learned tree:
n We can repeat the same process recursively, until the stopping conditions are
satisfied.
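n A hedged sketch of the information-gain computation used to pick attributes such as Outlook (the dictionary-of-attributes data layout and the function names are assumptions for illustration, not from the slides):

```python
import math
from collections import Counter

def entropy_of_labels(labels):
    """Entropy of the class labels of a set of examples."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(examples, labels, attribute):
    """Gain(S, A) = Entropy(S) - sum over values v of (|S_v|/|S|) * Entropy(S_v)."""
    total = len(labels)
    remainder = 0.0
    for v in {ex[attribute] for ex in examples}:
        subset = [lab for ex, lab in zip(examples, labels) if ex[attribute] == v]
        remainder += (len(subset) / total) * entropy_of_labels(subset)
    return entropy_of_labels(labels) - remainder

# At each node, the attribute with the largest gain is chosen, e.g.:
# best = max(attributes, key=lambda a: information_gain(S, y, a))
```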
Partially learned tree
Until stopped:
n Select one of the unused attributes to partition the
remaining examples at each non-terminal node, using
only the training samples associated with that node
(a recursive sketch in code follows the stopping criteria below)
Stopping criteria:
n each leaf node contains examples of only one class
n the algorithm has run out of attributes
n …
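n The recursive sketch mentioned above, assuming the information_gain function from the earlier sketch and examples stored as attribute-value dictionaries (both are illustrative assumptions):

```python
from collections import Counter

def id3(examples, labels, attributes):
    """Grow a tree until one of the stopping criteria above holds."""
    if len(set(labels)) == 1:                      # all examples of one class -> leaf
        return labels[0]
    if not attributes:                             # ran out of attributes -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    # Pick the unused attribute with the highest information gain at this node
    best = max(attributes, key=lambda a: information_gain(examples, labels, a))
    children = {}
    for v in {ex[best] for ex in examples}:
        idx = [i for i, ex in enumerate(examples) if ex[best] == v]
        children[v] = id3([examples[i] for i in idx],
                          [labels[i] for i in idx],
                          [a for a in attributes if a != best])
    return (best, children)                        # internal node: (test attribute, branches)
```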
Advanced: Other measures of impurity
n Entropy is not the only measure of impurity. If a function satisfies
certain criteria, it can be used as a measure of impurity.
n For binary random variables (only the value of p is indicated), we have:
¨ Gini index: 2p(1-p)
n p=0.5 Gini Index=0.5
n p=0.9 Gini Index=0.18
n p=1 Gini Index=0
n p=0 Gini Index=0
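n A tiny check of the Gini values listed above (the function name is illustrative):

```python
def gini_binary(p):
    """Gini index 2p(1-p) of a binary variable with positive-class proportion p."""
    return 2 * p * (1 - p)

for p in (0.5, 0.9, 1.0, 0.0):
    print(p, round(gini_binary(p), 2))   # 0.5, 0.18, 0.0, 0.0
```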
Continuous Values
Missing Attributes
…
Continuous Valued Attributes
n Create a discrete attribute to test continuous variables
Temperature = 82.5
(Temperature > 72.3) = t, f
Temperature: 40 48 60 72 80 90
PlayTennis: No No Yes Yes Yes No
Incorporating continuous-valued attributes
n Where to cut?
¨ We can show that the best threshold always lies at the transitions
between the two classes (shown as red boundaries in the figure); see the sketch below
(Figure: a continuous-valued attribute with candidate thresholds at the class transitions)
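n A small sketch of finding the candidate cut points at class transitions, using the Temperature/PlayTennis values from the slide above (the function name is an illustrative assumption):

```python
def candidate_thresholds(values, labels):
    """Midpoints between adjacent sorted values where the class label changes."""
    pairs = sorted(zip(values, labels))
    return [(v1 + v2) / 2
            for (v1, l1), (v2, l2) in zip(pairs, pairs[1:])
            if l1 != l2]

temps = [40, 48, 60, 72, 80, 90]
play  = ["No", "No", "Yes", "Yes", "Yes", "No"]
print(candidate_thresholds(temps, play))   # [54.0, 85.0]
# Each cut c defines a boolean attribute (Temperature > c); the cut with the
# highest information gain is kept.
```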
Advanced: Split Information?
n In each tree, the leaves contain samples of only one class
(e.g. 50+, 10+, 10-, etc.).
¨ Hence, the remaining entropy is 0 in each one.
(Figure: two trees over the same 100 examples, one splitting on attribute A and one on A2, with pure leaves such as 10 positive examples)
GainRatio(S, A) = Gain(S, A) / SplitInformation(S, A)

SplitInformation(S, A) = − Σ_{i=1}^{c} (|S_i| / |S|) × lg(|S_i| / |S|)

where S_1 … S_c are the subsets of S produced by partitioning S on the c values of attribute A.
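n A sketch of these two quantities in Python, reusing the information_gain sketch from earlier (an assumption for illustration, not the slides' code):

```python
import math
from collections import Counter

def split_information(examples, attribute):
    """SplitInformation(S, A) = - sum_i (|S_i|/|S|) * lg(|S_i|/|S|)."""
    total = len(examples)
    counts = Counter(ex[attribute] for ex in examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def gain_ratio(examples, labels, attribute):
    """GainRatio(S, A) = Gain(S, A) / SplitInformation(S, A)."""
    si = split_information(examples, attribute)
    return information_gain(examples, labels, attribute) / si if si > 0 else 0.0
```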
(Figure: a new subtree testing Wind, whose Strong branch separates the Yes/No examples)
n ID3 (the greedy algorithm that was outlined) will make a new split
and will classify future examples following the new path as negative.
¨ Pruning: replacing a subtree with a leaf labeled with the most common
classification in the subtree.
¨ …
Reduced-Error Pruning (Quinlan 1987)
n Split the data into training and validation sets (a pruning sketch in code follows below)
¨ Sort the pruned rules by accuracy and use them in that order.
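n A compact sketch of reduced-error pruning under an assumed tree representation (each internal node is a dict storing its test attribute, its branches, and the majority label of the training examples that reached it; none of these names come from the slides):

```python
def classify(node, example):
    # Leaves are plain class labels; internal nodes are dicts.
    while isinstance(node, dict) and not node.get("pruned"):
        node = node["children"].get(example[node["attr"]], node["majority"])
    return node["majority"] if isinstance(node, dict) else node

def accuracy(tree, examples, labels):
    hits = sum(classify(tree, ex) == lab for ex, lab in zip(examples, labels))
    return hits / len(labels)

def reduced_error_prune(tree, node, val_examples, val_labels):
    """Bottom-up: tentatively replace each subtree by its majority-label leaf and
    keep the replacement if accuracy on the validation set does not drop."""
    if not isinstance(node, dict):
        return
    for child in node["children"].values():
        reduced_error_prune(tree, child, val_examples, val_labels)
    before = accuracy(tree, val_examples, val_labels)
    node["pruned"] = True                      # node now acts as a leaf labeled node["majority"]
    if accuracy(tree, val_examples, val_labels) < before:
        node["pruned"] = False                 # pruning hurt validation accuracy: undo
```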
¨ One feature where the input space is divided into 3 bins: