Decision Tree
Attribute Selection Measure: Information Gain (ID3/C4.5)
■ Select the attribute with the highest information gain
■ Let p_i be the probability that an arbitrary tuple in D belongs to class C_i, estimated by |C_{i,D}| / |D|
■ Expected information (entropy) needed to classify a tuple in D:
Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)
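The formula above maps directly to code. Below is a minimal Python sketch of entropy and information gain over a list-of-dicts dataset; the function names entropy and info_gain and the data layout are illustrative assumptions, not taken from the slides.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Info(D): expected bits needed to classify a tuple, given its class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def info_gain(rows, attribute, target):
    """Gain(A) = Info(D) - Info_A(D): reduction in entropy after splitting on `attribute`."""
    labels = [r[target] for r in rows]
    before = entropy(labels)
    # Partition D into subsets D_j, one per value of the splitting attribute.
    partitions = {}
    for r in rows:
        partitions.setdefault(r[attribute], []).append(r[target])
    # Info_A(D): entropy of each partition, weighted by its relative size.
    after = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return before - after
```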
Example: Attribute Selection by Information Gain
■ Class P: buys_computer = “yes”
■ Class N: buys_computer = “no”
[Worked computation of Info(D) and Gain(age) omitted; the gains of the remaining attributes are computed in the same way, and the attribute with the highest gain is chosen as the splitting attribute.]
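The slide's own table and computed gains are not reproduced here. As an illustration only, the entropy and info_gain functions from the sketch above can be applied to a small made-up buys_computer-style dataset to pick the splitting attribute; the rows and attribute values below are hypothetical.

```python
# Hypothetical toy dataset (not the slides' table); the class attribute is buys_computer.
rows = [
    {"age": "youth",  "income": "high",   "buys_computer": "no"},
    {"age": "youth",  "income": "medium", "buys_computer": "no"},
    {"age": "middle", "income": "high",   "buys_computer": "yes"},
    {"age": "senior", "income": "medium", "buys_computer": "yes"},
    {"age": "senior", "income": "low",    "buys_computer": "yes"},
    {"age": "senior", "income": "low",    "buys_computer": "no"},
]

# Select the attribute with the highest information gain as the splitting attribute.
candidates = ["age", "income"]
best = max(candidates, key=lambda a: info_gain(rows, a, "buys_computer"))
print(best, {a: round(info_gain(rows, a, "buys_computer"), 3) for a in candidates})
```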
Resulting Decision Tree
Overfitting and Tree Pruning
• Overfitting: An induced tree may overfit the training data
• Too many branches, some may reflect anomalies due to
noise or outliers
• Poor accuracy for unseen samples
• Two approaches to avoid overfitting
• Prepruning: Halt tree construction early; do not split a node if this would result in the goodness measure falling below a threshold
• Difficult to choose an appropriate threshold
• Postpruning: Remove branches from a “fully grown” tree, yielding a sequence of progressively pruned trees
• Use a set of data different from the training data to decide which is the “best pruned tree” (see the sketch below)
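One way to realize postpruning with a held-out set is scikit-learn's cost-complexity pruning path, which produces the sequence of progressively pruned trees described above. The dataset, split ratio, and variable names below are illustrative assumptions, not part of the slides.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
# Hold out a validation set distinct from the training data to pick the "best pruned tree".
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Compute the ccp_alpha values that produce progressively smaller (more heavily pruned) trees.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

best_alpha, best_score = 0.0, -1.0
for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    score = tree.score(X_val, y_val)  # accuracy on unseen validation data
    if score > best_score:
        best_alpha, best_score = alpha, score

pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha).fit(X_train, y_train)
print(f"best ccp_alpha={best_alpha:.5f}, validation accuracy={best_score:.3f}")
```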
If-Then Rules
• Extracting Classification Rules from Trees
• Goal: Represent the knowledge in the form of IF-THEN rules (see the sketch after this list)
• One rule is created for each path from the root to a leaf;
• Each attribute-value pair along a path forms a conjunction;
• The leaf node holds the class prediction
• Rules are easier to understand
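A minimal sketch of the path-to-rule idea: one rule per root-to-leaf path, with the attribute-value tests along the path conjoined. The nested-dict tree below is a hypothetical stand-in for the slides' induced tree, not taken from it.

```python
# Hypothetical decision tree: internal nodes are {"attribute": ..., "branches": {value: subtree}};
# leaves are just the predicted class label (a string).
tree = {
    "attribute": "age",
    "branches": {
        "youth": {"attribute": "student",
                  "branches": {"yes": "buys_computer = yes",
                               "no": "buys_computer = no"}},
        "middle_aged": "buys_computer = yes",
        "senior": {"attribute": "credit_rating",
                   "branches": {"fair": "buys_computer = yes",
                                "excellent": "buys_computer = no"}},
    },
}

def extract_rules(node, conditions=()):
    """Yield one IF-THEN rule per root-to-leaf path; tests along the path form the conjunction."""
    if isinstance(node, str):  # leaf: holds the class prediction
        yield "IF " + " AND ".join(conditions) + " THEN " + node
        return
    for value, subtree in node["branches"].items():
        yield from extract_rules(subtree, conditions + (f"{node['attribute']} = {value}",))

for rule in extract_rules(tree):
    print(rule)
```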
If-Then Rules
Exercise