
ECOE554 Machine Learning Fall Term, 2003

Lecture 10: November 3, 2003


Lecturer: Deniz Yuret Scribe: Basak Mutlum

1 Introduction
Decision tree learning is one of the most widely used and practical methods
for inductive inference. It is a method for approximating discrete-valued target
functions, in which the learned function is represented by a decision tree.

Figure 1: A decision tree for the concept PlayTennis.

2 Decision Tree Representation


Decision trees classify instances by sorting them down the tree from the root
to some leaf node, which provides the classification of the instance. In the
decision tree representation:

• Each internal node tests an attribute

• Each branch corresponds to an attribute value


• Each leaf node assigns a classification


How would we represent:
• ∧, ∨, XOR

• ( A ∧ B ) ∨ ( C ∧¬ D ∧ E )

• M of N
Decision trees represent a disjunction of conjunctions.
• Each path from root to a leaf is a conjunction of attribute tests.
• The tree itself is a disjunction of these conjunctions.
Figure 1 illustrates a typical learned decision tree. It corresponds to the
expression

( Outlook = Sunny ∧ Humidity = Normal )
∨ ( Outlook = Overcast )
∨ ( Outlook = Rain ∧ Wind = Weak )

3 Appropriate Problems for Decision Tree Learning

Decision tree learning is generally best suited to problems with the following
characteristics:
• Instances describable by attribute-value pairs.
• Target function is discrete valued.
• Disjunctive hypothesis may be required.
• Possibly noisy training data.
Some examples of problems that fit these characteristics are:
• Medical or equipment diagnosis
• Credit risk analysis
• Modelling calendar scheduling preferences

4 The Basic Decision Tree Learning Algorithm


The basic decision tree learning algorithm, ID3, employs a top-down, greedy
search through the space of possible decision trees, beginning with the question
"which attribute should be tested at the root of the tree?". The algorithm
is given below:

ID3(Examples, Target-attribute, Attributes)


/* Examples: The training examples; */
/* Target-attribute: The attribute whose value is to be predicted by the tree; */
/* Attributes: A list of other attributes that may be tested by the learned decision tree. */
/* Return a decision tree that correctly classifies the given Examples. */
Step 1: Create a Root node for the tree
Step 2: If all Examples are positive, Return the single-node tree Root, with label = +
Step 3: If all Examples are negative, Return the single-node tree Root, with label = -
Step 4: If Attributes is empty, Return the single-node tree Root, with label = most common value of
Target-attribute in Examples
Step 5: Otherwise Begin

• A ← the attribute from Attributes that best (i.e., highest information gain) classifies Examples;

• The decision attribute for Root ← A;

• For each possible value, vi, of A,

– Add a new tree branch below Root, corresponding to the test A = vi;
– Let Examples(vi) be the subset of Examples that have value vi for A;
– If Examples(vi) is empty
∗ Then below this new branch add a leaf node with label = most common value of
Target-attribute in Examples
∗ Else below this new branch add the subtree
ID3(Examples(vi), Target-attribute, Attributes − {A})

End
Return Root

We want to select the attribute that is most useful for classifying examples.
In order to measure the worth of an attribute, a statistical property called
information gain is defined, which measures how well a given attribute
separates the training examples according to their target classification.

4.1 Entropy
In order to define information gain precisely, we begin by defining a measure
called entropy, which characterizes the (im)purity of an arbitrary collection
of examples. That is, it measures the homogeneity of the examples.

Entropy(S) ≡ −p⊕ log2 p⊕ − p⊖ log2 p⊖


S is a sample of training examples
p⊕ is the proportion of positive examples in S
p⊖ is the proportion of negative examples in S
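
For example, if S is a collection containing 9 positive and 5 negative examples
(written [9+, 5−]), then

Entropy([9+, 5−]) = −(9/14) log2 (9/14) − (5/14) log2 (5/14) ≈ 0.940

Entropy is 0 when all members of S belong to the same class, and 1 when S
contains equal numbers of positive and negative examples.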

4.2 Information Gain


Information gain is simply the expected reduction in entropy caused by
partitioning the examples according to a given attribute A.

Gain(S, A) ≡ Entropy(S) − Σv∈Values(A) (|Sv| / |S|) Entropy(Sv)

Values(A) is the set of all possible values for attribute A
Sv is the subset of S for which attribute A has value v
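
The ID3 pseudocode given at the start of this section, together with these two
formulas, translates fairly directly into code. The following is a minimal Python
sketch, not from the lecture itself: the function names, the representation of
examples as dictionaries of attribute values, and the nested-dictionary tree
representation are all illustrative choices.

import math
from collections import Counter

def entropy(examples, target):
    # Entropy(S) = -sum p log2 p over the classes of the target attribute
    counts = Counter(ex[target] for ex in examples)
    total = len(examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(examples, attribute, target):
    # Gain(S, A) = Entropy(S) - sum_v (|Sv|/|S|) Entropy(Sv)
    total = len(examples)
    remainder = 0.0
    for value in set(ex[attribute] for ex in examples):
        subset = [ex for ex in examples if ex[attribute] == value]
        remainder += (len(subset) / total) * entropy(subset, target)
    return entropy(examples, target) - remainder

def id3(examples, target, attributes):
    # Returns either a class label (leaf) or a nested dict {attribute: {value: subtree}}
    labels = [ex[target] for ex in examples]
    if len(set(labels)) == 1:          # Steps 2-3: pure node -> leaf
        return labels[0]
    if not attributes:                 # Step 4: no attributes left -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    # Step 5: pick the attribute with the highest information gain
    best = max(attributes, key=lambda a: information_gain(examples, a, target))
    tree = {best: {}}
    # The pseudocode loops over every possible value of A and adds a majority-label
    # leaf when Examples(vi) is empty; iterating over observed values, as here,
    # never produces an empty subset.
    for value in set(ex[best] for ex in examples):
        subset = [ex for ex in examples if ex[best] == value]
        tree[best][value] = id3(subset, target, [a for a in attributes if a != best])
    return tree

Calling id3(examples, "PlayTennis", ["Outlook", "Temperature", "Humidity", "Wind"])
on the 14 training examples in the next subsection should reproduce the tree of
Figure 1.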

4.3 An Illustrative Example


To illustrate the operation of ID3, let us consider the learning task represented
by the training examples below.

Day Outlook Temperature Humidity Wind PlayTennis


D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No

In the first step of the algorithm, the topmost node of the decision tree
is created. In order to determine the attribute that should be tested first in
the tree, the information gain of each attribute (Outlook, Temperature, Humidity
and Wind) is computed. The computation of the information gain for
Humidity and Wind is shown below.

Figure 2: Information gain for Humidity and Wind.
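
Using the counts from the table above, with S containing 9 Yes and 5 No examples
so that Entropy(S) = Entropy([9+, 5−]) ≈ 0.940, the two computations work out as
follows. For Humidity, the High branch receives [3+, 4−] examples and the Normal
branch [6+, 1−]:

Gain(S, Humidity) = 0.940 − (7/14)(0.985) − (7/14)(0.592) = 0.151

For Wind, the Weak branch receives [6+, 2−] examples and the Strong branch [3+, 3−]:

Gain(S, Wind) = 0.940 − (8/14)(0.811) − (6/14)(1.000) = 0.048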

The information gains for all four attributes are:

Gain(S,Outlook) = 0.246
Gain(S,Humidity) = 0.151
Gain(S,Wind) = 0.048
Gain(S,Temperature) = 0.029

Since the Outlook attribute provides the best prediction (the highest information
gain), it is selected as the decision attribute for the root node.

Figure 3 shows the partially learned decision tree resulting from the first step
of ID3. The final decision tree learned by ID3 from the 14 training
examples is shown in Figure 1.

Figure 3: The partially learned decision tree resulting from the first step of
ID3.

5 Hypothesis Space Search in Decision Tree Learning

The hypothesis space searched by ID3 is the set of possible decision trees.
ID3 performs a hill-climbing search through the space of possible decision
trees from simplest to increasingly complex, guided by the information gain
heuristic. This search is depicted in Figure 4.
Capabilities and limitations of ID3:

• Hypothesis space is complete!
Target function is surely in there...

• Outputs a single hypothesis (which one?)
Can't play 20 questions...

• No backtracking
Local minima...

• Statistically-based search choices
Robust to noisy data...

• Inductive bias: approx "prefer shortest tree"

Figure 4: Hypothesis space search by ID3.

6 Inductive Bias in Decision Tree Learning


Note H is the power set of instances X.
→ Unbiased?
Not really...

• Preference for short trees over larger trees, and for those with high
information gain attributes near the root

• Bias is a preference for some hypotheses, rather than a restriction of


hypothesis space H
• Occam's razor: prefer the shortest hypothesis that fits the data

6.1 Why Prefer Short Hypotheses?


Is ID3's inductive bias favoring shorter decision trees a sound basis for
generalizing beyond the training data?

Occam's razor: Prefer the simplest hypothesis that fits the data.
Why prefer short hypotheses?

Arguments in favor:
• Fewer short hypotheses than long hypotheses
– a short hypothesis that fits data is unlikely to be a coincidence
– a long hypothesis that fits data might be a coincidence

Arguments opposed:
• There are many ways to define small sets of hypotheses
• e.g., all trees with a prime number of nodes that use attributes
beginning with "Z"
• What’s so special about small sets based on size of hypothesis?

7 Issues in Decision Tree Learning


Practical issues in learning decision trees include:
• Determining how deeply to grow the decision tree.
• Handling continuous attributes.
• Choosing an appropriate attribute selection measure.
• Handling training data with missing attribute values.
• Handling attributes with differing costs.
• Improving computational efficiency.

7.1 Avoiding Overfitting the Data


When there is noise in the data, or when the number of training examples
is too small to produce a representative sample of the true target function,
ID3 can overfit the training examples!

Consider the error of hypothesis h over

• training data: error_train(h)

• the entire distribution D of data: error_D(h)

Hypothesis h ∈ H OVERFITS the training data if there is an alternative
hypothesis h′ ∈ H such that
error_train(h) < error_train(h′)
and
error_D(h) > error_D(h′)

Consider adding a noisy training example #15 to the training examples


we considered before:
Sunny, Hot, Normal, Strong, PlayTennis = No
What effect does it have on the earlier tree?

ID3 outputs a decision tree h that is more complex than the original tree h′
from Figure 1. h fits the training examples perfectly, whereas the simpler
h′ will not.

Figure 5 illustrates the impact of overfitting in a typical application of
decision tree learning.
How can we avoid overfitting?
• stop growing when data split not statistically significant

• grow full tree, then post-prune

How to select ”best” tree?

• Measure performance over training data

• Measure performance over separate validation data set

• MDL: minimize size(tree) + size(misclassifications(tree))



Figure 5: Overfitting in decision tree learning

7.1.1 Reduced Error Pruning


How exactly might we use a validation set to prevent overfitting? One
approach is reduced-error pruning. Pruning a node consists of:
• removing the subtree rooted at that node,
• making it a leaf,
• and assigning it the most common classification of the training
examples affiliated with that node.
Reduced-error pruning produces the smallest version of the most accurate
subtree. The impact of reduced-error pruning on the accuracy of the decision
tree is illustrated in Figure 6.
• Training set: learn the tree
• Validation set: prune the tree
• Test set: estimate future classification accuracy
Split data into training and validation set.
Do until further pruning is harmful:
1. Evaluate impact on validation set of pruning each possible node (plus
those below it)
2. Greedily remove the one that most improves validation set accuracy
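
A minimal sketch of this pruning loop, assuming the nested-dictionary tree
representation used in the sketch of Section 4; the helper names and the
addressing of nodes by paths of (attribute, value) steps are illustrative choices.

import copy
from collections import Counter

def classify(tree, example):
    # Follow attribute tests down the nested-dict tree until a leaf label is reached.
    # (Assumes every attribute value in `example` appears as a branch in the tree.)
    while isinstance(tree, dict):
        attribute = next(iter(tree))
        tree = tree[attribute][example[attribute]]
    return tree

def accuracy(tree, examples, target):
    return sum(classify(tree, ex) == ex[target] for ex in examples) / len(examples)

def internal_nodes(tree, path=()):
    # Yield the path (a tuple of (attribute, value) steps) to every internal node.
    if isinstance(tree, dict):
        yield path
        attribute = next(iter(tree))
        for value, subtree in tree[attribute].items():
            yield from internal_nodes(subtree, path + ((attribute, value),))

def majority_label(train, target, path):
    # Most common label among the training examples that reach the node at `path`.
    subset = [ex for ex in train if all(ex[a] == v for a, v in path)] or train
    return Counter(ex[target] for ex in subset).most_common(1)[0][0]

def prune_at(tree, path, label):
    # Return a copy of `tree` with the subtree at `path` replaced by a leaf `label`.
    if not path:
        return label
    pruned = copy.deepcopy(tree)
    node = pruned
    for attribute, value in path[:-1]:
        node = node[attribute][value]
    attribute, value = path[-1]
    node[attribute][value] = label
    return pruned

def reduced_error_prune(tree, train, validation, target):
    # Repeat until no single pruning step improves validation accuracy.
    while True:
        best_tree, best_acc = tree, accuracy(tree, validation, target)
        for path in internal_nodes(tree):
            candidate = prune_at(tree, path, majority_label(train, target, path))
            acc = accuracy(candidate, validation, target)
            if acc > best_acc:
                best_tree, best_acc = candidate, acc
        if best_tree is tree:
            return tree
        tree = best_tree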

Figure 6: Effect of reduced-error pruning in decision tree learning.

7.1.2 Rule Post-Pruning


Rule post-pruning involves the following steps:
1. Infer the decision tree from the training set.
2. Convert the tree to equivalent set of rules.
3. Prune each rule independently of others.
4. Sort final rules into desired sequence for use.
In rule post-pruning, one rule is generated for each leaf node in the
tree. For example, the two leftmost paths of the tree in Figure 1 are translated
into the rules:

IF (Outlook = Sunny) ∧ (Humidity = High) THEN PlayTennis = No

IF (Outlook = Sunny) ∧ (Humidity = Normal ) THEN PlayTennis = Yes

Advantages of converting decision tree to rules before pruning:


• allows distinguishing among the different contexts in which a decision
node is used
• removes the distinction between attribute tests that occur near the root
of the tree and those that occur near the leaves
• improves readability

7.2 Incorporating Continuous-Valued Attributes


Create a discrete attribute to test a continuous one:
• Temperature = 82.5
• (Temperature > 72.3) = t, f
Temperature: 40 48 60 72 80 90
PlayTennis: No No Yes Yes Yes No
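
One standard way to generate candidate thresholds (described in Mitchell, the
first reference below) is to sort the examples by the continuous attribute and
take the midpoints between adjacent values whose classifications differ; for the
sequence above that yields (48+60)/2 = 54 and (80+90)/2 = 85, and the Gain formula
then selects between the boolean attributes Temperature > 54 and Temperature > 85.
A small Python sketch (names illustrative):

def candidate_thresholds(values_and_labels):
    # Sort by the continuous value and return midpoints between adjacent
    # examples whose classifications differ.
    pairs = sorted(values_and_labels)
    return [(v1 + v2) / 2
            for (v1, l1), (v2, l2) in zip(pairs, pairs[1:])
            if l1 != l2]

# The Temperature / PlayTennis sequence above:
data = [(40, "No"), (48, "No"), (60, "Yes"), (72, "Yes"), (80, "Yes"), (90, "No")]
print(candidate_thresholds(data))    # [54.0, 85.0]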

7.3 Attributes with Many Values


Problem:
• If an attribute has many values, Gain will select it
• Imagine using Date = Jun-3-1996 as attribute
One approach: use GainRatio instead

GainRatio(S, A) ≡ Gain(S, A) / SplitInformation(S, A)

SplitInformation(S, A) ≡ − Σ(i=1..c) (|Si| / |S|) log2 (|Si| / |S|)

where Si is the subset of S for which A has value vi, and c is the number of
distinct values of A.
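
A minimal Python sketch of these two quantities; the example-dict representation
and the gain_fn parameter (an information-gain function such as the one sketched
in Section 4) are illustrative choices.

import math
from collections import Counter

def split_information(examples, attribute):
    # Entropy of S with respect to the values of A (not with respect to the target).
    total = len(examples)
    counts = Counter(ex[attribute] for ex in examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def gain_ratio(examples, attribute, target, gain_fn):
    # GainRatio(S, A) = Gain(S, A) / SplitInformation(S, A).
    # In practice SplitInformation can be zero or very small (e.g. when A takes
    # a single value on S), which needs special handling.
    return gain_fn(examples, attribute, target) / split_information(examples, attribute)

For an attribute like Date that takes a distinct value on each of n examples,
SplitInformation is log2 n, which penalizes its spuriously high Gain.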

7.4 Unknown Attribute Values


What if some examples are missing values of A?
Use the training example anyway, and sort it through the tree:

• If node n tests A, assign most common value of A among other examples


sorted to node n
• assign most common value of A among other examples with same target
value
• assign probability pi to each possible value vi of A
– assign fraction pi of example to each descendant in tree

Classify new examples in the same fashion.
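
A minimal sketch of the first strategy (fill in a missing value with the most
common value of A among the other examples sorted to the same node); the sentinel
convention and the helper name are illustrative choices.

from collections import Counter

def impute_most_common(examples, attribute, missing=None):
    # Replace missing values of `attribute` with its most common observed value
    # among the examples that reached this node.
    observed = [ex[attribute] for ex in examples if ex[attribute] != missing]
    most_common = Counter(observed).most_common(1)[0][0]
    for ex in examples:
        if ex[attribute] == missing:
            ex[attribute] = most_common
    return examples

The probabilistic variant instead splits a missing-value example into fractional
examples weighted by pi; this is the method used in C4.5.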



7.5 Attributes with Costs


Consider

• medical diagnosis, BloodTest has cost $150

• robotics, Width-from-1ft has cost 23 sec.

How to learn a consistent tree with minimum expected cost?


One approach: replace gain by

• Tan and Schlimmer (1990)

Gain²(S, A) / Cost(A)

• Nunez (1988)
(2^Gain(S,A) − 1) / (Cost(A) + 1)^w

where w ∈ [0,1] determines importance of cost.
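
Both proposals simply rescale the information gain by the attribute's cost, so
they reduce to one-liners; a sketch, with gain and cost taken as precomputed
numbers (function names are illustrative):

def tan_schlimmer(gain, cost):
    # Gain^2 / Cost  (Tan and Schlimmer, 1990)
    return gain ** 2 / cost

def nunez(gain, cost, w):
    # (2^Gain - 1) / (Cost + 1)^w, where w in [0, 1] controls the importance of cost (Nunez, 1988)
    return (2 ** gain - 1) / (cost + 1) ** w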

8 References
1. Mitchell, Tom. Machine Learning, Chapter 3: Decision Tree Learning.
McGraw-Hill, 1997.
2. J. R. Quinlan. Induction of Decision Trees, August 1, 1985.
3. Online. http://www.comp.hkbu.edu.hk/~ymc/course/sci3790 0304/chapter4.pdf
(Lecture Notes of Decision Tree Learning - ID3 Algorithm)
4. Lecture slides for the textbook Machine Learning, © Tom M. Mitchell,
McGraw-Hill, 1997.
5. Online. A simplified ID3 implementation in C++ and Java:
http://www.ida.his.se/ida/kurser/ai-symbolsystem/kursmaterial/archive/assignments/
assignment3/id3.html
6. Online. A complete Java implementation of the decision-tree algorithm:
http://www.cogs.susx.ac.uk/users/christ/crs/sai/lec15.html
7. Online. Machine Learning Algorithms in Java:
www.aifb.uni-karlsruhe.de/Lehre/Winter2002-03/kdd/download/weka/Tutorial.ps
8. Online. C4.5 tutorial and implementation in C:
http://www2.cs.uregina.ca/~hamilton/courses/831/notes/ml/dtrees/c4.5/tutorial.html
