
Machine Learning

Decision Tree

Lecturer: Duc Dung Nguyen, PhD.


Contact: [email protected]

Faculty of Computer Science and Engineering


Ho Chi Minh City University of Technology
Contents

1. Decision-Tree Learning

2. Decision-Trees

1
Decision-Tree Learning
Decision-Tree Learning

Introduction
• Decision Trees
• TDIDT: Top-Down Induction of Decision Trees
ID3
• Attribute selection
• Entropy, Information, Information Gain
• Gain Ratio
C4.5
• Numeric Values
• Missing Values
• Pruning
Regression and Model Trees

2
Decision-Trees
Decision-Trees

A decision tree consists of

• Nodes: test for the value of a certain attribute


• Edges: correspond to the outcome of a test and connect to the next node or leaf
• Leaves: terminal nodes that predict the outcome
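
As an illustration (not part of the slides), such a tree can be written down directly, e.g. as nested Python dictionaries; the weather attributes used here are a hypothetical example. Classification simply follows the edges matching the example's attribute values until a leaf is reached.

# A minimal sketch (hypothetical weather example): internal nodes test an attribute,
# edge labels are attribute values, and string leaves are the predicted class.
tree = {
    "attribute": "Outlook",
    "children": {
        "Sunny": {"attribute": "Humidity",
                  "children": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rainy": {"attribute": "Windy",
                  "children": {"True": "No", "False": "Yes"}},
    },
}

def classify(node, example):
    # Follow the edges matching the example's attribute values until a leaf is reached.
    while isinstance(node, dict):
        node = node["children"][example[node["attribute"]]]
    return node

print(classify(tree, {"Outlook": "Sunny", "Humidity": "Normal", "Windy": "False"}))  # -> "Yes"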

3
Decision-Trees

4
Decision-Trees

5
Decision-Trees

6
Decision-Trees: Divide-And-Conquer Algorithms

Family of decision tree learning algorithms: TDIDT: Top-Down Induction of Decision Trees.
Learn trees in a Top-Down fashion:

• Divide the problem into subproblems.


• Solve each subproblem.

7
Decision-Trees: ID3 Algorithm

Function ID3

• Input: Example set S


• Output: Decision Tree DT

If all examples in S belong to the same class c, return a new leaf and label it with c. Else:

• Select an attribute A according to some heuristic function.


• Generate a new node DT with A as test.
• For each value vi of A, let Si = all examples in S with A = vi. Use ID3 to construct a
decision tree DTi for the example set Si. Generate an edge that connects DT and DTi.
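
A minimal Python sketch of this recursion (an illustration, not the lecture's implementation; the representation of examples as (attribute-dictionary, label) pairs, the explicit attribute list, the pluggable heuristic parameter, and the majority-class fallback when no attributes remain are assumptions added here):

from collections import Counter

def id3(examples, attributes, heuristic):
    # examples: list of (features_dict, label); attributes: list of attribute names;
    # heuristic(examples, attribute) -> score to maximize (e.g. information gain).
    labels = [y for _, y in examples]
    if len(set(labels)) == 1:
        return labels[0]                       # all examples in the same class c -> leaf labeled c
    if not attributes:
        return Counter(labels).most_common(1)[0][0]   # fallback: majority class (assumption)
    best = max(attributes, key=lambda a: heuristic(examples, a))   # select an attribute A
    node = {"attribute": best, "children": {}}
    remaining = [a for a in attributes if a != best]
    for value in {x[best] for x, _ in examples}:                     # for each value vi of A ...
        subset = [(x, y) for x, y in examples if x[best] == value]   # ... Si = examples with A = vi
        node["children"][value] = id3(subset, remaining, heuristic)  # edge from DT to DTi
    return node

Plugging in the information gain defined on the later slides as the heuristic gives the usual attribute selection step.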

8
Decision-Trees: A Different Decision Tree

9
Decision-Trees: What is a good Attribute?

A good attribute selection measure prefers attributes that split the data so that each
successor node is as pure as possible.
In other words, we want a measure that prefers attributes whose subsets have a high degree of
“order”:

• Maximum order: All examples are of the same class


• Minimum order: All classes are equally likely

→ Entropy is a measure for (un-)orderedness.

10
Decision-Trees: Entropy (for two classes)

• S is a set of examples
• p⊕ is the proportion of examples in class ⊕
• p⊖ = 1 − p⊕ is the proportion of examples in class ⊖

Entropy:
E(S) = −p⊕ log2 p⊕ − p⊖ log2 p⊖    (1)
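For example (a quick check, not on the slide): if p⊕ = p⊖ = 0.5, then E(S) = −0.5 log2 0.5 − 0.5 log2 0.5 = 1 bit (maximum disorder); for a pure set with p⊕ = 1, E(S) = 0, using the convention 0 log2 0 = 0.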

11
Decision-Trees: Entropy (for two classes)

12
Decision-Trees: Entropy (for more classes)

Entropy can be easily generalized for n > 2 classes:


E(S) = −p1 log p1 − p2 log p2 − … − pn log pn = − Σi pi log pi    (2)

pi is the proportion of examples in S that belong to the i-th class.

13
Decision-Trees: Average Entropy / Information

Problem: Entropy only computes the quality of a single (sub-)set of examples.


Solution: Compute the average entropy over all subsets resulting from the split, weighted by
their size.
I(S, A) = Σi (|Si| / |S|) · E(Si)    (3)

14
Decision-Trees: Information Gain

When an attribute A splits the set S into subsets Si, we compute the average entropy of the
subsets and compare it to the entropy of the original set S.
Information Gain for Attribute A:
Gain(S, A) = E(S) − I(S, A) = E(S) − Σi (|Si| / |S|) · E(Si)    (4)

The attribute that maximizes the difference is selected.
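
A compact Python sketch of these formulas (an illustration, not the lecture's own code; the representation of a dataset as parallel lists of class labels and attribute values is an assumption):

from collections import Counter
from math import log2

def entropy(labels):
    # E(S) = - sum_i p_i log2 p_i over the class proportions p_i (Eq. 2).
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def avg_entropy(labels, values):
    # I(S, A): entropy of each subset S_i (one per attribute value), weighted by |S_i|/|S| (Eq. 3).
    n = len(labels)
    subsets = {}
    for y, v in zip(labels, values):
        subsets.setdefault(v, []).append(y)
    return sum(len(s) / n * entropy(s) for s in subsets.values())

def information_gain(labels, values):
    # Gain(S, A) = E(S) - I(S, A) (Eq. 4).
    return entropy(labels) - avg_entropy(labels, values)

# Example: the attribute with the highest gain would be selected as the test.
labels = ["yes", "yes", "no", "no"]
outlook = ["sunny", "rainy", "sunny", "rainy"]   # gain 0.0: the subsets are as impure as S
windy = ["true", "true", "false", "false"]       # gain 1.0: the subsets are pure
print(information_gain(labels, outlook), information_gain(labels, windy))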

15
Decision-Trees: Properties of Entropy

Entropy is the only function that satisfies all of the following three properties:

• When a node is pure, the measure should be zero.


• When impurity is maximal (i.e. all classes are equally likely), the measure should be maximal.
• The measure should obey the multistage property (decisions can be made in several stages).

16
Decision-Trees: Highly-branching attributes

Problematic: attributes with a large number of values.


Subsets are more likely to be pure if there is a large number of different attribute values.
Information gain is biased towards choosing attributes with a large number of values.
This may cause several problems:

• Overfitting: selection of an attribute that is non-optimal for prediction


• Fragmentation: data are fragmented into (too) many small sets.

17
Decision-Trees: Intrinsic Information of an Attribute

Intrinsic information of a split: the entropy of the distribution of instances over the
branches (i.e. how much information we need to tell which branch an instance goes to).


IntI(S, A) = − Σi (|Si| / |S|) · log (|Si| / |S|)    (5)

18
Decision-Trees: Gain Ratio

Modification of the information gain that reduces its bias towards multi-valued attributes.
Takes number and size of branches into account when choosing an attribute. Corrects the
information gain by taking the intrinsic information of a split into account.
Definition of Gain Ratio:
GR(S, A) = Gain(S, A) / IntI(S, A)    (6)
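
Continuing the earlier sketch (again illustrative, with the same assumed list-based representation of an attribute's values), the gain ratio only additionally needs the intrinsic information of the split:

from collections import Counter
from math import log2

def intrinsic_info(values):
    # IntI(S, A): entropy of the branch proportions |S_i|/|S| themselves (Eq. 5).
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def gain_ratio(gain, values):
    # GR(S, A) = Gain(S, A) / IntI(S, A) (Eq. 6); gain is the information gain computed as before.
    ii = intrinsic_info(values)
    return gain / ii if ii > 0 else 0.0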

19
Decision-Trees: Gini Index

There are many alternative measures to Information Gain. The most popular alternative is the
Gini index.
Impurity measure (instead of entropy):
Gini(S) = 1 − Σi pi²    (7)

Average Gini index (instead of average entropy / information):


Gini(S, A) = Σi (|Si| / |S|) · Gini(Si)    (8)

A Gini gain could be defined analogously to information gain, but typically the average Gini
index is minimized instead of maximizing a Gini gain.
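
A corresponding sketch for the Gini index (illustrative; same assumed data representation). The attribute with the lowest average Gini index would be chosen:

from collections import Counter

def gini(labels):
    # Gini(S) = 1 - sum_i p_i^2 over the class proportions (Eq. 7).
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def avg_gini(labels, values):
    # Gini(S, A): Gini index of each subset S_i, weighted by |S_i|/|S| (Eq. 8); minimized over attributes.
    n = len(labels)
    subsets = {}
    for y, v in zip(labels, values):
        subsets.setdefault(v, []).append(y)
    return sum(len(s) / n * gini(s) for s in subsets.values())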

20
Decision-Trees: Comparison among Splitting Criteria

21
Decision-Trees: Industrial-strength algorithms

For an algorithm to be useful in a wide range of real-world applications it must:

• Permit numeric attributes


• Allow missing values
• Be robust in the presence of noise
• Be able to approximate arbitrary concept descriptions (at least in principle)

22
Decision-Trees: Numeric attributes

Standard method: binary splits


Unlike nominal attributes, a numeric attribute has many possible split points, which makes the
evaluation computationally more demanding.
The solution is a straightforward extension:

• Evaluate info gain (or other measure) for every possible split point of attribute
• Choose “best” split point
• Info gain for best split point is info gain for attribute

23
Decision-Trees: Efficient Computation

Efficient computation needs only one scan through the values!

• Linearly scan the sorted values, each time updating the count matrix and computing the
evaluation measure
• Choose the split position that has the best value
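
A minimal sketch of this single scan for a binary split on a numeric attribute, using information gain as the evaluation measure (illustrative code under assumptions: two parallel lists for values and class labels, and candidate thresholds placed midway between adjacent distinct values):

from collections import Counter
from math import log2

def entropy_from_counts(counts):
    n = sum(counts.values())
    return -sum((c / n) * log2(c / n) for c in counts.values() if c > 0)

def best_numeric_split(values, labels):
    # Return (best_threshold, best_gain) for a binary split "value <= threshold".
    pairs = sorted(zip(values, labels))
    left, right = Counter(), Counter(labels)
    total, base = len(labels), entropy_from_counts(Counter(labels))
    best_gain, best_threshold = -1.0, None
    # One linear scan: move each example from the right counts to the left counts,
    # and evaluate a candidate split point between two distinct adjacent values.
    for i in range(total - 1):
        value, label = pairs[i]
        left[label] += 1
        right[label] -= 1
        if value == pairs[i + 1][0]:
            continue  # no valid split point inside a run of equal values
        n_left = i + 1
        info = (n_left / total) * entropy_from_counts(left) + \
               ((total - n_left) / total) * entropy_from_counts(right)
        gain = base - info
        if gain > best_gain:
            best_gain, best_threshold = gain, (value + pairs[i + 1][0]) / 2
    return best_threshold, best_gain

# Example: temperature split for a two-class problem.
print(best_numeric_split([64, 65, 68, 69, 70, 71], ["yes", "no", "yes", "yes", "yes", "no"]))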

24
Decision-Trees: Binary vs. Multiway Splits

• Splitting (multi-way) on a nominal attribute exhausts all information in that attribute.


• Not so for binary splits on numeric attributes! A numeric attribute may be tested several
times along a path in the tree.
• Disadvantage: tree is hard to read
• Remedy: pre-discretize numeric attributes, or use multi-way splits instead of binary ones.

25
Decision-Trees: Missing values

If an attribute with a missing value needs to be tested:

• split the instance into fractional instances (pieces)


• one piece for each outgoing branch of the node
• a piece going down a branch receives a weight proportional to the popularity of the branch
• weights sum to 1

Info gain and gain ratio work with fractional instances by using sums of weights instead of
counts. During classification, split the instance in the same way and merge the resulting
probability distributions using the weights of the fractional instances.
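
A small sketch of the classification side of this scheme (illustrative; the dictionary-based tree, the stored per-branch training fractions, and the use of None for a missing value are assumptions):

def class_distribution(node, example, weight=1.0):
    # Return {class: probability} for one example, splitting it into fractional
    # pieces whenever the tested attribute value is missing (None).
    if not isinstance(node, dict):            # leaf: all remaining weight goes to its class
        return {node: weight}
    value = example.get(node["attribute"])
    dist = {}
    if value is None:
        # Missing value: send a piece down every branch, weighted by the branch's
        # popularity (fraction of training instances that went down it); weights sum to 1.
        for branch_value, fraction in node["fractions"].items():
            child = node["children"][branch_value]
            for cls, w in class_distribution(child, example, weight * fraction).items():
                dist[cls] = dist.get(cls, 0.0) + w
        return dist
    return class_distribution(node["children"][value], example, weight)

tree = {"attribute": "Outlook",
        "fractions": {"Sunny": 0.4, "Rainy": 0.6},
        "children": {"Sunny": "Yes", "Rainy": "No"}}
print(class_distribution(tree, {"Outlook": None}))  # {'Yes': 0.4, 'No': 0.6}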

26
Decision-Trees: Overfitting and Pruning

The smaller the complexity of a concept, the less danger that it overfits the data. Thus,
learning algorithms try to keep the learned concepts simple.

27
Decision-Trees: Prepruning

Based on a statistical significance test: stop growing the tree when there is no statistically
significant association between any attribute and the class at a particular node.
Most popular test: chi-squared test. Only statistically significant attributes are allowed to be
selected by the information gain procedure.
Pre-pruning may stop the growth process prematurely: early stopping.
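
As an illustration of such a test (not from the slides), one could build the attribute-value × class contingency table at a node and apply scipy.stats.chi2_contingency, allowing the attribute only if the p-value falls below a chosen significance level:

import numpy as np
from scipy.stats import chi2_contingency

def is_significant(values, labels, alpha=0.05):
    # Chi-squared test of independence between one attribute and the class at a node.
    attr_levels = sorted(set(values))
    class_levels = sorted(set(labels))
    table = np.zeros((len(attr_levels), len(class_levels)))   # rows: attribute values, cols: classes
    for v, y in zip(values, labels):
        table[attr_levels.index(v), class_levels.index(y)] += 1
    _, p_value, _, _ = chi2_contingency(table)
    return p_value < alpha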

28
Decision-Trees: Post-Pruning

Learn a complete and consistent decision tree that classifies all examples in the training set
correctly.
As long as the performance increases

• Try simplification operators on the tree


• Evaluate the resulting trees
• Make the replacement that results in the best estimated performance

then return the resulting decision tree.

29
Decision-Trees: Post-Pruning

Two subtree simplification operators

• Subtree replacement
• Subtree raising

30
Decision-Trees: Subtree replacement

31
Decision-Trees: Subtree raising

32
Decision-Trees: Estimating Error Rates

Prune only if it does not increase the estimated error.


Reduced Error Pruning:

• Use hold-out set for pruning


• Essentially the same as in rule learning

33
Decision-Trees: Reduced Error Pruning

• Split training data into a growing and a pruning set


• Learn a complete and consistent decision tree that classifies all examples in the growing
set correctly
• As long as the error on the pruning set does not increase, try to replace each node by a
leaf, evaluate the resulting (sub-)tree on the pruning set, then make the replacement that
results in the maximum error reduction.
• Return the resulting decision tree.
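
A simplified Python sketch of this procedure (illustrative only; the Node class, dictionary-valued examples, and the bottom-up order of the traversal are assumptions made here):

from collections import Counter

class Node:
    def __init__(self, attribute=None, children=None, label=None):
        self.attribute = attribute      # attribute tested at this node (None for a leaf)
        self.children = children or {}  # attribute value -> child Node
        self.label = label              # predicted class (leaves only)

def predict(node, x):
    while node.attribute is not None:
        node = node.children[x[node.attribute]]
    return node.label

def errors(tree, dataset):
    return sum(1 for x, y in dataset if predict(tree, x) != y)

def reduced_error_prune(root, node, growing_subset, pruning_set):
    # Bottom-up: first prune the children, then try to replace this node by a leaf
    # labeled with the majority class of the growing examples that reach it.
    if node.attribute is None or not growing_subset:
        return node
    for value, child in list(node.children.items()):
        subset = [(x, y) for x, y in growing_subset if x[node.attribute] == value]
        node.children[value] = reduced_error_prune(root, child, subset, pruning_set)
    error_before = errors(root, pruning_set)
    backup = (node.attribute, node.children, node.label)
    node.attribute, node.children = None, {}          # tentatively turn the node into a leaf
    node.label = Counter(y for _, y in growing_subset).most_common(1)[0][0]
    if errors(root, pruning_set) <= error_before:     # error did not increase: keep the leaf
        return node
    node.attribute, node.children, node.label = backup  # otherwise revert the replacement
    return node

Calling reduced_error_prune(tree, tree, growing_set, pruning_set) prunes the tree in place, starting from the deepest subtrees.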

34
Decision-Trees: Decision Lists and Decision Graphs

Decision Lists

• An ordered list of rules


• The first rule that fires makes the prediction
• can be learned with a covering approach

Decision Graphs

• Similar to decision trees, but nodes may have multiple predecessors

35
Decision-Trees: Rules vs. Trees

Each decision tree can be converted into a rule set. A decision tree can be viewed as a set of
non-overlapping rules, typically learned via divide-and-conquer algorithms (recursive
partitioning). The transformation of rule sets / decision lists into trees is less trivial.

• Many concepts have a shorter description as a rule set


• Low complexity decision lists are more expressive than low complexity decision trees
• Exceptions: if one or more attributes are relevant for the classification of all examples

36
Decision-Trees: Regression Problems

Regression Task: the target variable is numerical instead of discrete.


Two principal approaches

• Discretize the numerical target variable


• Adapt the classification algorithm to regression data

37
Decision-Trees: Regression Trees

Differences to Decision Trees (Classification Trees)

• Leaf Nodes: Predict the average value of all instances in this leaf
• Splitting criterion: Minimize the variance of the values in each subset
• Termination criteria: Lower bound on standard deviation in a node and lower bound on
number of examples in a node
• Pruning criterion: Numeric error measures, e.g. Mean-Squared Error
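
A brief sketch of the variance-based splitting criterion mentioned above (illustrative; parallel lists of numeric targets and attribute values are an assumption). The split that minimizes the size-weighted variance would be chosen, and a leaf would predict the mean of its targets:

def variance(targets):
    # Population variance of the numeric target values in a node.
    mean = sum(targets) / len(targets)
    return sum((t - mean) ** 2 for t in targets) / len(targets)

def weighted_variance(targets, values):
    # Size-weighted variance over the subsets induced by an attribute; minimized when splitting.
    n = len(targets)
    subsets = {}
    for t, v in zip(targets, values):
        subsets.setdefault(v, []).append(t)
    return sum(len(s) / n * variance(s) for s in subsets.values())

# A leaf prediction would simply be sum(targets) / len(targets) for the instances in that leaf.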

38
