
COS4852/U3/0/2024

UNIT 3 U3/0/2024

Machine Learning
COS4852

Year module

Department of Computer Science

School of Computing

CONTENTS

This document contains the material for UNIT 3 for COS4852 for 2024.

A decision tree is a tree-like visual representation that works like a flowchart to make or
support decisions. Each internal node in the decision tree tests an attribute and splits the data set
into subsets that correspond to specific values of that attribute. Each node is therefore a single
decision point, where a specific value of the attribute leads to a sub-tree, until all the attributes
have been assigned to nodes and final decision values are reached at the leaves.

OUTLOOK
├── sunny → HUMIDITY
│   ├── high → no
│   └── normal → yes
├── overcast → yes
└── rain → WIND
    ├── strong → no
    └── weak → yes

Figure 1: Example decision tree

Figure 1 shows an example of a small decision tree that could be used to determine whether to play
sport based on the values of three weather variables: Outlook, Humidity and Wind.
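
As a rough illustration (an added sketch, not part of the original figure or text), the tree in Figure 1 can be read as a set of nested conditionals. The function name play_sport and the string values below are simply a transcription of the figure:

# Figure 1 as nested conditionals: each attribute test is one decision point,
# and each path through the tree ends in a final yes/no decision.
def play_sport(outlook: str, humidity: str, wind: str) -> str:
    if outlook == "sunny":
        return "no" if humidity == "high" else "yes"
    if outlook == "overcast":
        return "yes"
    if outlook == "rain":
        return "no" if wind == "strong" else "yes"
    raise ValueError(f"unknown outlook value: {outlook}")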

1 OUTCOMES

In this Unit you will learn more about the theoretical basis of decision trees, and understand how to
apply one of the algorithms used to construct a decision tree from a dataset. You will learn how to
describe and solve a learning problem using decision tree learning.

You will:

1. Understand the relationship between Boolean functions, binary decision trees and decision
lists.

2. Learn about the theoretical basis of decision trees.

3. Understand what kinds of problems can be solved using decision trees.

4. Understand how the ID3 algorithm works.


5. Learn how to solve classification problems using ID3.

After completion of this Unit you will be able to:

1. Translate a Boolean function into a binary decision tree.


2. Convert a Boolean function into a decision list.
3. Understand and recognise appropriate learning problems that can be solved with decision
tree learning.
4. Design a Classification System using decision trees.
5. Discuss the theoretical basis of decision trees.
6. Understand and describe how decision tree search is performed in hypothesis space, including
the inductive bias implicit in decision tree learning.
7. Understand the advantages and limitations of decision trees, including overfitting of data,
continuous-valued attributes, alternative methods for selecting attributes, missing attribute
values and attributes with different costs.
8. Discuss what kinds of problems can be solved using decision trees.
9. Solve classification problems by implementing the ID3 algorithm on given data sets.

2 INTRODUCTION

In this Unit you will investigate the theory of decision trees and learn how to describe and solve a
learning problem using decision tree learning, using the ID3 algorithm.

There are many algorithms for constructing decision trees. The most famous of these is Ross Quinlan’s
ID3 algorithm, which constructs a decision tree from a set of discrete (categorical or integer) data
values. There are variants of ID3 that can operate on continuous-valued datasets, as well as variants
that use a statistical approach. There are also more complex algorithms that construct a collection of
trees, called a forest of trees, which, although more complex, give more options for making accurate
decisions based on complex data.

3 PREPARATION

3.1 Online textbooks

Chapter 6 in Nilsson’s book works through the ID3 algorithm for decision tree construction, using a
slightly different notation from what we will be using.

3.2 Textbooks

Chapter 3 of Mitchell’s book goes into some depth on decision trees.

Sections 18.1 to 18.4 in Russell and Norvig’s 3rd edition are also a good source on decision lists.

3.3 Online material

Here is a simple explanation of Entropy and Information Gain.

The original 1986 article by Ross Quinlan describes one of the most successful algorithms to create
decision trees.

This IBM article gives a detailed discussion on what a decision tree is and does, and how to do the
basic ID3 calculations.

The Wikipedia page on ID3 gives a good overview of the ID3 algorithm.

4 DISCUSSION

4.1 Boolean Functions and Binary Decision Trees

Consider the Boolean function:

f1 (A, B) = ¬A ∧ B

The truth table for this Boolean function is:


A B ¬A f1
0 0 1 0
0 1 1 1
1 0 0 0
1 1 0 0
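
The table can also be checked mechanically. The short sketch below is an illustrative addition (the function name f1 mirrors the notation above) that enumerates every input combination:

from itertools import product

# Evaluate f1(A, B) = ¬A ∧ B for every combination of inputs.
def f1(a: int, b: int) -> int:
    return int((not a) and b)

for a, b in product([0, 1], repeat=2):
    print(a, b, f1(a, b))
# prints the rows 0 0 0, 0 1 1, 1 0 0 and 1 1 0, matching the table above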

Start by choosing A as the root node. This gives us the binary decision tree as in Figure 2. On the
diagram you can see the mapping between specific parts of the truth table and the binary decision
tree. Each leaf node corresponds to one row in the truth table, while the level above the leaf nodes
corresponds to two rows in the truth table, and so on. By merging leaf nodes with the same value the tree
can be simplified, as in Figure 3.

Using B as the start node results in a different binary decision tree. In this particular case the tree
turns out to be as simple as the first. This is not the case for all decision trees.

The binary decision tree starting with B is shown in Figure 4.


A
├── A = 0 → B
│   ├── B = 0 → 0   (truth-table row A=0, B=0)
│   └── B = 1 → 1   (truth-table row A=0, B=1)
└── A = 1 → B
    ├── B = 0 → 0   (truth-table row A=1, B=0)
    └── B = 1 → 0   (truth-table row A=1, B=1)

Figure 2: A binary decision tree for f1 starting with A.

A
├── A = 0 → B
│   ├── B = 0 → 0
│   └── B = 1 → 1
└── A = 1 → 0

Figure 3: A simplified binary decision tree for f1 (A, B) = ¬A ∧ B starting with A.

Decision lists

Rivest wrote a paper on how to create decision lists from a Boolean function. The paper goes into
some depth on how to do this.

Nilsson’s book summarises the concept on p.22.

You can think of a decision list as a binary decision tree in which each node divides the data set
into two parts: one branch outputs a binary value ({0, 1} or {T, F}), while the other branch leads
to further subdivision of the dataset. By writing a Boolean function in DNF form, this becomes
reasonably obvious. Another method that works well is to draw a Karnaugh diagram of the function
and reduce the function through the diagram, using the same process that would be used to
create a DNF form.
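
As a small, hedged illustration (not Rivest’s construction), the function f1 (A, B) = ¬A ∧ B from above can be written as a short decision list in which each test either produces an output or defers to the next test:

# f1(A, B) = ¬A ∧ B as a decision list: test A, then B, then fall through
# to a default output.
def f1_decision_list(a: bool, b: bool) -> int:
    if a:        # A = 1: ¬A ∧ B is false regardless of B
        return 0
    if b:        # A = 0 and B = 1: the function is true
        return 1
    return 0     # default case: A = 0 and B = 0

This mirrors the simplified tree in Figure 3: the test on A either terminates in an output or defers to the test on B.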

B
├── B = 0 → 0
└── B = 1 → A
    ├── A = 0 → 1
    └── A = 1 → 0

Figure 4: A binary decision tree for f1 starting with B.

4.2 The ID3 algorithm

The ID3 algorithm can be described by the following pseudocode:

Require: ID3( node, instances, target values, attributes )

Root ← node
V ← {instances}
T ← {target values}
A ← {attributes}
if all vi ∈ V have class ⊕ then return Root with label ⊕
end if
if all vi ∈ V have class ⊖ then return Root with label ⊖
end if
if A = ∅ then return single-node tree Root with label = majority value of T in V
else
A* ← the attribute in A that best classifies the instances in V
Root ← A*
for each value vi that A* can take do
add a new branch below Root where A* = vi
V(vi) ← subset of instances in V that have A* = vi
if V(vi) = ∅ then
add a leaf node with label = majority value of T in V
else
add subtree ID3( new node, V(vi), T, A \ {A*} )
end if
end for
end if
return Root

Constructing a decision tree is a recursive process of deciding which attribute to use at each node of
the decision tree. We want to choose the attribute that is the “best” at classifying the instances in
the data set, where “best” is a quantitative measure (a number). One such measure is a statistical
measure called Information Gain. This determines how well a given attribute separates the data set
as measured against the target classification.

ID3 uses the attribute with the highest Information Gain as the next node in constructing the tree.
ID3 is a recursive algorithm: it constructs a sub-tree for each value of the chosen attribute, using
the subset of the data that matches that attribute value.
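
The pseudocode above can be turned into a compact program. The following Python sketch is an illustrative rendering under some assumptions of my own (each instance is a dictionary mapping attribute names to values, with the class stored under a target key); it is not the unit’s reference implementation, but it follows the same recursion: stop when the subset is pure or no attributes remain, otherwise split on the attribute with the highest Information Gain.

from collections import Counter
from math import log2

# Entropy of the class labels in a list of instances (dictionaries).
def entropy(rows, target):
    counts = Counter(row[target] for row in rows)
    total = len(rows)
    return -sum(c / total * log2(c / total) for c in counts.values())

# Information Gain of splitting `rows` on the attribute `attr`.
def information_gain(rows, attr, target):
    remainder = 0.0
    for value in {row[attr] for row in rows}:
        subset = [row for row in rows if row[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset, target)
    return entropy(rows, target) - remainder

# Recursive ID3: returns either a class label (a leaf) or a nested dictionary
# of the form {attribute: {value: subtree, ...}}.
def id3(rows, attributes, target):
    labels = [row[target] for row in rows]
    if len(set(labels)) == 1:        # all instances share one class
        return labels[0]
    if not attributes:               # no attributes left: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    for value in {row[best] for row in rows}:
        subset = [row for row in rows if row[best] == value]
        tree[best][value] = id3(subset, remaining, target)
    return tree

Applied to the data set in Table 1 below, a sketch like this should recover the tree shown in Figure 11.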

Attribute X (highest information gain)
├── X = A → Attribute Y (highest gain, given X = A)
│   ├── Y = D → decision 2
│   └── Y = E → decision 3
├── X = B → decision 1
└── X = C → Attribute Z (highest gain, given X = C)
    ├── Z = F → decision 4
    └── Z = G → decision 5

Figure 5: Decision tree showing how nodes are selected in the ID3 algorithm

Figure 5 shows a decision tree with labels indicating how ID3 selects nodes in the construction of
the tree.

To understand Information Gain we need to first look at the concept of Entropy.

Entropy

Entropy is an important concept in thermodynamics. Claude Shannon saw that the concept could
be used to describe how much information there is in the outcome of a discrete random variable
(such as whether a coin will land heads up or not), and to ensure that communication over a
network does not lose information. We can use the concept to measure the “usefulness” of a
variable in terms of its information content. This idea forms the core of the decision tree construction
process in ID3.

Given a discrete random variable X, which takes values in the alphabet 𝒳 and is distributed
according to p : 𝒳 → [0, 1], its entropy is defined as

H(X) := − Σ_{x ∈ 𝒳} p(x) log p(x) = E[− log p(X)]

[Figure: the entropy H(p) of a binary variable plotted against p, rising from 0 at p = 0 to a maximum of 1 at p = 0.5 and falling back to 0 at p = 1.]

Figure 6: Entropy of a single variable

where the sum is calculated over all possible values x in the alphabet 𝒳. The base of the logarithm
is chosen to match the number of possible target values; here log2 is used because the target values
in the data are binary (yes/no or T/F).

Figure 6 shows the Entropy of a single binary variable. Here you can see that Entropy is never
negative and, for a binary variable, can never be larger than 1.
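
As a small added illustration (the function name entropy_bits is an arbitrary choice), the definition above can be computed directly:

from math import log2

# H(X) = -sum p(x) log2 p(x), skipping zero-probability outcomes (0 log 0 = 0).
def entropy_bits(probabilities):
    return -sum(p * log2(p) for p in probabilities if p > 0)

print(entropy_bits([0.5, 0.5]))  # 1.0: a fair coin, the peak of the curve in Figure 6
print(entropy_bits([0.9, 0.1]))  # about 0.47: a heavily biased coin is less "informative"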

Information Gain

Entropy can be viewed as a measure of the impurity of a collection of instances (a data set). In order
to construct a decision tree we want to repeatedly sub-divide our data set in such a way that we
create the largest reduction in entropy with each sub-division. The Information Gain of an attribute
A relative to a dataset S is defined as:

Gain(S, A) ≡ Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) Entropy(S_v)

where Values(A) is the set of all possible values attribute A can take, S_v is the subset of S in which
A has the value v, and |S_v| is the size of that subset.

ID3 uses Information Gain to find the attribute that splits the dataset so as to produce the highest
reduction in entropy, i.e. the highest Information Gain as calculated above. The worked example in
Subsection 4.3 below illustrates this in more detail.


4.3 Worked example

We will use the data set in Table 1 to work through an example of the ID3 algorithm.

Table 1: A set of objects, their attributes and classes (positive or negative)

Colour   Form      Hollow   Transparent   Class
RED      cube      yes      yes           ⊕
BLUE     sphere    no       yes           ⊖
GREEN    pyramid   no       yes           ⊖
RED      sphere    no       no            ⊖
GREEN    pyramid   yes      no            ⊕
GREEN    cube      no       no            ⊖
BLUE     cube      yes      no            ⊖
BLUE     pyramid   yes      yes           ⊕
RED      cube      yes      no            ⊖
BLUE     pyramid   no       no            ⊖
GREEN    cube      no       yes           ⊕
RED      pyramid   yes      no            ⊕
GREEN    cube      yes      no            ⊖
GREEN    sphere    no       yes           ⊖

First, we calculate the Entropy for the entire data set. We do this as a baseline against which
to compare the attributes when deciding which one will become our root node. This in turn is done
by calculating the Information Gain for each attribute.

This is a binary classification problem. There are 14 instances, of which 5 result in Class = ⊕ and 9
result in Class = ⊖. In other words:

Entropy(S) ≡ Entropy([5⊕, 9⊖])

There are four attributes, which we can shorten to (C, F, H, T), whose combinations of values
determine the value of the target attribute, Class.

Calculate the Entropy of the data set:

Entropy(S) ≡ − Σ_{i=1}^{c} p_i log2(p_i)
           = −p_⊕ log2(p_⊕) − p_⊖ log2(p_⊖)
           = −5/14 log2(5/14) − 9/14 log2(9/14)
           = (−0.3571 × −1.4854) + (−0.6429 × −0.6374)
           = 0.9403
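
As a quick sanity check of this arithmetic (an added snippet, not part of the original text):

from math import log2

print(-5/14 * log2(5/14) - 9/14 * log2(9/14))  # prints approximately 0.9403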

Attribute C can take on three values (shortened here):

Values(C) = {R, G, B}
S         = [5⊕, 9⊖]
S_{C=R} ← [2⊕, 2⊖]
S_{C=G} ← [2⊕, 4⊖]
S_{C=B} ← [1⊕, 3⊖]

Calculate the Entropy values of the three subsets of the data associated with the values of the
attribute C:

Entropy(S_{C=R}) = −2/4 log2(2/4) − 2/4 log2(2/4)
                 = 1.0000
Entropy(S_{C=G}) = −2/6 log2(2/6) − 4/6 log2(4/6)
                 = 0.9183
Entropy(S_{C=B}) = −1/4 log2(1/4) − 3/4 log2(3/4)
                 = 0.8112

Calculate the Information Gain for attribute C:

Gain(S, C) = Entropy(S) − Σ_{v ∈ {R,G,B}} (|S_v| / |S|) Entropy(S_v)
           = Entropy(S) − 4/14 Entropy(S_{C=R}) − 6/14 Entropy(S_{C=G}) − 4/14 Entropy(S_{C=B})
           = 0.9403 − 4/14 × 1.0000 − 6/14 × 0.9183 − 4/14 × 0.8112
           = 0.0292

Repeat these calculations for the other three attributes as well. We now get all the Information Gain
values:

Gain(S, C) = 0.0292
Gain(S, F) = 0.2000
Gain(S, H) = 0.1518
Gain(S, T) = 0.0481

The attribute with the highest Information Gain causes the highest reduction in entropy. This is the
attribute Form with Gain(S, F) = 0.2000, which then becomes the root node of the decision tree, as
shown in Figure 7.

The ID3 algorithm now recurses over the subsets of the data associated with the three branches of
the root node of the decision tree.
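
To tie the worked example back to the formulas, here is a hedged Python sketch (my own transcription of Table 1, with "+" and "-" standing in for ⊕ and ⊖; the names ROWS, entropy and gain are illustrative) that recomputes the Information Gain of every attribute on the full data set and on the Form=cube and Form=pyramid subsets that appear in the recursion below.

from collections import Counter
from math import log2

ROWS = [  # (Colour, Form, Hollow, Transparent, Class), transcribed from Table 1
    ("RED", "cube", "yes", "yes", "+"),     ("BLUE", "sphere", "no", "yes", "-"),
    ("GREEN", "pyramid", "no", "yes", "-"), ("RED", "sphere", "no", "no", "-"),
    ("GREEN", "pyramid", "yes", "no", "+"), ("GREEN", "cube", "no", "no", "-"),
    ("BLUE", "cube", "yes", "no", "-"),     ("BLUE", "pyramid", "yes", "yes", "+"),
    ("RED", "cube", "yes", "no", "-"),      ("BLUE", "pyramid", "no", "no", "-"),
    ("GREEN", "cube", "no", "yes", "+"),    ("RED", "pyramid", "yes", "no", "+"),
    ("GREEN", "cube", "yes", "no", "-"),    ("GREEN", "sphere", "no", "yes", "-"),
]
ATTRS = {"Colour": 0, "Form": 1, "Hollow": 2, "Transparent": 3}
CLASS = 4

def entropy(rows):
    counts = Counter(row[CLASS] for row in rows)
    return -sum(c / len(rows) * log2(c / len(rows)) for c in counts.values())

def gain(rows, attr):
    col = ATTRS[attr]
    remainder = 0.0
    for value in {row[col] for row in rows}:
        subset = [row for row in rows if row[col] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(rows) - remainder

for name, subset in [("whole data set", ROWS),
                     ("Form=cube", [r for r in ROWS if r[1] == "cube"]),
                     ("Form=pyramid", [r for r in ROWS if r[1] == "pyramid"])]:
    print(name, {attr: round(gain(subset, attr), 4) for attr in ATTRS})
# Expected highest gains: Form (about 0.2000) on the whole data set,
# Transparent (about 0.9183) within Form=cube, and Hollow (about 0.9710)
# within Form=pyramid, matching the calculations in this subsection.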


Form
├── cube → ?
├── sphere → ?
└── pyramid → ?

Figure 7: Decision tree after the first set of calculations

Table 2: Subset of the data with Form=cube

Colour   Form   Hollow   Transparent   Class
RED      cube   yes      yes           ⊕
GREEN    cube   no       no            ⊖
BLUE     cube   yes      no            ⊖
RED      cube   yes      no            ⊖
GREEN    cube   no       yes           ⊕
GREEN    cube   yes      no            ⊖

Table 2 contains 6 instances, of which 2 result in Class = ⊕ and 4 in Class = ⊖. Therefore:

Entropy(S_{F=c}) ≡ Entropy([2⊕, 4⊖])

Calculate the Entropy of this sub-set of the data:

Entropy(S_{F=c}) ≡ − Σ_{i=1}^{c} p_i log2(p_i)
                 = −p_⊕ log2(p_⊕) − p_⊖ log2(p_⊖)
                 = −2/6 log2(2/6) − 4/6 log2(4/6)
                 = (−0.3333 × −1.5850) + (−0.6667 × −0.5850)
                 = 0.9183

Calculate the Entropy values of the three subsets of this data associated with the values of the
attribute C:

Entropy(S_{F=c,C=R}) = −1/2 log2(1/2) − 1/2 log2(1/2)
                     = 1.0000
Entropy(S_{F=c,C=G}) = −1/3 log2(1/3) − 2/3 log2(2/3)
                     = 0.9183
Entropy(S_{F=c,C=B}) = −0/1 log2(0/1) − 1/1 log2(1/1)
                     = 0.0000

Table 3: Subset of the data with Form=sphere

Colour   Form     Hollow   Transparent   Class
BLUE     sphere   no       yes           ⊖
RED      sphere   no       no            ⊖
GREEN    sphere   no       yes           ⊖

Table 4: Subset of the data with Form=pyramid

Colour   Form      Hollow   Transparent   Class
GREEN    pyramid   no       yes           ⊖
GREEN    pyramid   yes      no            ⊕
BLUE     pyramid   yes      yes           ⊕
BLUE     pyramid   no       no            ⊖
RED      pyramid   yes      no            ⊕

Calculate the Information Gain for attribute C within the subset where Form=cube:

Gain(S_{F=c}, C) = Entropy(S_{F=c}) − Σ_{v ∈ {R,G,B}} (|S_v| / |S_{F=c}|) Entropy(S_v)
                 = Entropy(S_{F=c}) − 2/6 Entropy(S_{F=c,C=R}) − 3/6 Entropy(S_{F=c,C=G}) − 1/6 Entropy(S_{F=c,C=B})
                 = 0.9183 − 2/6 × 1.0000 − 3/6 × 0.9183 − 1/6 × 0.0000
                 = 0.1258

We do similar calculations for the other attributes in this subset to get:

Gain(S_{F=c}, C) = 0.1258
Gain(S_{F=c}, H) = 0.0441
Gain(S_{F=c}, T) = 0.9183

The attribute with the highest Information Gain is Transparent, which then becomes the next node
in the decision tree, under the branch with the value Form=cube. The data in Table 3 show that
all the instances have output ⊖, which means that we can define a leaf node under Form=sphere.
The result of these calculations gives the decision tree as in Figure 8.

Table 4, where Form=pyramid, contains 5 instances, of which 3 result in Class = ⊕ and 2 in
Class = ⊖. Therefore:

Entropy(S_{F=p}) ≡ Entropy([3⊕, 2⊖])

We do the same calculations as above to get:

Entropy(S_{F=p}) = 0.9710


Form
├── cube → Transparent
│   ├── yes → ?
│   └── no → ?
├── sphere → −
└── pyramid → ?

Figure 8: Decision tree after the second set of calculations and the observation for Form=sphere

and then the Information Gain values for the remaining attributes:

Gain(S_{F=p}, C) = 0.1710
Gain(S_{F=p}, H) = 0.9710
Gain(S_{F=p}, T) = 0.0200

The attribute with the highest Information Gain is Hollow: splitting the pyramid instances on Hollow
separates the two classes completely, reducing the entropy of this subset to zero. Hollow therefore
becomes the next node under the branch Form=pyramid, which gives the decision tree in Figure 9.

Form
├── cube → Transparent
│   ├── yes → ?
│   └── no → ?
├── sphere → −
└── pyramid → Hollow
    ├── yes → ?
    └── no → ?

Figure 9: Decision tree after the 4th set of calculations

We now have four branches of the tree to investigate, and possibly repeat the calculations. These
branches correspond to sub-sets of the data. Tables 5 and 6 are the subsets under the branches of
Transparent.

Table 5: Subset of the data with Form=cube and Transparent=yes

Colour   Form   Hollow   Transparent   Class
RED      cube   yes      yes           ⊕
GREEN    cube   no       yes           ⊕

Table 6: Subset of the data with Form=cube and Transparent=no

Colour   Form   Hollow   Transparent   Class
GREEN    cube   no       no            ⊖
BLUE     cube   yes      no            ⊖
RED      cube   yes      no            ⊖
GREEN    cube   yes      no            ⊖

In both of these we see that there is only one output class in each. This means that we have two
more leaf nodes, as in Figure 10.

Form
├── cube → Transparent
│   ├── yes → +
│   └── no → −
├── sphere → −
└── pyramid → Hollow
    ├── yes → ?
    └── no → ?

Figure 10: Decision tree after the 5th set of observations

We are now left with two more subsets to investigate - those for the branches of Hollow. Tables 7
and 8 show these sub-sets.

Again, we observe a similar phenomenon as with the previous two subsets, namely that there is
only a single class in each. This means that we have our final two leaf nodes, as in Figure 11.


Table 7: Subset of the data with Form=pyramid and Hollow=yes

Colour   Form      Hollow   Transparent   Class
GREEN    pyramid   yes      no            ⊕
BLUE     pyramid   yes      yes           ⊕
RED      pyramid   yes      no            ⊕

Table 8: Subset of the data with Form=pyramid and Hollow=no

Colour   Form      Hollow   Transparent   Class
GREEN    pyramid   no       yes           ⊖
BLUE     pyramid   no       no            ⊖

Form
├── cube → Transparent
│   ├── yes → +
│   └── no → −
├── sphere → −
└── pyramid → Hollow
    ├── yes → +
    └── no → −

Figure 11: Decision tree after the final set of observations

5 ACTIVITIES

5.1 TASK 1 - STUDY THE MATERIAL

Find and read all the online material shown earlier in this document. Study the relevant concepts
carefully and thoroughly.

5.2 TASK 2 - OTHER DECISION TREE ALGORITHMS

Find resources (some of these will be in the textbooks and material you have already studied in the
first task) on other algorithms for constructing decision trees. Some of these algorithms include ID3
(which you have studied here), C4.5, and CART.

Study these algorithms so that you understand how they work and to what kinds of data sets
they can be applied. What are the differences? What are the advantages and shortcomings of
these algorithms? What would you do with missing or incorrect data? How would you handle
non-categorical or continuous data? Can you use other cost functions? Can you use different cost
functions in different parts of the data set? Why and when would you do so?

5.3 TASK 3

Find resources on more advanced extensions of decision-tree learning. Look specifically at ensemble
methods, such as bagging and boosting, and their further extension into random forests.

© UNISA 2024
