
Machine Learning Lecture II:

Decision Trees

Prepared By

Dr Augustine S. Nsang
Introduction
• Decision tree learning is an extraordinarily important algorithm for AI, not only because it is very powerful, but also because it is simple and efficient for extracting knowledge from data.
• Compared to other learning algorithms, it has important advantages: the extracted knowledge can be easily understood, interpreted, and controlled by humans in the form of a readable decision tree.
• We shall show in a simple example how a decision tree can be constructed from training data.
A Simple Example
• A devoted skier who lives near the High Sierra, a beautiful mountain range in California, wants a decision tree to help him decide whether it is worthwhile to drive his car to a ski resort in the mountains. We thus have a two-class problem (ski: yes/no) based on the variables listed in Table 1.
• Figure 1 (Slide 5) shows a decision tree for this problem. A decision tree is a tree whose inner nodes represent features (attributes). Each edge stands for an attribute value. At each leaf node a class value is given.
• The data used for the construction of the decision tree is shown in Table 1 (next slide). Each row in the table contains the data for one day and as such represents a sample.
A Simple Example – Cont’d

Table 1: Data set for the skiing classification problem

• Upon close examination we see that row 6 and row 7 contradict each other. Thus no deterministic classification algorithm can correctly classify all of the data. The number of falsely classified samples must therefore be ≥ 1.
• The tree in Fig. 1 (next slide) thus classifies the data optimally.
A Simple Example – Cont’d

Fig 1: Decision tree for the skiing classification problem


A Simple Example – Cont’d
• We now develop a heuristic algorithm which, starting from the root, recursively builds a decision tree.
• First the attribute with the highest information gain (Snow_Dist) is chosen for the root node from the set of all attributes. For each attribute value (≤100, >100) there is a branch in the tree.
• For every branch this process is repeated recursively. During generation of the nodes, the attribute with the highest information gain among the attributes which have not yet been used is always chosen, in the spirit of a greedy strategy.
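As an illustration (not part of the original slides), the following Python sketch shows one way this greedy, recursive construction can be written down, in the style of ID3. The function names (build_tree, information_gain), the representation of trees as nested dictionaries, and the handling of empty attribute sets are assumptions made for the example; the entropy and gain helpers follow the definitions introduced on the next slides.

from collections import Counter
from math import log2

def entropy(labels):
    """H(D) = -(p1 log2 p1 + ... + pn log2 pn), estimated from class frequencies."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy of D minus the weighted entropies of the subsets induced by attr."""
    total = len(labels)
    gain = entropy(labels)
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        gain -= len(subset) / total * entropy(subset)
    return gain

def build_tree(rows, labels, attributes):
    """Greedy, recursive construction: pick the attribute with the highest
    information gain, branch on its values, and recurse on every branch."""
    if len(set(labels)) == 1:                 # pure node: a single class remains
        return labels[0]
    if not attributes:                        # no attributes left: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    tree = {best: {}}
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        sub_rows = [rows[i] for i in idx]
        sub_labels = [labels[i] for i in idx]
        tree[best][value] = build_tree(sub_rows, sub_labels,
                                       [a for a in attributes if a != best])
    return tree

Called on the rows of Table 1 (with Snow_Dist discretized into ≤100 / >100 and the attributes Snow_Dist, Weekend, Sun), this procedure should reproduce the tree of Fig. 1.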
Entropy as a Metric for Information Content
• The described top-down algorithm for the construction of a decision tree, at each step, selects the attribute with the highest information gain.
• We now introduce the entropy as the metric for the information content of a set of training data D. If we only look at the binary variable skiing in the above example, then D can be described as

D = (yes, yes, yes, yes, yes, yes, no, no, no, no, no)

with estimated probabilities

p1 = P(yes) = 6/11 and p2 = P(no) = 5/11.

• Here we evidently have a probability distribution p = (6/11, 5/11). In general, for an n-class problem this reads

p = (p1, . . . , pn) with p1 + p2 + · · · + pn = 1.
Entropy as a Metric for Information Content – Cont’d
• To introduce the information content of a distribution we observe two extreme cases.
• First let p = (1, 0, 0, . . . , 0). In this case, the first one of the n events will certainly occur and all others will not. The uncertainty about the outcome of the events is thus minimal.
• In contrast, for the uniform distribution

p = (1/n, 1/n, . . . , 1/n)

the uncertainty is maximal because no event can be distinguished from the others.
Entropy as a Metric for Information Content – Cont’d

The entropy of a distribution p = (p1, . . . , pn) is defined (using the convention 0 · log2 0 = 0) as

H(p) = H(p1, . . . , pn) := −( p1 log2 p1 + p2 log2 p2 + · · · + pn log2 pn )

For the two extreme cases this yields

H(1, 0, . . . , 0) = 0 and H(1/n, 1/n, . . . , 1/n) = log2 n

Thus for a 4-class problem, H has a maximum value of 2, and for a 2-class problem, H has a maximum value of 1.
Entropy as a Metric for Information Content – Cont’d

Fig 2: The entropy function for the case of two classes. We see the maximum at p = 1/2 and the symmetry with respect to swapping p and 1 − p.
Information Content
• The information content of a dataset is defined as:

I(D) := 1 − H(D)

• If we apply the entropy formula to the example, the result is:

H(6/11, 5/11) = 0.994

• During construction of a decision tree, the dataset is further subdivided by each new attribute. The more an attribute raises the information content of the distribution by dividing the data, the better that attribute is.
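As a quick numerical check (an addition, not from the slides), the value H(6/11, 5/11) = 0.994 and the corresponding information content can be reproduced with a few lines of Python:

from math import log2

p_yes, p_no = 6 / 11, 5 / 11
H = -(p_yes * log2(p_yes) + p_no * log2(p_no))   # entropy of the skiing labels
I = 1 - H                                        # information content I(D) = 1 - H(D)
print(round(H, 3), round(I, 3))                  # prints: 0.994 0.006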
Information Gain

The information gain G(D, A) of an attribute A measures how much the entropy drops when D is split on the values of A. If A partitions D into the subsets D1, . . . , Dk, then

G(D, A) := H(D) − ( |D1|/|D| · H(D1) + · · · + |Dk|/|D| · H(Dk) )
Information Gain – Cont’d

Applied to our example:

G(D, Snow_Dist) = H(D) − ( 4/11 · H(D≤100) + 7/11 · H(D>100) )
               = 0.994 − ( 4/11 · 0 + 7/11 · 0.863 )
               = 0.445
Information Gain – Cont’d
Analogously, we obtain:
G(D, Weekend) = 0.150 and G(D, Sun) = 0.049
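The Snow_Dist gain can be reproduced without the full table: from H(D) = 0.994 and H(D>100) = 0.863 it follows that the branch D≤100 contains 4 samples, all classified yes, while D>100 contains the remaining 7 samples (2 yes, 5 no). The sketch below recomputes G(D, Snow_Dist) from these inferred class counts; the helper names are invented for the illustration, and the Weekend and Sun gains are not reproduced because they require the full Table 1.

from math import log2

def entropy_from_counts(counts):
    """Entropy of a class distribution given absolute class counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def gain_from_counts(branches):
    """G(D, A) = H(D) minus the weighted branch entropies, with branches given
    as {attribute value: [count of 'yes', count of 'no']}."""
    total = sum(sum(c) for c in branches.values())
    parent = entropy_from_counts([sum(c[0] for c in branches.values()),
                                  sum(c[1] for c in branches.values())])
    weighted = sum(sum(c) / total * entropy_from_counts(c) for c in branches.values())
    return parent - weighted

# Snow_Dist split inferred from the entropies quoted in the slides:
# D<=100: 4 yes / 0 no, D>100: 2 yes / 5 no
print(round(gain_from_counts({"<=100": [4, 0], ">100": [2, 5]}), 3))   # prints: 0.445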

Construction of the Decision Tree


• Since the attribute Snow_Dist has the largest information gain, it becomes the root node of the decision tree. The two attribute values ≤100 and >100 generate two edges in the tree, which correspond to the subsets D≤100 and D>100.
• For the subset D≤100 the classification is clearly yes. Thus the tree terminates here. In the other branch, D>100, there is no clear result. Thus the algorithm repeats recursively.
Construction of the Decision Tree – Cont’d
• From the two attributes still available, Sun and Weekend, the better one must be chosen. We calculate:

G(D>100, Weekend) = 0.292 and G(D>100, Sun) = 0.170

• The node thus gets the attribute Weekend assigned. For Weekend = no the tree terminates with the decision Ski = no. A calculation of the gain here returns the value 0. For Weekend = yes, Sun results in a gain of 0.171.
• Then the construction of the tree terminates because no further attributes are available, although example number 7 is falsely classified. The finished tree was displayed above in Slide 5.
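For illustration, the finished tree can also be read as a nested set of tests. The sketch below encodes it as a plain Python function; since Fig. 1 is not reproduced in this text version, the leaf labels of the Sun node (yes for Sun = yes, no for Sun = no) are an assumption, chosen to be consistent with the construction described above and with example 7 being misclassified.

def predict_ski(snow_dist, weekend, sun):
    """Classify one day with the decision tree constructed above.
    snow_dist is a number, weekend and sun are 'yes'/'no' strings.
    The Sun-leaf labels (yes -> yes, no -> no) are assumed, see the note above."""
    if snow_dist <= 100:
        return "yes"
    if weekend == "no":
        return "no"
    return "yes" if sun == "yes" else "no"

# Example: distant resort, weekend, sunny -> the tree predicts 'yes'
print(predict_ski(snow_dist=120, weekend="yes", sun="yes"))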
Random Forest Classification
• Given a dataset D with n data samples, a random forest classification is obtained using the following algorithm:

Step 1:
For i := 1 to p:
Select k data samples (k < n) at random from the n data samples in D, and construct a decision tree using these k data samples.

Step 2:
Given any unknown data sample x, classify x using each of the p decision trees constructed in Step 1. The class obtained by the random forest classification is given by a majority vote of the p decision trees.
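The slides describe the algorithm but not an implementation; the sketch below is one possible Python rendering of the two steps. The names random_forest, classify and predict_forest are invented for this illustration, the k rows per tree are drawn without replacement (k < n, as in Step 1), and any tree learner, for example the build_tree sketch given earlier, can be passed in as the build_tree argument.

from collections import Counter
import random

def random_forest(rows, labels, attributes, p, k, build_tree):
    """Step 1: build p trees, each from k (< n) samples drawn at random from D."""
    n = len(rows)
    forest = []
    for _ in range(p):
        idx = random.sample(range(n), k)        # k distinct samples, k < n
        sub_rows = [rows[i] for i in idx]
        sub_labels = [labels[i] for i in idx]
        forest.append(build_tree(sub_rows, sub_labels, attributes))
    return forest

def classify(tree, x):
    """Walk one tree (nested dicts, as in the earlier sketch) for a sample x."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][x[attr]]   # an unseen attribute value raises KeyError here
    return tree

def predict_forest(forest, x):
    """Step 2: majority vote over the p trees."""
    votes = [classify(tree, x) for tree in forest]
    return Counter(votes).most_common(1)[0][0]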
