Chapter 4 - Using Decision Trees for Classification
A fictitious example which has been used for illustration by many authors,
notably Quinlan [2], is that of a golfer who decides whether or not to play each
day on the basis of the weather.
Figure 4.1 shows the results of two weeks (14 days) of observations of
weather conditions and the decision on whether or not to play.
Assuming the golfer is acting consistently, what are the rules that determine
the decision whether or not to play each day? If tomorrow the values of
Outlook, Temperature, Humidity and Windy were sunny, 74°F, 77% and false
respectively, what would the decision be?
One way of answering this is to construct a decision tree such as the one
shown in Figure 4.2. This is a typical example of a decision tree, which will
form the topic of several chapters of this book.
In order to determine the decision (classification) for a given set of weather
conditions from the decision tree, first look at the value of Outlook. There are
three possibilities.
1. If the value of Outlook is sunny, next consider the value of Humidity. If the
value is less than or equal to 75 the decision is play. Otherwise the decision
is don't play.
2. If the value of Outlook is overcast, the decision is play.
3. If the value of Outlook is rain, next consider the value of Windy. If the
value is true the decision is don't play. Otherwise the decision is play.
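To make the procedure concrete, the tree can be written as a short function. The sketch below is in Python; the language, the function name classify and the print call are illustrative rather than part of the original, and the branches follow the three possibilities just listed.

def classify(outlook, temperature, humidity, windy):
    # Decision tree of Figure 4.2 as nested tests; Temperature is never tested.
    if outlook == "sunny":
        # Continuous attribute Humidity compared with the split value 75
        return "play" if humidity <= 75 else "don't play"
    if outlook == "overcast":
        return "play"
    # Remaining possibility: outlook == "rain"
    return "don't play" if windy else "play"

# The instance from the question posed earlier: sunny, 74 degrees F, 77% humidity, not windy
print(classify("sunny", 74, 77, False))   # prints: don't play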
4.1.2 Terminology
We will assume that the ‘standard formulation’ of the data given in Chapter 2
applies. There is a universe of objects (people, houses etc.), each of which can
be described by the values of a collection of its attributes. Attributes with a
finite (and generally fairly small) set of values, such as sunny, overcast and rain,
are called categorical. Attributes with numerical values, such as Temperature
and Humidity, are generally known as continuous. We will distinguish between
a specially-designated categorical attribute called the classification and the
other attribute values and will generally use the term ‘attributes’ to refer only
to the latter.
Descriptions of a number of objects are held in tabular form in a training
set. Each row of the table comprises an instance, i.e. the (non-classifying)
attribute values and the classification corresponding to one object.
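In code, an instance might be held as a small record of attribute values together with its classification. The sketch below is one possible representation in Python; the values are illustrative ones in the style of Figure 4.1, not a specific row of it.

instance = {
    "Outlook": "sunny",     # categorical attribute
    "Temperature": 75,      # continuous attribute (degrees F)
    "Humidity": 70,         # continuous attribute (%)
    "Windy": True,          # categorical attribute (true/false)
    "class": "play",        # the specially-designated classification
}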
The aim is to develop classification rules from the data in the training set.
This is often done in the implicit form of a decision tree.
A decision tree is created by a process known as splitting on the value of
attributes (or just splitting on attributes), i.e. testing the value of an attribute
such as Outlook and then creating a branch for each of its possible values.
In the case of continuous attributes the test is normally whether the value is
‘less than or equal to’ or ‘greater than’ a given value known as the split value.
The splitting process continues until each branch can be labelled with just one
classification.
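As an illustrative sketch (in Python, with instances assumed to be stored as dictionaries of attribute values, an assumption of this example rather than anything in the original), splitting on a categorical attribute produces one subset per value, while splitting on a continuous attribute produces a 'less than or equal to' subset and a 'greater than' subset around the split value.

from collections import defaultdict

def split_on_categorical(instances, attribute):
    # One subset for each value of a categorical attribute such as Outlook.
    subsets = defaultdict(list)
    for instance in instances:
        subsets[instance[attribute]].append(instance)
    return dict(subsets)

def split_on_continuous(instances, attribute, split_value):
    # Two subsets: values less than or equal to the split value, and values greater than it.
    low = [i for i in instances if i[attribute] <= split_value]
    high = [i for i in instances if i[attribute] > split_value]
    return {"<=": low, ">": high}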
Decision trees have two different functions: data compression and prediction.
Figure 4.2 can be regarded simply as a more compact way of representing the
data in Figure 4.1. The two representations are equivalent in the sense that
for each of the 14 instances the given values of the four attributes will lead to
identical classifications.
However, the decision tree is more than an equivalent representation of the
training set. It can be used to predict the classification of other instances not in the
training set, for example the one given previously where the values of the four
attributes are sunny, 74, 77 and false respectively. It is easy to see from the
decision tree that in this case the decision would be don’t play. It is important
to stress that this ‘decision’ is only a prediction, which may or may not turn
out to be correct. There is no infallible way to predict the future!
So the decision tree can be viewed as not merely equivalent to the original
training set but as a generalisation of it which can be used to predict the
classification of other instances. These are often called unseen instances and
a collection of them is generally known as a test set or an unseen test set, by
contrast with the original training set.
Figure 4.3 shows a training set (taken from a fictitious university) giving the
results of students for five subjects, coded as SoftEng, ARIN, HCI, CSA
and Project, and their corresponding degree classifications, which in this
simplified example are either FIRST or SECOND. There are 26 instances. What
determines who is classified as FIRST or SECOND?
Figure 4.4 shows a possible decision tree corresponding to this training set.
It consists of a number of branches, each ending with a leaf node labelled with
one of the valid classifications, i.e. FIRST or SECOND. Each branch comprises
the route from the root node (i.e. the top of the tree) to a leaf node. A node
that is neither the root nor a leaf node is called an internal node.
We can think of the root node as corresponding to the original training set.
All other nodes correspond to a subset of the training set.
At the leaf nodes each instance in the subset has the same classification.
There are five leaf nodes and hence five branches.
Each branch corresponds to a classification rule. The five classification rules
can be written in full as:
IF SoftEng = A AND Project = A THEN Class = FIRST
IF SoftEng = A AND Project = B AND ARIN = A AND CSA = A THEN Class = FIRST
IF SoftEng = A AND Project = B AND ARIN = A AND CSA = B THEN Class = SECOND
IF SoftEng = A AND Project = B AND ARIN = B THEN Class = SECOND
IF SoftEng = B THEN Class = SECOND
Equivalently, the rules can be written in a nested 'if ... then ... else' form:
if (SoftEng = A) {
    if (Project = A) Class = FIRST
    else {
        if (ARIN = A) {
            if (CSA = A) Class = FIRST
            else Class = SECOND
        }
        else Class = SECOND
    }
}
else Class = SECOND
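The same nested tests can be run directly. The sketch below is one possible Python transcription; the function name degree_class and the example student are illustrative and do not come from Figure 4.3.

def degree_class(softeng, project, arin, csa):
    # Nested tests corresponding to the decision tree for the degrees data.
    if softeng == "A":
        if project == "A":
            return "FIRST"
        if arin == "A" and csa == "A":
            return "FIRST"
        return "SECOND"
    return "SECOND"

# A hypothetical student, not an instance taken from Figure 4.3
print(degree_class(softeng="A", project="B", arin="A", csa="B"))   # prints: SECOND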
A decision tree of this kind can be generated from a training set by the
following recursive algorithm:
IF all the instances in the training set belong to the same class
THEN return the value of the class
ELSE (a) Select an attribute A to split on*
     (b) Sort the instances in the training set into subsets, one
         for each value of attribute A
     (c) Return a tree with one branch for each non-empty subset,
         each branch having a descendant subtree or a class
         value produced by applying the algorithm recursively
* Never select an attribute twice in the same branch
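A simplified Python rendering of this algorithm is sketched below. It assumes categorical attributes only, instances stored as (attribute-value dictionary, classification) pairs, and that no two instances have identical attribute values but different classifications, so the recursion always terminates; these assumptions are mine, not part of the original. The attribute-selection step here simply takes the first attribute not yet used on the branch, whereas practical systems choose it far more carefully, as discussed later in the book.

def tdidt(instances, attributes):
    # IF all the instances belong to the same class THEN return that class
    classes = {cls for _, cls in instances}
    if len(classes) == 1:
        return classes.pop()
    # (a) Select an attribute A to split on (never reused in the same branch)
    attribute = attributes[0]
    remaining = [a for a in attributes if a != attribute]
    # (b) Sort the instances into subsets, one for each value of attribute A
    subsets = {}
    for values, cls in instances:
        subsets.setdefault(values[attribute], []).append((values, cls))
    # (c) Return a tree with one branch for each non-empty subset, each branch
    #     holding a descendant subtree or class value produced recursively
    return {attribute: {value: tdidt(subset, remaining)
                        for value, subset in subsets.items()}}

# Example call on a hypothetical two-instance training set:
# tdidt([({"Outlook": "sunny"}, "don't play"),
#        ({"Outlook": "overcast"}, "play")], ["Outlook"])
# returns {"Outlook": {"sunny": "don't play", "overcast": "play"}}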
A first type of reasoning is deduction, as in the classical example: all men are
mortal; Socrates is a man; therefore Socrates is mortal. If the first two
statements (the premises) are true, then the conclusion must be true.
This type of reasoning is entirely reliable, but in practice rules that are 100%
certain (such as 'all men are mortal') are often not available.
A second type of reasoning is called abduction. An example of this is: all dogs
chase cats; Fido chases cats; therefore Fido is a dog.
Here the conclusion is consistent with the truth of the premises, but it may
not necessarily be correct. Fido may be some other type of animal that chases
cats, or perhaps not an animal at all. Reasoning of this kind is often very
successful in practice but can sometimes lead to incorrect conclusions.
A third type of reasoning is called induction. This is a process of
generalisation based on repeated observations.
For example, if I see 1,000 dogs with four legs I might reasonably conclude
that “if x is a dog then x has 4 legs” (or more simply “all dogs have four legs”).
This is induction. The decision trees derived from the golf and degrees datasets
are of this kind. They are generalised from repeated observations (the instances
in the training sets) and we would expect them to be good enough to use for
predicting the classification of unseen instances in most cases, but they are
not infallible.
References
[1] Michie, D. (1990). Machine executable skills from ‘silent’ brains. In Research
and development in expert systems VII. Cambridge: Cambridge University
Press.
[2] Quinlan, J. R. (1993). C4.5: programs for machine learning. San Mateo:
Morgan Kaufmann.
[3] Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1,
81–106.