Chapter 4 - Using Decision Trees for Classification
A fictitious example which has been used for illustration by many authors,
notably Quinlan [2], is that of a golfer who decides whether or not to play each
day on the basis of the weather.
Figure 4.1 shows the results of two weeks (14 days) of observations of
weather conditions and the decision on whether or not to play.
Assuming the golfer is acting consistently, what are the rules that determine
the decision whether or not to play each day? If tomorrow the values of
Outlook, Temperature, Humidity and Windy were sunny, 74°F, 77% and false
respectively, what would the decision be?
One way of answering this is to construct a decision tree such as the one
shown in Figure 4.2. This is a typical example of a decision tree, which will
form the topic of several chapters of this book.
In order to determine the decision (classification) for a given set of weather
conditions from the decision tree, first look at the value of Outlook. There are
three possibilities.
1. If the value of Outlook is sunny, next consider the value of Humidity. If the
value is less than or equal to 75 the decision is play. Otherwise the decision
is don't play.
2. If the value of Outlook is overcast, the decision is play.
3. If the value of Outlook is rain, next consider the value of Windy. If the
value is true the decision is don't play. Otherwise the decision is play.
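To make the procedure concrete, the tree can be written as a short function. The sketch below is in Python; the language, the function name classify and the print call are illustrative rather than part of the original, and the branches follow the three possibilities just listed.

def classify(outlook, temperature, humidity, windy):
    # Decision tree of Figure 4.2 as nested tests; Temperature is never tested.
    if outlook == "sunny":
        # Continuous attribute Humidity compared with the split value 75
        return "play" if humidity <= 75 else "don't play"
    if outlook == "overcast":
        return "play"
    # Remaining possibility: outlook == "rain"
    return "don't play" if windy else "play"

# The instance from the question posed earlier: sunny, 74 degrees F, 77% humidity, not windy
print(classify("sunny", 74, 77, False))   # prints: don't play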
4.1.2 Terminology
We will assume that the ‘standard formulation’ of the data given in Chapter 2
applies. There is a universe of objects (people, houses etc.), each of which can
be described by the values of a collection of its attributes. Attributes with a
finite (and generally fairly small) set of values, such as sunny, overcast and rain,
are called categorical. Attributes with numerical values, such as Temperature
and Humidity, are generally known as continuous. We will distinguish between
a specially-designated categorical attribute called the classification and the
other attribute values and will generally use the term ‘attributes’ to refer only
to the latter.
Descriptions of a number of objects are held in tabular form in a training
set. Each row of the table comprises an instance, i.e. the (non-classifying)
attribute values and the classification corresponding to one object.
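In code, an instance might be held as a small record of attribute values together with its classification. The sketch below is one possible representation in Python; the values are illustrative ones in the style of Figure 4.1, not a specific row of it.

instance = {
    "Outlook": "sunny",     # categorical attribute
    "Temperature": 75,      # continuous attribute (degrees F)
    "Humidity": 70,         # continuous attribute (%)
    "Windy": True,          # categorical attribute (true/false)
    "class": "play",        # the specially-designated classification
}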
The aim is to develop classification rules from the data in the training set.
This is often done in the implicit form of a decision tree.
A decision tree is created by a process known as splitting on the value of
attributes (or just splitting on attributes), i.e. testing the value of an attribute
such as Outlook and then creating a branch for each of its possible values.
In the case of continuous attributes the test is normally whether the value is
‘less than or equal to’ or ‘greater than’ a given value known as the split value.
The splitting process continues until each branch can be labelled with just one
classification.
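As an illustrative sketch (in Python, with instances assumed to be stored as dictionaries of attribute values, an assumption of this example rather than anything in the original), splitting on a categorical attribute produces one subset per value, while splitting on a continuous attribute produces a 'less than or equal to' subset and a 'greater than' subset around the split value.

from collections import defaultdict

def split_on_categorical(instances, attribute):
    # One subset for each value of a categorical attribute such as Outlook.
    subsets = defaultdict(list)
    for instance in instances:
        subsets[instance[attribute]].append(instance)
    return dict(subsets)

def split_on_continuous(instances, attribute, split_value):
    # Two subsets: values less than or equal to the split value, and values greater than it.
    low = [i for i in instances if i[attribute] <= split_value]
    high = [i for i in instances if i[attribute] > split_value]
    return {"<=": low, ">": high}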
Decision trees have two different functions: data compression and prediction.
Figure 4.2 can be regarded simply as a more compact way of representing the
data in Figure 4.1. The two representations are equivalent in the sense that
for each of the 14 instances the given values of the four attributes will lead to
identical classifications.
However, the decision tree is more than an equivalent representation of the
training set. It can be used to predict the classification of other instances not in the
training set, for example the one given previously where the values of the four
attributes are sunny, 74, 77 and false respectively. It is easy to see from the
decision tree that in this case the decision would be don’t play. It is important
to stress that this ‘decision’ is only a prediction, which may or may not turn
out to be correct. There is no infallible way to predict the future!
So the decision tree can be viewed as not merely equivalent to the original
training set but as a generalisation of it which can be used to predict the
classification of other instances. These are often called unseen instances and
a collection of them is generally known as a test set or an unseen test set, by
contrast with the original training set.
Figure 4.3 shows a training set (taken from a fictitious university) giving the
results of students for five subjects, coded as SoftEng, ARIN, HCI, CSA
and Project, and their corresponding degree classifications, which in this
simplified example are either FIRST or SECOND. There are 26 instances. What
determines who is classified as FIRST or SECOND?
Figure 4.4 shows a possible decision tree corresponding to this training set.
It consists of a number of branches, each ending with a leaf node labelled with
one of the valid classifications, i.e. FIRST or SECOND. Each branch comprises
the route from the root node (i.e. the top of the tree) to a leaf node. A node
that is neither the root nor a leaf node is called an internal node.
We can think of the root node as corresponding to the original training set.
All other nodes correspond to a subset of the training set.
At the leaf nodes each instance in the subset has the same classification.
There are five leaf nodes and hence five branches.
Each branch corresponds to a classification rule. The five classification rules
can be written in full as:
IF SoftEng = A AND Project = A THEN Class = FIRST
IF SoftEng = A AND Project = B AND ARIN = A AND CSA = A THEN Class = FIRST
IF SoftEng = A AND Project = B AND ARIN = A AND CSA = B THEN Class = SECOND
IF SoftEng = A AND Project = B AND ARIN = B THEN Class = SECOND
IF SoftEng = B THEN Class = SECOND
Equivalently, the rules can be written in a nested 'if ... then ... else' form:
if (SoftEng = A) {
    if (Project = A) Class = FIRST
    else {
        if (ARIN = A) {
            if (CSA = A) Class = FIRST
            else Class = SECOND
        }
        else Class = SECOND
    }
}
else Class = SECOND
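The same nested tests can be run directly. The sketch below is one possible Python transcription; the function name degree_class and the example student are illustrative and do not come from Figure 4.3.

def degree_class(softeng, project, arin, csa):
    # Nested tests corresponding to the decision tree for the degrees data.
    if softeng == "A":
        if project == "A":
            return "FIRST"
        if arin == "A" and csa == "A":
            return "FIRST"
        return "SECOND"
    return "SECOND"

# A hypothetical student, not an instance taken from Figure 4.3
print(degree_class(softeng="A", project="B", arin="A", csa="B"))   # prints: SECOND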
A decision tree of this kind can be generated from a training set by the
following recursive algorithm:
IF all the instances in the training set belong to the same class
THEN return the value of the class
ELSE (a) Select an attribute A to split on*
     (b) Sort the instances in the training set into subsets, one
         for each value of attribute A
     (c) Return a tree with one branch for each non-empty subset,
         each branch having a descendant subtree or a class
         value produced by applying the algorithm recursively
* Never select an attribute twice in the same branch
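A simplified Python rendering of this algorithm is sketched below. It assumes categorical attributes only, instances stored as (attribute-value dictionary, classification) pairs, and that no two instances have identical attribute values but different classifications, so the recursion always terminates; these assumptions are mine, not part of the original. The attribute-selection step here simply takes the first attribute not yet used on the branch, whereas practical systems choose it far more carefully, as discussed later in the book.

def tdidt(instances, attributes):
    # IF all the instances belong to the same class THEN return that class
    classes = {cls for _, cls in instances}
    if len(classes) == 1:
        return classes.pop()
    # (a) Select an attribute A to split on (never reused in the same branch)
    attribute = attributes[0]
    remaining = [a for a in attributes if a != attribute]
    # (b) Sort the instances into subsets, one for each value of attribute A
    subsets = {}
    for values, cls in instances:
        subsets.setdefault(values[attribute], []).append((values, cls))
    # (c) Return a tree with one branch for each non-empty subset, each branch
    #     holding a descendant subtree or class value produced recursively
    return {attribute: {value: tdidt(subset, remaining)
                        for value, subset in subsets.items()}}

# Example call on a hypothetical two-instance training set:
# tdidt([({"Outlook": "sunny"}, "don't play"),
#        ({"Outlook": "overcast"}, "play")], ["Outlook"])
# returns {"Outlook": {"sunny": "don't play", "overcast": "play"}}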
A first type of reasoning is deduction, as in the classical example: all men are
mortal; Socrates is a man; therefore Socrates is mortal. If the first two
statements (the premises) are true, then the conclusion must be true.
This type of reasoning is entirely reliable, but in practice rules that are 100%
certain (such as 'all men are mortal') are often not available.
A second type of reasoning is called abduction. An example of this is: all dogs
chase cats; Fido chases cats; therefore Fido is a dog.
Here the conclusion is consistent with the truth of the premises, but it may
not necessarily be correct. Fido may be some other type of animal that chases
cats, or perhaps not an animal at all. Reasoning of this kind is often very
successful in practice but can sometimes lead to incorrect conclusions.
A third type of reasoning is called induction. This is a process of
generalisation based on repeated observations.
For example, if I see 1,000 dogs with four legs I might reasonably conclude
that “if x is a dog then x has 4 legs” (or more simply “all dogs have four legs”).
This is induction. The decision trees derived from the golf and degrees datasets
are of this kind. They are generalised from repeated observations (the instances
in the training sets) and we would expect them to be good enough to use for
predicting the classification of unseen instances in most cases, but they are
not infallible.
References
[1] Michie, D. (1990). Machine executable skills from ‘silent’ brains. In Research
and development in expert systems VII. Cambridge: Cambridge University
Press.
[2] Quinlan, J. R. (1993). C4.5: programs for machine learning. San Mateo:
Morgan Kaufmann.
[3] Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1,
81–106.