Decision Trees
Barry de Ville
Decision trees trace their origins to the era of the early development of
written records. This history illustrates a major strength of trees: exceptionally
interpretable results which have an intuitive tree-like display which, in turn,
enhances understanding and the dissemination of results. The computational
origins of decision trees—sometimes called classification trees or regression
trees—are models of biological and cognitive processes. This common heritage
drives complementary developments of both statistical decision trees and trees
designed for machine learning. The unfolding and progressive elucidation of the
various features of trees throughout their early history in the late 20th century is
discussed along with the important associated reference points and responsible
authors. Statistical approaches, such as hypothesis testing and various resampling
approaches, have coevolved along with machine learning implementations. This
has resulted in exceptionally adaptable decision tree tools, appropriate for various
statistical and machine learning tasks, across various levels of measurement, with
varying levels of data quality. Trees are robust in the presence of missing data
and offer multiple ways of incorporating missing data in the resulting models.
Although trees are powerful, they are also flexible and easy-to-use methods. This
assures the production of high-quality results that require few assumptions to
deploy. The treatment ends with a discussion of the most current developments
which continue to rely on the synergies and cross-fertilization between statistical
and machine learning communities. Current developments with the emergence
of multiple trees and the various resampling approaches that are employed are
discussed. © 2013 Wiley Periodicals, Inc.
[Figure 1. Decision tree for Titanic passenger survival. The root node (Node ID: 1; Count: 1309) shows the overall distribution of the target: 38.2% survived ('1') and 61.8% did not ('0'). The root node is partitioned by Gender (Female, Male), and each descendent node is then partitioned by Age.]
[...] Porphyry in the 3rd century C.E.1 These early precomputational origins of decision trees confirm a persistently useful, innate capability of decision trees to project and encapsulate contextually revealing visual displays that are both intuitive and powerful visual metaphors. If we fast forward to the 20th century, we see that computational decision trees emerged at the same time as the nascent fields of artificial intelligence2,a and statistical computation. As a result their development has benefitted from a rich cross-disciplinary cross-fertilization that has led to a range of new methods, from resampling methods like boosting and bagging to more recent generalized multiple tree methods such as Random Forests.

OPERATION, FEATURES, AND INTERPRETATION

The characteristic form of decision trees is shown in Figure 1. Here we see a recursive subsetting of a target field of data according to the values of associated fields to create partitions, and associated descendent data subsets (nodes), that contain progressively similar intra-node target values and progressively dissimilar inter-node values at any given level of the tree.

Figure 1 shows a decision tree analysis performed on data that are drawn from research conducted on passengers on the ill-fated Titanic.2 The top-most node of the tree—termed the 'root node'—contains 1309 observations. This top-most root node contains the global distribution of the 'target' field for the analysis: in this case, survival versus nonsurvival. In general, targets may be any level of measurement, e.g. nominal, ordinal, or interval. When nominal targets are used, as in the case shown in Figure 1, the tree is sometimes referred to as a 'classification tree'.

In Figure 1, the overall survival rate—represented by '1' in the data—is 38%. Marginal counts are sometimes presented alongside the percentages so as to display the actual number of observations that fall into the two respective categories. In Figure 1, only the total number of observations is displayed at the bottom of the node display (labeled as Node ID: 1).

The decision tree unfolds in a stepwise fashion: the tree is formed by first partitioning the root node to form branches that define the descendent leaves (or nodes), which form clusters of observations that are alike within a node yet dissimilar when compared to other nodes at any given level of the tree. The branch partitions are based on a selection that is taken from a search through the data set to discover fields of data that can be input as partitioning fields to best describe the variability among the target values that are displayed in the root node. Potential partitioning fields are thereby termed 'inputs'. Once an input is selected, the descendent leaves, or nodes, are produced (terminal nodes are usually called 'leaves').
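The search for the best partitioning field described above can be sketched in a few lines of code. The following is a minimal illustration, not the software used for Figure 1: it assumes a pandas table with hypothetical column names ('survived', 'gender', 'age_group') and scores each candidate input by the chi-squared statistic of its cross-tabulation with the target, keeping the field with the largest value.

```python
# Minimal sketch: score candidate partitioning fields against a binary target by the
# chi-squared statistic of the input-by-target cross-tabulation. Column names are
# hypothetical stand-ins for the Titanic fields discussed in the text.
import pandas as pd
from scipy.stats import chi2_contingency

def best_partitioning_field(df, target, inputs):
    scores = {}
    for field in inputs:
        table = pd.crosstab(df[field], df[target])          # observed subclass counts
        chi2, p_value, dof, expected = chi2_contingency(table)
        scores[field] = chi2                                 # larger = stronger separation
    return max(scores, key=scores.get)

# Toy frame shaped like the Titanic data:
toy = pd.DataFrame({
    "gender":    ["female", "male", "male", "female", "male", "female"],
    "age_group": ["adult", "adult", "child", "child", "adult", "adult"],
    "survived":  [1, 0, 1, 1, 0, 1],
})
print(best_partitioning_field(toy, "survived", ["gender", "age_group"]))  # prints 'gender'
```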
In Figure 1, the first level of the decision tree is produced by selecting the 'Gender' field as the best input field from the set of inputs that are available (other inputs in this data set include passenger age, cabin class (first, second, and so on), fare paid, cabin location, boarding location, and destination).

The selection of the 'best' input field is an open subject of active research. Decision trees allow for a variety of computational approaches to input selection. The top-down graphical display also supports the exploration of various effects visually, so that strong branches—or compelling branches—may be selected based on theoretical notions about the interaction of the various model components. In this example, the 'best' field selection is based on partition strength diagnostics produced by the software, coupled with the domain knowledge of the analyst. In the Titanic data, gender, age, and cabin class are all important and predictive inputs with multiple, interweaving interactions. A complete exposition of the various interactions is not possible in the limited space here. Consequently, gender alone is used in the description here so as to present a simple, hopefully compelling, result. This result, and the domain knowledge framework that describes it, is presented below.

The descendent nodes produced by the selection of gender as the first partitioning field in Figure 1 are commonly referred to as the first level of the tree. The leaves in this first level correspond to the male and female passengers. The 'leaf' terminology is often used when the decision 'tree' metaphor for this method is used. The more general term 'node' is used in recognition of the fact that decision trees are a particular form of connected graph. In graph terminology, the partitions are 'edges' and the leaves are 'nodes'.

Using the 'node' terminology, the first level of the tree has two descendent nodes: the 'female' descendent has a survival rate of about 72%, whereas the 'male' descendent node has a survival rate of only 19%. It is normal, as in this case, to select the input that produces the most dramatic separation in the variability among the descendent nodes. In practice, the analyst may often guide the sequence of the unfolding of branch partitions in order to support a better explanation of a sequence of effects or to support and confirm the conditional relations that are assumed to exist among the various inputs and the component nodes that they produce. In the case of high-performance predictive modeling applications, there is less emphasis on analyst interaction in the formation of the tree and more emphasis on the selection of high quality partitions that can collectively produce the best overall model. Regardless of the method, once the initial level of the tree is determined the process continues in a recursive fashion until one or more possible stop conditions are met, thus terminating the process. Generally, stopping rules consist of thresholds on diminishing returns (in terms of test statistics) or a diminishing supply of training cases (minimum acceptable number of observations in a node).

As shown in Figure 1, gender is selected as the first partitioning field below the root node. In this case, we see that the use of gender as the partitioning field forms two descendent nodes for female and male passengers, respectively. One interpretation might be to note that the effect of gender is strong and appears to follow a protocol that calls for 'women and children first' in the lifeboats. Here we see that, while the overall survival rate is 38%, this increases to about 73% among females whereas the overall male survival rate drops to about 19%. The descendent nodes formed by recursively partitioning the female and male nodes, respectively, illustrate one of the most striking and useful features of decision trees: here we see the contextual effect of age on survival rate. In this case, we see that among females, older passengers are more likely to survive (an 83% survival rate among older females vs 68% among younger females). In the male population, the effect is completely reversed: older males have a substantially lower survival rate (17% for older males vs 58% for younger males).

We can interpret these findings as normative behavior in the social dynamics that evolved in this impromptu community that consists of the self-selected passengers of this inaugural voyage across the Atlantic. Our initial sense of the 'women and children first' protocol—displayed in the first partition—is reinforced by normative behavior that demonstrates preferential treatment based on age status. Because the second tier partitions are unique to the female and male groups, respectively, we see a contrasting preferential age treatment among females compared with males. This contrast favors older females and younger males. This asymmetry in the descendent nodes on the second level of the tree provides a dramatic illustration of the outstanding ability of decision trees to expose relationships in context.
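To make the stopping rules and the Gender/Age partitions above concrete, the short sketch below grows a two-level classification tree with scikit-learn. It is a hypothetical illustration, not the software or data used in the article: the column names and values are invented stand-ins for the Titanic fields, and the library's impurity-based splitting stands in for the partition-strength diagnostics discussed above.

```python
# Minimal sketch (assumes scikit-learn and pandas): grow a small classification tree
# on Titanic-like columns with explicit stopping rules. 'gender' and 'age' are
# hypothetical stand-ins for the fields discussed in the text.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

df = pd.DataFrame({
    "gender":   ["female", "male", "female", "male", "female", "male", "female", "male"],
    "age":      [29, 40, 2, 8, 58, 21, 35, 60],
    "survived": [1, 0, 1, 1, 1, 0, 1, 0],
})

X = pd.get_dummies(df[["gender", "age"]])   # encode the nominal input as indicator columns
y = df["survived"]

tree = DecisionTreeClassifier(
    max_depth=2,          # stop after two levels, as in the Figure 1 illustration
    min_samples_leaf=2,   # minimum acceptable number of observations in a node
    random_state=0,
)
tree.fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))  # text rendering of the branches
```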
The enduring legacy of decision trees is that they demonstrate that multiple contributors need to be recruited to effectively explain a relationship. Further, the form of the resulting relationships will reveal multiple contextual effects that will influence the understanding and effective presentation of the results. The utility of decision trees in detecting and presenting contextual effects was a significant driver of the development of one of the earliest and most enduring approaches to decision trees, the method of Belson,5 which exploited the then most current technology of mechanical calculators.

Decision trees turn out to be well adapted to mechanical calculators using Hollerith punch cards because of the sorting and selection characteristics of the algorithm and the avoidance of, e.g., any matrix-based computations. For each predictive field that could be considered as an input for use in the characterization of a target field it was possible to sort subclasses formed for each target-predictor combination and then to identify imbalances between the expected frequency of the subclass and the observed frequency of the subclass. This step-by-step recursive process is simple enough for both mechanical calculators and unassisted humans. Unbalanced distributions—which we would now identify as distributions with high chi-squared values—could be easily identified with the tabulating machines available at this time. This method—so useful in the era prior to digital computers—survives to the current day as the underpinning for all decision tree implementations.

A further refinement introduced by Belson involved the differential assessment of nested subclass predictors.6 Belson recognized that descendent nodes of a tree could be examined recursively, just as the top node had been. Belson further recognized that descendent nodes could be subset by either the same predictor or another predictor such that the descendent nodes of the tree could be balanced and symmetrical—employing a matching set of predictors with each level of the subtree—or could be unbalanced in that subnode partitions could be based on the most powerful predictor at a given level of the subtree. This innovation exploits the power of decision trees to explore and discover a host of subregion effects in data and, like the use of predictors identified on the basis of deviation from expected values, forms the basis of modern decision trees.

Morgan and Sonquist7 built on Belson's early work and saw decision trees as a complement and alternative to regression for analyzing survey data. Initially, Morgan and Sonquist began with the notion of employing trees in order to identify interaction terms that would be useful in forming the most effective regression solution for their data modeling tasks. In tests run by Morgan and Sonquist, they observed a decision tree which partitioned data into 21 groups that accounted for two-thirds of the variance of the response variable. A similar regression with 30 terms, including interaction terms, was only able to account for 36% of the variance in the response. The authors reached three conclusions: (1) that interactions among inputs are inevitable; (2) that regression requires the analyst to specify interactions in advance; and (3) that decision trees were better tools because they find the interactions as they grow the tree.

Many observers at the time were resistant to employing the relatively new and lightly tested approach advocated by Morgan and Sonquist. Regression practitioners then—and now—develop results on the basis of well-informed theory and widely tested results in a broad, active, and well-informed community. The theoretical underpinnings—coupled with a rich history of fielded results—enable regression practitioners to develop time-tested, effective metrics and diagnostics in a wide range of circumstances.

Decision trees were demonstrated to have shortcomings of their own: how to go about selecting appropriate variables to form the tree partitions (input vetting and selection) and how many partitions, of what complexity, to build. These latter two problems served as the 'grist for the mill' of the next steps in the development of statistical decision trees carried out by Kass and Hawkins8 and Breiman et al.,9 respectively. Over time, this body of work has provided substantial credibility and a rich legacy of fielded applications that help establish trees as a useful, viable, and trustworthy technique.

RULE INDUCTION, MACHINE LEARNING, AND DECISION TREES

During the 1950s, as Belson was developing his approach, a kind of computation which he described as based '...on the principle of biological classification', other researchers in experimental psychology were attempting to encode human approaches to concept formation tasks. Both approaches naturally fed into the nascent field of artificial intelligence and machine learning. In this way Belson's work serves as a precursor to a new line of decision tree development that employs machine learning algorithms to produce executable rules.

The work in experimental psychology led to the development of a computer implementation, entitled 'CLS' (for Concept Learning System), developed by Hunt et al.10 As in the earlier approaches of Belson and Morgan and Sonquist, CLS works through the successive application of partitions in the data based on highly discriminating variables or inputs. J. Ross Quinlan entered this field from a machine learning perspective. He formalized the development of this approach to concept formation as a method of knowledge acquisition. This resulted in the development of 'Iterative Dichotomiser 3' (ID3).11
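CLS and ID3 choose the most discriminating input at each step; ID3 does so with an information-theoretic criterion. The sketch below is a small, self-contained illustration of that entropy and information-gain calculation, not Quinlan's implementation, and the attribute and class values are hypothetical.

```python
# Minimal sketch of the ID3-style selection criterion: choose the attribute with the
# largest information gain (reduction in entropy of the class labels).
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attribute, class_key="class"):
    labels = [r[class_key] for r in rows]
    base = entropy(labels)
    remainder = 0.0
    for value in set(r[attribute] for r in rows):
        subset = [r[class_key] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)  # weighted child entropy
    return base - remainder

rows = [
    {"outlook": "sunny",    "windy": "no",  "class": "play"},
    {"outlook": "sunny",    "windy": "yes", "class": "stay"},
    {"outlook": "rain",     "windy": "no",  "class": "play"},
    {"outlook": "rain",     "windy": "yes", "class": "stay"},
    {"outlook": "overcast", "windy": "no",  "class": "play"},
]
best = max(["outlook", "windy"], key=lambda a: information_gain(rows, a))
print(best)  # 'windy': the attribute an ID3-style learner would branch on first here
```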
Follow-ons to Quinlan's initial work have led to the development of a number of rule generation approaches for knowledge acquisition, commonly referred to as 'rule induction'.

BOX 1

Donald Michie served as the editor of a set of findings that featured Quinlan's initial work on ID3. Michie was a colleague of Alan Turing during the World War II Enigma Project and is a founding father of the field of artificial intelligence. He later applied inductive rules to the adaptive control of robotic devices and spacecraft.12 This rule method serves as a template for self-learning robotic systems up to the present day.

Subsequent work by Quinlan led to the development of C4.5.13

Rule induction is an active area of development and has led to a range of rule induction approaches, for example, W. Cohen's 'RIPPER'.14 RIPPER incorporates a multitree approach often described as 'sequential covering'. In these approaches the tree is first grown so that a pure node is found. A pure node is a node that results from the identification of a rule that predicts 100% of the target values. The preconditions of the rule 'cover' the training observations that correspond to this rule. The observations that are covered by the rule are then removed from the training data (i.e. are 'ripped' out). Successive trees are run, at each step looking for a rule that produces a 'pure node'. Multiple trees may be grown until no more pure nodes are found. Overall, the predictive space is 'covered' through the layering of these successively grown predictive rules. The RIPPER algorithm is a greedy algorithm; i.e. it produces overly optimistic results that do not generalize well. Alternative multitree approaches, discussed below, are less greedy and offer superior generalization performance. Another innovation suggested by Cohen was to form rules based on both the presence and absence of attributes (allowing Boolean NOTs to form part of the selection expression). This approach has more recently been implemented as part of a text mining solution to generate automatic text classification rules based on inductive rule learning.15

CURRENT DEVELOPMENTS (MULTIPLE TREES)

The bootstrap method, described by Efron,16 is a prominent example of the utility of resampling in statistical computation. The single tree approach—one of selecting the best single predictor at any one stage in the growth of the decision tree—can be extended by resampling the available training data. This random element has many benefits: the most obvious benefit is its smoothing properties. While a single decision tree bisects the space of training data into a number of hard-edged rectangles, multitrees form many overlapping bisections so that the fitted space more closely approximates such methods as neural networks and multiple regression. With multiple trees we can derive multiple, overlapping viewpoints that are different but complementary. When taken together, the overlapping views reduce both variance and bias.

The resampling approach has led to a number of methods to 'boost' the predictive power of the host training set. Multiple trees are always grown, regardless of the specific method that is employed. In addition to the introduction of random components in multiple trees, these approaches also offer the opportunity to reweight computations in successive iterations of tree growth. Unlike the 'sequential covering' approach, described above, where successive samples are drawn from the training corpus in unaltered form, boosting approaches reweight cases in successive iterations. The coverage offered by these approaches is less structured and deterministic than sequential covering. In this approach, the reweighting goal is to alter successive training samples with a view to improving the predictive performance of successive rule sets. These approaches have been explored and advocated by Schapire;17 notably in Adaboost developed by Freund and Schapire;18 Arcing by Breiman;19 and Gradient Boosting by Friedman.20

The Adaboost method (from 'adaptive boosting') employs an approach that reweights individual observations in subsequent samples. In Gradient Boosting, the target value is adjusted by a function of the residual of the training value minus the predicted value.

Various group-voting or aggregation methods are possible in the production of a final group-voting metric, including numeric averaging with continuous outcomes and majority votes or polling with categorical outcomes.

The interaction between the fields of statistical decision trees and machine learning continued throughout these adaptations of bootstrapping applications to multiple trees. One innovation included sampling and randomization across both rows and columns of the training data. This technique entered the machine learning field in the application of multiple decision trees to digit recognition as described in Amit and Geman.21
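The row-and-column resampling idea can be condensed into a short sketch: each tree is grown on a bootstrap sample of rows and a random subset of columns, and predictions are combined by majority vote. This is a simplified, hypothetical illustration in the spirit of bagging and Random Forests, not Breiman's reference implementation; in practice a library routine such as scikit-learn's RandomForestClassifier would be used.

```python
# Minimal sketch of multiple-tree resampling: each tree sees a bootstrap sample of rows
# and a random subset of columns; predictions are combined by majority vote.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_forest(X, y, n_trees=25, n_cols=2, seed=0):
    rng = np.random.default_rng(seed)
    forest = []
    n_rows = X.shape[0]
    for _ in range(n_trees):
        rows = rng.integers(0, n_rows, size=n_rows)                 # bootstrap: rows with replacement
        cols = rng.choice(X.shape[1], size=n_cols, replace=False)   # random column subset
        tree = DecisionTreeClassifier(random_state=0).fit(X[rows][:, cols], y[rows])
        forest.append((tree, cols))
    return forest

def predict_forest(forest, X):
    votes = np.array([tree.predict(X[:, cols]) for tree, cols in forest])
    return (votes.mean(axis=0) >= 0.5).astype(int)                  # majority vote for a 0/1 target

# Toy usage with random data standing in for a training table:
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
forest = fit_forest(X, y)
print(predict_forest(forest, X[:5]), y[:5])
```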
Much of this cross-fertilization is due to substantial cross-disciplinary work carried out by Breiman. He described this general row and column sampling approach as 'Random Forests';22 these are currently the leading benchmark implementation of decision trees across a variety of statistical and machine learning applications.

CONCLUSIONS

There are many variations on multitree themes: autonomous vs serial samples; row vs column reweighting schemes; replacement samples vs no replacement; and so on. Improvements over best-guess, single decision trees are shown in most multitree methods. As training data continues to increase in size there are now obvious benefits in the approach of multiple autonomous trees, as these trees can be calculated independently, in parallel, prior to the production of an aggregate effect. As the size of initial training data has increased, so too has the corresponding emphasis on sampling without replacement. With larger training data, sampling without replacement tends to reinforce the adoption of differences in the model results. This is now recognized as a potential strength of multitree methods.

To date, most multitree methods demonstrate strengths in various circumstances. As this field evolves it may become clear which method is best in which set of circumstances. Given the pace of innovation in this area it is likely that improved methods and new paradigms will continue to emerge.

NOTE

a. This data table is based on the Titanic Passenger List edited by Michael A. Findlay, originally published in Ref 23, and expanded with the help of the internet community. The original HTML files were obtained by Philip Hind (1999).
REFERENCES

1. Lima M. Visual Complexity: Mapping Patterns of Information. New York: Princeton Architectural Press; 2011, 28.

2. http://lib.stat.cmu.edu/S/Harrell/data/descriptions/titanic.html. (Accessed September 23, 2013).

3. Sonquist JA, Baker EL, Morgan JN. Searching for Structure. Ann Arbor, MI: Institute for Social Research; 1973.

4. Kass GV. An exploratory technique for investigating large quantities of categorical data. J R Stat Soc 1980, 29:119–127.

5. Belson WA. A technique for studying the effects of television broadcast. J R Stat Soc 1956, 5:195.

6. Belson WA. Matching and prediction on the principle of biological classification. J R Stat Soc 1959, 8:65–75.

7. Morgan JN, Sonquist JA. Problems in the analysis of survey data, and a proposal. J Am Stat Assoc 1963, 58:415–435.

8. Hawkins DM, Kass GV. Automatic interaction detection. In: Hawkins DM, ed. Topics in Applied Multivariate Analysis. Cambridge: Cambridge University Press; 1982.

9. Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and Regression Trees. London: Chapman and Hall; 1984.

10. Hunt E, Marin J, Stone P. Experiments in Induction. New York: Academic Press; 1966.

11. Quinlan JR. Discovering rules by induction from large collections of examples. In: Michie D, ed. Expert Systems in the Micro-electronic Age. Edinburgh: Edinburgh University Press; 1979, 168–201.

12. Michie D, Sammut C. Controlling a black-box simulation of a spacecraft. AI Mag 1991, 12:56–63.

13. Quinlan JR. C4.5: Programs for Machine Learning. New York: Morgan Kaufmann; 1988.

14. Cohen WW. Fast effective rule induction. Proceedings of the Twelfth International Conference on Machine Learning; 1995, 115–123.

15. Automatic Boolean rule generation. Available at: http://www.sas.com/text-analytics/text-miner/index.html. (Accessed March 25, 2013).

16. Efron B. Bootstrap methods: another look at the Jackknife. Ann Stat 1979, 7:1–26.

17. Schapire RE. The strength of weak learnability. Mach Learn 1990, 5:197–227.

18. Freund Y, Schapire RE. Experiments with a new boosting algorithm. Proceedings of the Thirteenth International Conference on Machine Learning, Bari, Italy; 1996, 148–156.

19. Breiman L. Arcing classifiers. Ann Stat 1998, 26:801–849.

20. Friedman JH. Stochastic gradient boosting. 1999. Available at: http://www-stat.stanford.edu/~jhf/ftp/stobst.ps.

21. Amit Y, Geman D. Shape quantization and recognition with randomized trees. Neural Comput 1997, 9:1545–1588.

22. Breiman L. Random forests, 2001. Available at: http://oz.berkeley.edu/~breiman/randomforest2001.pdf. (Accessed September 23, 2013).

23. Eaton JP, Haas CA. Titanic: Triumph and Tragedy, Second Edition. New York: W.W. Norton & Company Inc; 1995.
FURTHER READING
Hawkins DM. Recursive partitioning. WIREs Comput Stat 2009, 1:290–295.
Loh WY. Classification and regression trees. WIREs Data Mining Knowl Discov 2011, 1:14–23.
de Ville B, Neville P. Decision Trees for Analytics Using SAS Enterprise Miner. Cary, NC: SAS Press; 2013.