
CART: CLASSIFICATION AND REGRESSION TREE
CART: Classification and Regression Tree
Motivation:
• Development of a reliable clinical decision rule, which can
be used to classify new patients into clinically-important
categories or risk categories so that appropriate decisions
can be made regarding patient management.
CART: Classification and Regression Tree
Example 1:
• Molecular abnormalities in the major psychiatric illnesses:
Classification and Regression Tree (CRT) analysis of
post-mortem prefrontal markers.
M. B. Knable et al.
Molecular Psychiatry, 2002, Volume 7, Number 4, Pages 392-404.
CART: Classification and Regression Tree
• Post-mortem specimens from the Stanley Foundation
Neuropathology Consortium, which contains matched
samples from patients with schizophrenia, bipolar
disorder, non-psychotic depression and normal controls (n
= 15 per group), have been distributed to many research
groups around the world. This paper provides a summary
of abnormal markers found in prefrontal cortical areas
from this collection between 1997 and 2001. With
parametric analyses of variance of 102 separate data
sets, 14 markers were abnormal in at least one disease.
CART: Classification and Regression Tree
• The markers pertained to a variety of neural systems and
processes including neuronal plasticity,
neurotransmission, signal transduction, inhibitory
interneuron function and glial cells. The data sets were
also examined using the non-parametric Classification
and Regression Tree (CRT) technique for the four
diagnostic groups and in pair-wise combinations. In
contrast to the results obtained with analyses of variance,
the CRT method identified a smaller set of nine markers
that contributed maximally to the diagnostic
classifications.
CART: Classification and Regression Tree
• Three of the nine markers observed with CRT overlapped
with the ANOVA results. Six of the nine markers observed
with the CRT technique pertained to aspects of
glutamatergic, GABA-ergic, and dopaminergic
neurotransmission.
CART: Classification and Regression Tree
Example 2:
• Sperm morphology, motility, and concentration in fertile
and infertile men.
D. S. Guzick, J. W. Overstreet, P. Factor-Litvak, et al.
New England Journal of Medicine, 2001, Volume 345, Issue 19, Pages 1388-1393.
CART: Classification and Regression Tree
Background
• Although semen analysis is routinely used to evaluate the
male partner in infertile couples, sperm measurements
that discriminate between fertile and infertile men are not
well defined.
CART: Classification and Regression Tree
Methods
• We evaluated two semen specimens from each
of the male partners in 765 infertile couples and
696 fertile couples at nine sites. The female
partners in the infertile couples had normal
results on fertility evaluation. The sperm
concentration and motility were determined at the
sites; semen smears were stained at the sites
and shipped to a central laboratory for an
assessment of morphologic features of sperm
with the use of strict criteria.
CART: Classification and Regression Tree
• We used classification-and-regression-tree analysis to
estimate threshold values for subfertility and fertility with
respect to the sperm concentration, motility, and
morphology. We also used an analysis of receiver-
operating-characteristic curves to assess the relative
value of these sperm measurements in discriminating
between fertile and infertile men.
CART: Classification and Regression Tree
Results
• The subfertile ranges were a sperm concentration of less than 13.5×10⁶ per milliliter, less than 32 percent of sperm with motility, and less than 9 percent with normal morphologic features. The fertile ranges were a concentration of more than 48.0×10⁶ per milliliter, greater than 63 percent motility, and greater than 12 percent normal morphologic features. Values between these ranges indicated indeterminate fertility.
CART: Classification and Regression Tree
• There was extensive overlap between the fertile and the
infertile men within both the subfertile and the fertile
ranges for all three measurements. Although each of the
sperm measurements helped to distinguish between
fertile and infertile men, none was a powerful
discriminator. The percentage of sperm with normal
morphologic features had the greatest discriminatory
power.
CART: Classification and Regression Tree
Conclusions
• Threshold values for sperm concentration, motility, and
morphology can be used to classify men as subfertile, of
indeterminate fertility, or fertile. None of the measures,
however, are diagnostic of infertility.
CART: Classification and Regression Tree
Components of a classification problem:
1. Outcome or "dependent" variable. This can be a continuous, categorical (ordinal or nominal), or time-to-event variable, e.g.:
a) Example 1: diagnostic group (schizophrenia, bipolar disorder, non-psychotic depression, or normal control)
b) Example 2: fertility status of the couple (fertile vs. infertile)
c) Blood pressure, patient survival, need for surgery, presence of myocardial infarction, medication compliance
CART: Classification and Regression Tree
2. Predictor or independent variables, e.g.:
a) Example 1: post-mortem prefrontal cortical markers
b) Example 2: sperm concentration, motility, and morphology
c) Blood pressure, patient survival, need for surgery, presence of myocardial infarction, medication compliance
CART: Classification and Regression Tree
3. Learning data set: this is a dataset which includes
values for both the outcome and predictor variables,
from a group of patients similar to those for whom we
would like to be able to predict outcomes in the future.
CART: Classification and Regression Tree
4. Test data set: it consists of patients for whom we would
like to be able to make accurate predictions. This test
dataset may or may not exist in practice. However, a
separate test dataset is not always required to
determine the performance of a decision rule.
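To make the learning/test split concrete, here is a minimal sketch in Python, assuming scikit-learn is available (its DecisionTreeClassifier implements CART); the bundled breast-cancer data serve only as a stand-in for a clinical dataset, and the variable names are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in clinical data: predictor variables X and a binary outcome y.
X, y = load_breast_cancer(return_X_y=True)

# Hold out 25% of subjects as the test dataset; the remainder is the
# learning dataset used to build the decision rule.
X_learn, X_test, y_learn, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_learn, y_learn)
print("accuracy on the learning dataset:", tree.score(X_learn, y_learn))
print("accuracy on the test dataset:", tree.score(X_test, y_test))
```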
CART: Classification and Regression Tree
• A decision problem can include two other factors to be considered:
1) a "prior" probability for each outcome, which represents the probability that a randomly selected future patient will have that particular outcome; and
2) a decision cost or loss function, which represents the inherent cost associated with each kind of prediction error.
CART: Classification and Regression Tree
• For example, it is a much more serious error to classify a
patient with an emergent medical condition as non-urgent,
than to misclassify a patient with a non-urgent medical
condition as urgent.
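scikit-learn's trees do not accept a full decision-cost matrix, but their class_weight parameter can serve as a rough stand-in: weighting a class more heavily makes errors on that class costlier during splitting. A hedged sketch, with hypothetical labels 0 = non-urgent and 1 = emergent:

```python
from sklearn.tree import DecisionTreeClassifier

# Make misclassifying an emergent patient (class 1) ten times as costly
# as misclassifying a non-urgent patient (class 0). class_weight rescales
# the effective class priors, approximating an asymmetric loss function.
cautious_tree = DecisionTreeClassifier(class_weight={0: 1, 1: 10},
                                       random_state=0)
```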
CART: Classification and Regression Tree
Features of CART:
1. Nodes: parent node, child node, root node, terminal
node.
2. Binary splits: a "node" in a decision tree can only be split into two groups.
CART: Classification and Regression Tree
3. Each split is based on only one variable.
4. Recursive partitioning: the binary partitioning process can be applied over and over again. Each parent node can give rise to two child nodes and, in turn, each of these child nodes may itself be split, forming additional children.
CART: Classification and Regression Tree
To construct a CART:
1. Tree building: a tree is built using recursive splitting of nodes.
2. Building a "maximal" tree based on some "stopping rule": splitting continues until a "maximal" tree is produced, one which probably greatly overfits the information contained within the learning dataset.
3. Optimal tree selection: the tree which fits the information in the learning dataset, but does not overfit it, is selected from among the sequence of pruned trees.
CART: Classification and Regression Tree
TREE BUILDING:
1. Tree building begins at the root node, which includes all patients in the learning dataset. Beginning with this node, the CART software finds the best possible variable to split the node into two child nodes. In order to find the best variable, the software checks all possible splitting variables (called splitters), as well as all possible values of each variable that could be used to split the node.
2. In choosing the best splitter, the program seeks to minimize the total "impurity" of the two child nodes (a small sketch of this search follows the impurity definitions below).
CART: Classification and Regression Tree
3. Measures of the impurity of a node with class proportions p = (p1, p2, p3, …, pk):
a. Information, or entropy: E = -Σ p_i log(p_i), with 0 log 0 = 0
b. Gini index: G = 1 - Σ p_i²
4. We define the impurity of a tree to be the sum, over all terminal nodes, of the impurity of each node multiplied by the proportion of cases that reach that node of the tree.
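Both impurity measures, together with the exhaustive search for the best splitter described above, can be sketched in a few lines of Python; NumPy is assumed, and the function names are illustrative, not part of any library.

```python
import numpy as np

def entropy(labels):
    """Entropy E = -sum(p_k log2 p_k). Absent classes never appear in
    `counts`, so the 0 log 0 = 0 convention is handled implicitly."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(labels):
    """Gini index G = 1 - sum(p_k^2)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(x, y, impurity=gini):
    """Exhaustively check every cut point of one splitter x, returning
    the threshold that minimizes the size-weighted impurity of the two
    child nodes."""
    best_t, best_imp = None, np.inf
    for t in np.unique(x)[:-1]:   # each distinct value except the largest
        left, right = y[x <= t], y[x > t]
        w = len(left) / len(y)
        imp = w * impurity(left) + (1 - w) * impurity(right)
        if imp < best_imp:
            best_t, best_imp = t, imp
    return best_t, best_imp
```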
CART: Classification and Regression Tree
5. The predicted class assigned to each terminal node
depends on three factors:
1) Assumed prior probability of each class within future
datasets;
2) Decision loss or cost matrix; and
3) Fraction of subjects with each outcome in the learning
dataset that end up in each node.
CART: Classification and Regression Tree
Stop Tree Building
The tree-building process goes on until it is impossible to continue. The process is stopped when:
1. All observations within each child node have an identical distribution of predictor variables, making further splitting impossible.
CART: Classification and Regression Tree
2. An external limit on the number of levels in the
maximal tree has been set by the user.
3. An external limit on the size of the terminal nodes in
the maximal tree has been reached.
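Stopping rules 2 and 3 map directly onto hyperparameters of scikit-learn's CART implementation; a minimal sketch (the parameter values are arbitrary):

```python
from sklearn.tree import DecisionTreeClassifier

# External limits that stop the growth of the maximal tree:
limited_tree = DecisionTreeClassifier(
    max_depth=5,          # limit on the number of levels (rule 2)
    min_samples_leaf=10,  # limit on the size of terminal nodes (rule 3)
)
```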
CART: Classification and Regression Tree
Tree Pruning
• In order to generate a sequence of simpler and simpler trees, each of which is a candidate for the appropriately-fit final tree, the method of reduced error pruning can be used: each time, we remove the "weakest link", the split that provides the least improvement in misclassification. The misclassification error gradually increases during the pruning process.
• More general pruning is possible: cost-complexity pruning.
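scikit-learn implements this more general cost-complexity pruning; a minimal sketch of generating the sequence of pruned candidate trees, again using a bundled dataset as a stand-in:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# cost_complexity_pruning_path grows the maximal tree internally and
# returns the effective alphas at which successive weakest links go.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

# One pruned tree per alpha: simpler and simpler candidate trees.
pruned_trees = [
    DecisionTreeClassifier(ccp_alpha=a, random_state=0).fit(X, y)
    for a in path.ccp_alphas
]
```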
CART: Classification and Regression Tree
Optimal Tree Selection
• The maximal tree will always fit the learning dataset with
higher accuracy than any other tree, because the maximal
tree is constructed to optimize its performance based on
the learning dataset.
CART: Classification and Regression Tree
• The goal in selecting the optimal tree, defined with respect to expected performance on an independent set of data, is to find the correct value of the complexity parameter α so that the information in the learning dataset is fit but not overfit. In general, finding this value of α would require an independent set of data, but this requirement can be avoided using the technique of cross-validation.
CART: Classification and Regression Tree
• Consider the relationship between tree complexity, reflected by the number of terminal nodes, and the decision cost for an independent test dataset and for the original learning dataset.
CART: Classification and Regression Tree
• As the number of nodes increases, the decision
cost decreases monotonically for the learning
data. This corresponds to the fact that the
maximal tree will always give the best fit to the
learning dataset. In contrast, the expected cost
for an independent dataset reaches a minimum,
and then increases as the complexity increases.
This reflects the fact that an overfitted and overly
complex tree will not perform well on a new set of
data.
CART: Classification and Regression Tree
Cross-Validation
• Cross validation is a computationally-intensive
method for validating a procedure for model
building, which avoids the requirement for a new
or independent validation dataset. In cross
validation, the learning dataset is randomly split
into N sections. One of these subsets of data is
reserved for use as an independent test dataset,
while the other N-1 subsets are combined for use
as the learning dataset in the model-building
procedure.
CART: Classification and Regression Tree
• The entire model-building procedure is
repeated N times, with a different subset of the
data reserved for use as the test dataset each
time. Thus, N different models are produced,
each one of which can be tested against an
independent subset of the data. The amazing
fact on which cross validation is based is that
the average performance of these N models is
an excellent estimate of the performance of
the original model (produced using the entire
learning dataset) on a future independent set
of patients.
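Putting the pieces together, here is a hedged sketch of optimal tree selection by N-fold cross-validation (N = 10): each candidate value of the complexity parameter α is scored by its average held-out performance, and the best is kept. The dataset is again a stand-in.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Candidate complexity parameters from the cost-complexity pruning path.
alphas = (DecisionTreeClassifier(random_state=0)
          .cost_complexity_pruning_path(X, y).ccp_alphas)

# For each alpha, rebuild the model on N-1 folds and test on the held-out
# fold; the mean of the N scores estimates future performance.
cv_scores = [
    cross_val_score(DecisionTreeClassifier(ccp_alpha=a, random_state=0),
                    X, y, cv=10).mean()
    for a in alphas
]
best_alpha = alphas[int(np.argmax(cv_scores))]
print("complexity parameter selected by cross-validation:", best_alpha)
```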
CART: Classification and Regression Tree
Advantages:
• No parametric assumptions (in contrast to linear regression, logistic regression, or Cox's proportional hazards model)
• Can cope with any data type (continuous, binary, ordinal, nominal)
• The classification has a simple form that is easy to understand
CART: Classification and Regression Tree
• CART identifies “splitting” variables based on an
exhaustive search of all possibilities. Since efficient
algorithms are used, CART is able to search all possible
variables as splitters, even in problems with many
hundreds of possible predictors.
• It handles complex interactions well. For example, the
value of one variable (e.g., age) may substantially affect
the importance of another variable (e.g., weight).
• It is robust with respect to outliers.
• It provides an estimate of the misclassification rate.
CART: Classification and Regression Tree
Disadvantages:
• CART does not use combinations of variables in each split
• Tree structures may be unstable – a change in the sample may give a different tree
• The tree is optimal at each split – it may not be globally optimal
CART: Classification and Regression Tree
References:
1. Classification and Regression Trees. Leo Breiman, Jerome H. Friedman, Richard Olshen, and Charles J. Stone. Brooks/Cole Publishing, Monterey, 1984.
2. An Introduction to Classification and Regression Tree (CART) Analysis. Roger J. Lewis. Presented at the 2000 Annual Meeting of the Society for Academic Emergency Medicine, San Francisco, California.
3. Modern Applied Statistics with S. W. N. Venables and B. D. Ripley. Springer, 2002.
