Classification With Decision Trees
What is a Decision Tree
• A family of nonparametric algorithms used primarily for classification, but equally applicable to regression problems
• Decision trees help in devising rules for classification and regression
• A decision tree is a collection of decision nodes, connected by branches, extending downward from the root
node until terminating in leaf nodes
• Beginning at the root node, which by convention is placed at the top of the decision tree diagram, attributes
are tested at the decision nodes, with each possible outcome resulting in a branch
• Each branch then leads either to another decision node or to a terminating leaf node
• An example: classifying potential customers as Bad Credit Risk or Good Credit Risk based on the attributes Savings (Low, Medium, High), Assets (Low or Not Low), and Income (>= $30k or < $30k)
A Simple Decision Tree
Mixed Target Values with Same Attribute Values
• Here, because all of these customers have the same predictor values, there is no way to split the records on the predictor variables that will lead to a pure leaf
• Therefore, such nodes become diverse leaf nodes, with mixed values for the target attribute
• In this case, the decision tree may report that the classification for such customers is “bad,” with 60%
confidence, as determined by the three-fifths of customers in this node who are bad credit risks
Requirements of Decision Trees
• Decision tree algorithms represent supervised learning, and as such require prelabelled target variables.
• A training data set must be supplied, which provides the algorithm with the values of the target variable
• This training data set should be rich and varied, providing the algorithm with a healthy cross section of the
types of records for which classification may be needed in the future
• Decision trees learn by example, and if examples are systematically lacking for a definable subset of
records, classification and prediction for this subset will be problematic or impossible
• The target attribute classes must be discrete. That is, one cannot apply decision tree analysis to a continuous target variable.
• The target variable must take on values that are clearly demarcated as either belonging to a particular class or not belonging
Questions on Decision Tree Construction
• Why, in the example above, did the decision tree choose the savings attribute for the root node split? Why did
it not choose assets or income instead?
• Decision trees seek to create a set of leaf nodes that are as “pure” as possible; that is, where each of the
records in a particular leaf node has the same target value
• How does one measure uniformity, or conversely, how does one measure heterogeneity?
• We shall examine two of the many methods for measuring leaf node purity, which lead to the following
two leading algorithms for constructing decision trees
• CART
• C4.5
CART: Classification and Regression Trees
• The CART method was suggested by Breiman et al. in 1984
• The decision trees produced by CART are strictly binary, containing exactly two branches for each decision
node
• CART recursively partitions the records in the training data set into subsets of records with similar values for
the target attribute
• The CART algorithm grows the tree by conducting, for each decision node, an exhaustive search of all available variables and all possible splitting values, and selecting the optimal split according to the following criterion
• Let Φ(s|t) be a measure of the “goodness” of a candidate split s at node t, defined as shown below
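• In one common formulation, following Breiman et al. (a sketch of the measure rather than the source’s exact notation):
Φ(s|t) = 2 PL PR Σj |P(j|tL) − P(j|tR)|
where tL and tR are the left and right child nodes produced by split s, PL and PR are the proportions of the records at node t sent to tL and tR, and P(j|tL), P(j|tR) are the proportions of class j records within tL and tR, summed over all classes j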
CART
• Then the optimal split is whichever split maximizes this measure Φ(s|t) over all possible splits at node t
• Suppose that we have the training data set shown in the table and are interested in using CART to build a decision tree for predicting whether a particular customer should be classified as a good or a bad credit risk
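• As a rough sketch (not the textbook's code), the goodness measure Φ(s|t) for a single candidate binary split could be computed in Python as follows; the label lists at the bottom are hypothetical:

def goodness_of_split(left_labels, right_labels):
    # Phi(s|t) = 2 * P_L * P_R * sum over classes j of |P(j|t_L) - P(j|t_R)|
    # P_L, P_R: fractions of the parent node's records sent to each child;
    # P(j|t_L), P(j|t_R): class-j proportions within each child node.
    # Assumes neither child is empty (CART would not consider such a split).
    n = len(left_labels) + len(right_labels)
    p_left, p_right = len(left_labels) / n, len(right_labels) / n
    classes = set(left_labels) | set(right_labels)
    diff = sum(abs(left_labels.count(j) / len(left_labels)
                   - right_labels.count(j) / len(right_labels))
               for j in classes)
    return 2 * p_left * p_right * diff

# Hypothetical split of 8 records: 2 sent left (both bad), 6 sent right (5 good, 1 bad)
left = ["bad", "bad"]
right = ["good", "good", "good", "good", "good", "bad"]
print(goodness_of_split(left, right))  # 2 * (2/8) * (6/8) * (5/6 + 5/6) = 0.625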
CART
• Possible candidate splits for the root node
CART
• For each candidate split, let us examine the values of the various components of the optimality measure Φ(s|t)
CART
• CART decision tree after initial split
CART
• Splits for decision node A (best performance highlighted)
CART
• CART decision tree after decision node A split
CART
• CART decision tree, fully grown form
Classification Error
• Not every data set produces a tree with pure (homogeneous, uniform) leaf nodes
• Records with the same attribute values but different class labels occur frequently, which leads to a certain level of classification error
• For example, suppose that, because we cannot further partition the records in the table, we classify the records contained in this leaf node as bad credit risk
• Then the probability that a randomly chosen record from this leaf node would be classified correctly is 0.6,
because three of the five records (60%) are actually classified as bad credit risks
• Hence, our classification error rate for this particular leaf would be 0.4 or 40%, because two of the five
records are actually classified as good credit risks
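• Equivalently, when a leaf is labeled with its majority class, error(leaf) = 1 − (proportion of the majority class) = 1 − 3/5 = 0.40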
Classification Error
• CART would then calculate the error rate for the entire decision tree to be the weighted average of the
individual leaf error rates, with the weights equal to the proportion of records in each leaf
• To avoid memorizing the training set, the CART algorithm needs to begin pruning nodes and branches that
would otherwise reduce the generalizability of the classification results
• Even though the fully grown tree has the lowest error rate on the training set, the resulting model may be too
complex, resulting in overfitting
• As each decision node is grown, the subset of records available for analysis becomes smaller and less
representative of the overall population
• Pruning the tree will increase the generalizability of the results
• Essentially, an adjusted overall error rate is found that penalizes the decision tree for having too many leaf
nodes and thus too much complexity
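• Written out, with nleaf the number of training records in a leaf and N the total number of training records, the overall error rate is error(T) = Σ over leaves (nleaf / N) * error(leaf)
• One common form of such a penalized measure is CART's cost-complexity criterion, errorα(T) = error(T) + α * (number of leaf nodes), where the complexity parameter α controls how strongly extra leaves are penalized; the notation here is illustrative rather than the source's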
C4.5
• The C4.5 algorithm is Quinlan’s extension of his own Iterative Dichotomiser 3 (ID3) algorithm for generating decision trees
• Unlike CART, the C4.5 algorithm is not restricted to binary splits. Whereas CART always produces a binary
tree, C4.5 produces a tree of more variable shape
• The C4.5 method for measuring node homogeneity is quite different from the CART method
• The C4.5 algorithm uses the concept of information gain or entropy reduction to select the optimal split
• Suppose that we have a variable X whose k possible values have probabilities p1, p2, … , pk. What is the
smallest number of bits, on average per symbol, needed to transmit a stream of symbols representing the
values of X observed?
• The answer is called the entropy of X and is defined as
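• Measured in bits (base-2 logarithms), the entropy is H(X) = − Σj pj log2(pj), with the sum taken over j = 1, … , k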
C4.5
• C4.5 uses this concept of entropy as follows. Suppose that we have a candidate split S, which partitions the
training data set T into several subsets, T1, T2, … , Tk
• The mean information requirement can then be calculated as the weighted sum of the entropies of the individual subsets: HS(T) = Σ (i = 1, … , k) Pi * H(Ti), where Pi represents the proportion of records in subset Ti (e.g., for Savings there are three subsets: Low, Medium, and High)
• We may then define the information gain as gain(S) = H(T) − HS(T), that is, the increase in information produced by partitioning the training data T according to the candidate split S
• At each decision node, C4.5 chooses the optimal split to be the split that has the greatest information gain,
gain(S)
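• A minimal Python sketch (not from the source) of how H(T), HS(T), and gain(S) could be computed; the record dictionaries and attribute names below are hypothetical:

import math
from collections import Counter

def entropy(labels):
    # Shannon entropy, in bits, of a sequence of class labels
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())

def information_gain(records, split_attr, target_attr):
    # gain(S) = H(T) - HS(T) for splitting `records` on `split_attr`
    before = entropy([r[target_attr] for r in records])
    subsets = {}
    for r in records:
        subsets.setdefault(r[split_attr], []).append(r[target_attr])
    # HS(T): weighted sum of the entropies of the subsets produced by the split
    after = sum((len(s) / len(records)) * entropy(s) for s in subsets.values())
    return before - after

# Hypothetical records, only to show the call shape
records = [
    {"savings": "high", "risk": "good"},
    {"savings": "high", "risk": "bad"},
    {"savings": "medium", "risk": "good"},
    {"savings": "low", "risk": "bad"},
]
print(information_gain(records, "savings", "risk"))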
C4.5
• Let us look at the same data again
• The possible candidate splits at the root node are given
• Now, because five of the eight records are classified as good credit risk and the remaining three as bad credit risk, the entropy of the training set before splitting is computed as shown below
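• Using the entropy definition with pgood = 5/8 and pbad = 3/8: H(T) = −(5/8) log2(5/8) − (3/8) log2(3/8) ≈ 0.9544 bits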
C4.5
• For candidate split 1 (Savings), two of the records have high savings, three have medium savings, and three have low savings, so Phigh = 2/8, Pmedium = 3/8, and Plow = 3/8
• For high savings, one record is a good credit risk and one is a bad credit risk, so the entropy of that subset is Hhigh = −(1/2) log2(1/2) − (1/2) log2(1/2) = 1 bit
• Here, as in the CART algorithm, Assets turns out to be the first attribute split at the root node
C4.5
• After splitting according to Assets
C4.5
• The initial split has resulted in the creation of two terminal leaf nodes and one new decision node: the leaf nodes are Assets = Low (2 bad) and Assets = High (2 good)
• The Assets = Medium node contains 3 good and 1 bad records, so it needs to be split further
• Its entropy before splitting is −(3/4) log2(3/4) − (1/4) log2(1/4) ≈ 0.8113 bits
• For split 1 (Savings), Plow = 1/4, Pmedium = 2/4, Phigh = 1/4 and Hlow = 0, Hmedium = 0, Hhigh = 0, so the information gain is 0.8113 − 0 = 0.8113
• Split 2 (Income <= 25k vs. > 25k) eventually gives this same, maximum information gain
• So, Savings is arbitrarily chosen as the next attribute to split on
C4.5
• The final decision tree is
• Finally, once the decision tree is fully grown, C4.5 engages in pessimistic postpruning
Decision Rules
• An interesting feature of decision tree models is their interpretability
• Decision rules can be constructed from a decision tree simply by traversing any given path from the root node
to any leaf
• Decision rules come in the form “if antecedent, then consequent”
• The support of the decision rule refers to the proportion of records in the data set that rest in that particular
terminal leaf node
• The confidence of the rule refers to the proportion of records in the leaf node for which the decision rule is
true
• In this small example, all of our leaf nodes are pure, resulting in perfect confidence levels of 100% = 1.00 but
in reality, leaf nodes are generally nonuniform
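• For example, one path of the C4.5 tree above yields the rule “IF Assets = Low THEN Bad Credit Risk,” with support = 2/8 = 0.25 (two of the eight training records land in that leaf) and confidence = 2/2 = 1.00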
Support and Confidence Measure Values
• For the tree generated using C4.5, the values of the support and confidence measures (measures of the importance of each discovered rule) follow directly from the record counts in each leaf, as in the example above