Session 5b: Classification by Decision Tree Induction

DECISION TREE CLASSIFICATION

1
INTRODUCTION

Welcome back. In this session, we shall learn the basic algorithm for learning
decision trees; attribute selection measures (information gain, gain ratio, and
Gini index); and tree pruning.

2
LEARNING OUTCOMES

By the end of this session, you should be able to:

• describe a basic algorithm for learning decision trees.
• apply attribute selection measures to select the attribute that best
  partitions the tuples into distinct classes.
• apply pruning algorithms to improve accuracy by removing tree branches that
  reflect noise (or outliers) in the training data.

3
DECISION TREES

• A decision tree is a tree-structured classifier where:
  – each internal node (non-leaf node) denotes a test on an attribute;
  – each branch represents an outcome of the test;
  – each leaf node (or terminal node) holds a class label;
  – the topmost node in the tree is the root node.

• It is called a decision tree because, like a tree, it starts from the root
  node and expands into further branches, forming a tree-like structure.

4
DECISION TREES

Figure 1. The general form of a decision tree.

5
DECISION TREES

• All the internal nodes represent features of the dataset (features are
  attributes).

• A decision tree simply asks a question and, based on the answer (Yes/No),
  splits further into subtrees.

6
DECISION TREES

Figure 2. An example of a decision tree.

7
DECISION TREES

• A decision tree is a supervised learning technique.

• It can be used for both classification and regression problems, but it is
  mostly preferred for solving classification problems.

• To build such a tree, we can use the CART algorithm, which stands for
  Classification and Regression Trees.

8
HOW DOES THE DECISION TREE
ALGORITHM WORK?
• Given a tuple X (a set of attribute values whose class label is unknown),
  the attribute values of X are tested against the decision tree.
1. The decision tree algorithm compares the value of the root attribute with
   the corresponding attribute of the record (from the real dataset) and,
   based on the comparison, follows the branch and jumps to the next node.
2. At the next node, the algorithm again compares the attribute value with
   the node's branches and moves further.
3. The process continues until a leaf node of the tree is reached (a minimal
   code sketch of this traversal follows below).
9
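To make the traversal concrete, here is a minimal, hypothetical sketch (not taken from the slides): the tree is stored as nested Python dicts, internal nodes test one attribute, branches are keyed by attribute values, and leaves hold class labels. The attribute names mirror the job-offer example and are assumptions for illustration only.

# Toy decision tree: internal nodes test an attribute, leaves are class labels.
# The structure and attribute names are illustrative assumptions.
tree = {
    "attribute": "salary",
    "branches": {
        "low": "decline",                      # leaf node
        "high": {
            "attribute": "distance",
            "branches": {
                "near": "accept",
                "far": {
                    "attribute": "cab_facility",
                    "branches": {"yes": "accept", "no": "decline"},
                },
            },
        },
    },
}

def classify(node, record):
    """Walk the tree: test the node's attribute against the record's value,
    follow the matching branch, and repeat until a leaf (a class label)."""
    while isinstance(node, dict):              # still at an internal node
        value = record[node["attribute"]]      # compare the attribute value
        node = node["branches"][value]         # jump to the next node
    return node                                # leaf: the predicted class

print(classify(tree, {"salary": "high", "distance": "far", "cab_facility": "yes"}))
# -> accept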
HOW DOES THE DECISION TREE
ALGORITHM WORK?
Example: Suppose a candidate has a job offer and wants to decide whether to
accept it or not. To solve this problem, the decision tree starts with the
root node (the Salary attribute, chosen by an attribute selection measure).
The root node splits into a decision node (Distance from the office) and a
leaf node, based on the corresponding labels. That decision node splits
further into another decision node (Cab facility) and a leaf node. Finally,
the last decision node splits into two leaf nodes (Accepted offer and
Declined offer).
10
DECISION TREES

Figure 3. An example of the decision tree process.

11
DECISION TREES
History

• In the late 1970s and early 1980s, J. Ross Quinlan, a researcher in machine
  learning, developed a decision tree algorithm known as ID3 (Iterative
  Dichotomiser).
• ID3 is so named because the algorithm iteratively (repeatedly) dichotomizes
  (divides) the features into two or more groups at each step.
• ID3 uses a top-down greedy approach to build a decision tree: we start
  building the tree from the top, and the greedy approach means that at each
  iteration we select the feature that looks best at the present moment to
  create a node.
12
DECISION TREES
History

• Quinlan later presented C4.5 (a successor of ID3), which became a benchmark
  to which newer supervised learning algorithms are often compared.
• C4.5 is an extension of Quinlan's earlier ID3 algorithm.
• The decision trees generated by C4.5 can be used for classification, and
  for this reason C4.5 is often referred to as a statistical classifier.

13
DECISION TREES
History

• In 1984, statisticians L. Breiman, J. Friedman, R. Olshen, and C. Stone
  published the book Classification and Regression Trees (CART), which
  described the generation of binary decision trees.
• Classification and Regression Trees (CART) refers to decision tree
  algorithms that can be used for classification or regression predictive
  modeling problems.
• Classically, this algorithm is referred to as "decision trees", but on some
  platforms, such as R, it is referred to by the more modern term CART.
14
DECISION TREES
History

• ID3, C4.5, and CART adopt a greedy (i.e., non-backtracking) approach in
  which decision trees are constructed in a top-down, recursive,
  divide-and-conquer manner.
• Most algorithms for decision tree induction follow such a top-down
  approach, which starts with a training set of tuples and their associated
  class labels.
• The training set is recursively partitioned into smaller subsets as the
  tree is built.
15
THE DECISION TREE ALGORITHM
How to generate a decision tree
• Algorithm: Generate a decision tree from the training tuples of data
  partition D.
Where:
• D is a set of training tuples and their associated class labels;
• attribute list is the set of candidate attributes;
• Attribute selection method is a procedure to determine the splitting
  criterion that "best" partitions the data tuples into individual classes.
Note: This criterion consists of a splitting attribute and, possibly, either
a split-point or a splitting subset.
16
THE DECISION TREE ALGORITHM

17
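The full pseudocode for this slide is not reproduced here; the following is a minimal Python sketch of the same top-down, greedy, divide-and-conquer idea, using the inputs listed on the previous slide. The function attribute_selection_method is assumed to be supplied by the caller (for example, one that scores attributes by information gain); all names are illustrative, not the textbook pseudocode.

from collections import Counter

def build_tree(data, attribute_list, attribute_selection_method):
    """data: list of (attribute_value_dict, class_label) pairs (partition D).
    Returns a nested-dict tree like the one used in the traversal sketch."""
    labels = [label for _, label in data]

    # Stopping criteria: the partition is pure, or no candidate attributes remain.
    if len(set(labels)) == 1:
        return labels[0]                                  # leaf: single class
    if not attribute_list:
        return Counter(labels).most_common(1)[0][0]       # leaf: majority class

    # Greedy step: choose the attribute that "best" partitions the data now.
    best = attribute_selection_method(data, attribute_list)
    remaining = [a for a in attribute_list if a != best]

    # Recursive divide-and-conquer: grow one subtree per outcome of the test.
    branches = {}
    for value in {row[best] for row, _ in data}:
        subset = [(row, label) for row, label in data if row[best] == value]
        branches[value] = build_tree(subset, remaining, attribute_selection_method)
    return {"attribute": best, "branches": branches}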
ATTRIBUTE SELECTION MEASURES

• While implementing a decision tree, the main issue that arises is how to
  select the best attribute for the root node and for the sub-nodes.

• To solve this problem, we use a technique called an attribute selection
  measure (ASM).

• An attribute selection measure is a heuristic for selecting the splitting
  criterion that "best" separates a given data partition, D, of class-labeled
  training tuples into individual classes.

18
ATTRIBUTE SELECTION MEASURES

• If we were to split D into smaller partitions according to the outcomes of
  the splitting criterion, ideally each partition would be pure (i.e., all the
  tuples that fall into a given partition would belong to the same class).

• Conceptually, the "best" splitting criterion is the one that most closely
  results in such a scenario.

• Attribute selection measures are also known as splitting rules because they
  determine how the tuples at a given node are to be split.

19
ATTRIBUTE SELECTION MEASURES

• The attribute selection measure provides a ranking for each attribute
  describing the given training tuples.

• The attribute having the best score for the measure is chosen as the
  splitting attribute for the given tuples.

• If the splitting attribute is continuous-valued or if we are restricted to
  binary trees, then, respectively, either a split point or a splitting
  subset must also be determined as part of the splitting criterion.

20
ATTRIBUTE SELECTION MEASURES

• There are three popular attribute selection measures:
  – information gain
  – gain ratio
  – Gini index

21
ATTRIBUTE SELECTION MEASURES

Information Gain
• Information gain measures the change in entropy after a dataset is
  segmented on an attribute.
• It calculates how much information a feature provides about a class.
• Based on the value of information gain, we split the node and build the
  decision tree.
• A decision tree algorithm always tries to maximize information gain, and
  the node/attribute with the highest information gain is split first.

22
ATTRIBUTE SELECTION MEASURES

Information Gain
• ID3 uses information gain as its attribute selection
measure.
• This measure is based on pioneering work by Claude
Shannon on information theory, which studied the
value or “information content” of messages.
• Let node N represent (hold) the tuples of partition D. The attribute with
  the highest information gain is chosen as the splitting attribute for
  node N.

23
ATTRIBUTE SELECTION MEASURES

Information Gain
• The expected information (entropy) needed to classify a tuple in D is given
  by:

  Info(D) = − Σᵢ pᵢ log₂(pᵢ),   summing over the m classes C₁, …, Cₘ

• where
• pᵢ is the nonzero probability that an arbitrary tuple in D belongs to
  class Cᵢ
• Info(D) is just the average amount of information needed to identify the
  class label of a tuple in D.
24
ATTRIBUTE SELECTION MEASURES

Information Gain
• Information gain is the decrease in entropy.
• It computes the difference between the entropy before the split and the
  weighted average entropy after the split of the dataset on the given
  attribute's values.

• In the following example, we will calculate the entropy before and after
  the split and then compute the information gain.

25
ATTRIBUTE SELECTION MEASURES
Information Gain
Example: The entropy for one attribute is calculated as follows:

  E(S) = − Σᵢ pᵢ log₂(pᵢ)

where S is the current state, and pᵢ is the probability of an event i of
state S, i.e., the percentage of class i in a node of state S.
26
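As a quick illustration (the class counts here are assumed, not from the slides), the entropy of a node can be computed directly from its class counts:

from math import log2

def entropy(class_counts):
    """E(S) = -sum(p_i * log2(p_i)) over the classes present in S."""
    total = sum(class_counts)
    return -sum((c / total) * log2(c / total) for c in class_counts if c > 0)

# Example: a node holding 9 tuples of one class and 5 of another.
print(round(entropy([9, 5]), 3))   # ~0.940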
ATTRIBUTE SELECTION MEASURES
Information Gain
The entropy for multiple attributes (i.e., the weighted entropy of the
dataset after splitting on a selected attribute) is calculated as:

  E(T, X) = Σ_c P(c) · E(c),   summing over the values c of attribute X

where T is the current state, X is the selected attribute, P(c) is the
proportion of tuples taking value c, and E(c) is the entropy of the
corresponding subset.

27
ATTRIBUTE SELECTION MEASURES
Information Gain
Example - Entropy for multiple attributes is calculated
as:

28
ATTRIBUTE SELECTION MEASURES
Information Gain

Therefore,

29
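As a stand-in for the worked example above, here is a minimal sketch that computes the entropy before the split, the weighted entropy after the split, and the resulting information gain. The dataset (14 tuples, 9 positive and 5 negative, split by one attribute into three subsets) is a hypothetical illustration, not the figure from the original slides.

from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

# Hypothetical split: 14 tuples (9 positive, 5 negative) partitioned by one
# attribute into three subsets with these (positive, negative) counts.
subsets = [(2, 3), (4, 0), (3, 2)]

info_before = entropy([9, 5])                                       # Info(D)
total = sum(sum(s) for s in subsets)
info_after = sum((sum(s) / total) * entropy(s) for s in subsets)    # E(T, X)
gain = info_before - info_after                                     # information gain

print(round(info_before, 3), round(info_after, 3), round(gain, 3))
# ~0.940  ~0.694  ~0.247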
ATTRIBUTE SELECTION MEASURES

Gain Ratio
• Information gain is biased towards choosing attributes with a large number
  of values as root nodes.
• That is, it prefers attributes with many distinct values.
• C4.5, an improvement of ID3, uses the gain ratio, a modification of
  information gain that reduces this bias and is usually the better option.

30
ATTRIBUTE SELECTION MEASURES
Gain Ratio
• Gain ratio overcomes the problem with information gain by taking into
  account the number of branches that would result before making the split.
• It corrects information gain by dividing it by the intrinsic (split)
  information of the split:

  Gain Ratio = Gain(before, after) / SplitInfo
  SplitInfo = − Σⱼ (|j, after| / |before|) · log₂(|j, after| / |before|),
              summing over j = 1, …, K

  – where "before" is the dataset before the split, K is the number of
    subsets generated by the split, and (j, after) is subset j after the
    split.
31
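A small sketch of this correction, continuing the same hypothetical split used earlier (subset sizes and gain are assumptions carried over from that illustration):

from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

# Same hypothetical split as before: subset sizes after splitting D (|D| = 14).
subset_sizes = [5, 4, 5]
gain = 0.247                     # information gain of this split (from earlier)

# Intrinsic (split) information: the entropy of the partition sizes themselves.
split_info = entropy(subset_sizes)
gain_ratio = gain / split_info

print(round(split_info, 3), round(gain_ratio, 3))   # ~1.577  ~0.157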
ATTRIBUTE SELECTION MEASURES

Gini Index
• The Gini index is a cost function used to evaluate splits in the dataset.
• It is calculated by subtracting the sum of the squared class probabilities
  from one:

  Gini(D) = 1 − Σᵢ pᵢ²

• It favors larger partitions and is easy to implement, whereas information
  gain favors smaller partitions with many distinct values.

32
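A minimal sketch of the calculation described above (the class counts are assumed for illustration):

def gini(class_counts):
    """Gini(D) = 1 - sum(p_i^2) over the classes in partition D."""
    total = sum(class_counts)
    return 1.0 - sum((c / total) ** 2 for c in class_counts)

# A pure node has Gini 0; a balanced two-class node has the maximum, 0.5.
print(gini([10, 0]), gini([5, 5]), round(gini([9, 5]), 3))   # 0.0  0.5  ~0.459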
ATTRIBUTE SELECTION MEASURES

Other attribute selection measures include:
• Reduction in Variance
• Chi-Square

33
Overfitting Problem in Decision Trees

• A common problem with decision trees, especially when the dataset has many
  columns (features), is that they tend to overfit.
• Sometimes it looks as if the tree has memorized the training data set.
• If no limit is set on a decision tree, it can reach 100% accuracy on the
  training data set because, in the worst case, it will end up creating one
  leaf for each observation.

34
Overfitting Problem in Decision Trees
• Overfitting hurts accuracy when predicting samples that are not part of the
  training set.
• Because a problem usually has a large set of features, it results in a
  large number of splits, which in turn gives a huge tree.
• Such trees are complex and can lead to overfitting. But when do we stop
  splitting/growing the tree?

• We will look at two ways to reduce overfitting:
  – pruning decision trees
  – random forests
35
Pruning Decision Trees

• Pruning refers to the removal of those branches of the decision tree that
  do not contribute significantly to the decision process.
• Pruning helps us avoid overfitting the regression or classification model,
  so that measurement errors in a small sample of data are not built into the
  model.
• This can be done using any of the attribute selection measures discussed
  above, such as information gain, where the branch with the least
  information gain is the least significant.
36
Random Forest
• Random forest is an example of ensemble learning, in which we combine
  multiple machine learning models to obtain better predictive performance.
• A technique known as bagging is used to create an ensemble of trees, where
  multiple training sets are generated by sampling with replacement.
• In the bagging technique, the data set is divided into N samples using
  randomized sampling with replacement.
• Then, using a single learning algorithm, a model is built on each sample.
• Finally, the resulting predictions are combined, in parallel, using voting
  or averaging.
37
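A short sketch of the ensemble idea using scikit-learn's RandomForestClassifier, which bags decision trees (and additionally samples random feature subsets at each split). The synthetic dataset is purely illustrative.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A single, fully grown tree versus an ensemble of 100 bagged trees.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

print("single tree  :", round(tree.score(X_test, y_test), 3))
print("random forest:", round(forest.score(X_test, y_test), 3))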
DECISION TREES
Application Areas

• Decision tree algorithms have been used for classification in many
  application areas, such as medicine, manufacturing and production,
  financial analysis, astronomy, and molecular biology.

• Decision trees are the basis of several commercial rule induction systems.

38
DECISION TREES
Why are decision tree classifiers so popular?
• The construction of decision tree classifiers does not
require any domain knowledge or parameter setting,
and therefore is appropriate for exploratory
knowledge discovery.
• Decision trees can handle multidimensional data.
• Their representation of acquired knowledge in tree
form is intuitive and generally easy to assimilate by
humans.
• The learning and classification steps of decision tree
induction are simple and fast.
• Decision tree classifiers have good accuracy.
39
SUMMARY

You have come to the end of this session on decision trees. In this session,
you learnt the basic algorithm for learning decision trees; attribute
selection measures (information gain, gain ratio, and Gini index); and tree
pruning. In our next session, you will learn about Bayesian classification.

40
THANK YOU

41
REFERENCES

1. Han, J., Kamber, M. and Pei, J. (2011). Data Mining: Concepts and
   Techniques, Third Edition. The Morgan Kaufmann Series in Data Management
   Systems. Morgan Kaufmann.

42
