ML L8 Decision Tree
19CSE305
L-T-P-C: 3-0-3-4
Lecture 8
Decision Trees
Definition
What is a Decision Tree?
The decision tree model of learning
The goal of the learner: figure out what questions to ask, in what order to ask them, and what to predict once enough questions have been answered.
Two Main Types of Decision Trees
● Classification Trees
○ Here the decision variable is categorical/discrete.
○ Built by binary recursive partitioning: a process of splitting the data into partitions, and then splitting each partition further along its branches.
Two Main Types of Decision Trees
● Regression Trees
○ Decision trees where the target variable can take continuous values are called regression trees.
Applications
The decision tree model of learning
Example 1: Do people like movies by an actor "x"?

Actor  Genre    Like?
x      Action   Yes
x      Fiction  Yes
x      Romance  No
y      Action   No
y      Fiction  No
y      Romance  Yes

Why is it a decision tree? The learner can ask about the actor first and then about the genre, reaching a Yes/No prediction at the leaves, as in the sketch below.
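A tree consistent with this table can be read off directly; a minimal Python sketch (the function name and rule ordering are my own):

```python
def likes_movie(actor, genre):
    """Decision tree consistent with the table above:
    first split on Genre, then on Actor."""
    if genre == "Romance":
        # Romance: only actor y's movies are liked
        return actor == "y"
    # Action / Fiction: only actor x's movies are liked
    return actor == "x"

# Reproduce every row of the table
for actor, genre, liked in [
    ("x", "Action", True), ("x", "Fiction", True), ("x", "Romance", False),
    ("y", "Action", False), ("y", "Fiction", False), ("y", "Romance", True),
]:
    assert likes_movie(actor, genre) == liked
```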
Decision Trees
Example 2
If we are classifying a bank loan application for a customer, the decision tree may look like this:
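The original figure is not reproduced here; as an illustrative stand-in, a loan-approval tree can be written as nested conditions. The attributes and thresholds below are hypothetical, not taken from the figure:

```python
def approve_loan(credit_score, income, has_collateral):
    # Hypothetical loan-approval tree; the attributes and
    # thresholds are illustrative, not from the original figure.
    if credit_score >= 700:
        return "Approve"
    elif income >= 50_000:
        return "Approve" if has_collateral else "Review"
    else:
        return "Reject"

print(approve_loan(650, 60_000, True))   # Approve
print(approve_loan(600, 30_000, False))  # Reject
```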
Decision Trees
What is a Decision Tree?
Three types of nodes:
1. Root Node
2. Branch Node
3. Leaf Node
[Figure: a binary tree whose root tests attribute A, whose branch nodes test attribute B, and whose T/F leaves give the class.]
Benefits
• It does not require any domain knowledge.
• It is easy to comprehend.
• The learning and classification steps of a decision tree are simple and fast.
Classification—A Two-Step Process [RECAP]
Note: If the test set is used to select models, it is called a validation (test) set.
Steps in Decision Tree Construction
1. Begin the tree with the root node, say S, which contains the complete dataset.
2. Divide S into subsets according to the values of the best splitting attribute.
3. Repeat the split recursively on each subset until a stopping condition is met (a sketch follows below).
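A minimal Python sketch of this recursive partitioning; attribute selection is stubbed out here, since a real learner would pick it using a measure such as information gain (covered in the ID3 section below):

```python
def build_tree(rows, attributes):
    """Recursively grow a decision tree (sketch).

    rows: list of (features_dict, label) pairs.
    attributes: attribute names still available for splitting.
    """
    labels = [label for _, label in rows]
    # Stop: all rows share one label, or nothing is left to split on
    if len(set(labels)) == 1 or not attributes:
        return max(set(labels), key=labels.count)  # leaf = majority class
    # Placeholder choice: a real learner selects the best attribute
    # by a heuristic such as information gain
    attr = attributes[0]
    tree = {attr: {}}
    for value in {feats[attr] for feats, _ in rows}:
        subset = [(f, l) for f, l in rows if f[attr] == value]
        rest = [a for a in attributes if a != attr]
        tree[attr][value] = build_tree(subset, rest)
    return tree
```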
How does Decision Tree Classification Work?
Examples
Illustrating Classification Task

Training Set:
Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No

The learning algorithm performs induction on the training set to learn a model.

Test Set:
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?

The learned model is then applied to the test set (deduction).
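This induction/deduction flow can also be reproduced with an off-the-shelf learner; a sketch using scikit-learn, where the numeric encoding of the categorical attributes is an assumption made for illustration:

```python
from sklearn.tree import DecisionTreeClassifier

# Training set from the slide; Yes/No encoded as 1/0 and
# Small/Medium/Large as 0/1/2 (encoding chosen for illustration)
size = {"Small": 0, "Medium": 1, "Large": 2}
X_train = [[1, size["Large"], 125], [0, size["Medium"], 100],
           [0, size["Small"], 70],  [1, size["Medium"], 120],
           [0, size["Large"], 95],  [0, size["Medium"], 60],
           [1, size["Large"], 220], [0, size["Small"], 85],
           [0, size["Medium"], 75]]
y_train = ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No"]

# Induction: learn a model from the training set
model = DecisionTreeClassifier().fit(X_train, y_train)

# Deduction: apply the model to the unlabeled test set
X_test = [[0, size["Small"], 55], [1, size["Medium"], 80],
          [1, size["Large"], 110], [0, size["Small"], 95],
          [0, size["Large"], 67]]
print(model.predict(X_test))
```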
Decision Tree Classification Task
Attribute types: Refund (categorical), Marital Status (categorical), Taxable Income (continuous), Cheat (class label).

Test record: Refund = No, Marital Status = Married, Taxable Income = 80K, Cheat = ?

Model (splitting attributes):
• Refund: Yes → leaf NO; No → test MarSt
• MarSt: Married → leaf NO; Single, Divorced → test TaxInc
• TaxInc: < 80K → leaf NO; > 80K → leaf YES
Apply Model to Test Data
Test Data: Refund = No, Marital Status = Married, Taxable Income = 80K, Cheat = ?

Walking the tree:
1. Start at the root. Refund = No, so follow the "No" branch to MarSt.
2. MarSt = Married, so follow the "Married" branch, which ends in the leaf NO.
3. Assign Cheat to "No".
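The same walk can be expressed directly as nested conditions; a sketch of the tree from these slides (the behaviour at exactly 80K is an assumption, since the slide only shows < 80K and > 80K branches):

```python
def classify_cheat(refund, marital_status, taxable_income):
    """Decision tree from the slides: Refund -> MarSt -> TaxInc."""
    if refund == "Yes":
        return "No"
    if marital_status == "Married":
        return "No"
    # Single or Divorced: split on taxable income
    # (boundary at exactly 80K is an assumption)
    return "No" if taxable_income < 80_000 else "Yes"

# Test record: Refund = No, Married, Taxable Income = 80K
print(classify_cheat("No", "Married", 80_000))  # -> "No"
```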
Decision Tree Classification Task

Training Set:
Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No

The tree induction algorithm learns a decision tree from the training set (induction).

Test Set:
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
15   No       Large    67K      ?

The decision tree model is then applied to the test set.
ID3 Decision Tree Model
ID3 (Iterative Dichotomiser 3)
• Assumes that the attributes are categorical.
• The attribute with the highest information gain is placed at the root.
• ID3 was developed by Ross Quinlan, a computer science researcher in data mining and decision theory, who received his doctorate in computer science from the University of Washington in 1968.
• Based on entropy.
Example: flipping a coin with heads probability p has entropy
H(X) = −p log₂(p) − (1 − p) log₂(1 − p)
• The entropy H(X) is zero when the probability is either 0 or 1.
• The entropy is maximum (1 bit) when the probability is 0.5.
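A quick numerical check of these two properties (a minimal sketch; binary_entropy is my own helper name):

```python
from math import log2

def binary_entropy(p):
    """Entropy H(X), in bits, of a coin with heads probability p."""
    if p in (0.0, 1.0):
        return 0.0  # outcome is certain: no uncertainty
    return -p * log2(p) - (1 - p) * log2(1 - p)

print(binary_entropy(0.0))  # 0.0 -> certain outcome
print(binary_entropy(0.5))  # 1.0 -> maximum uncertainty
print(binary_entropy(1.0))  # 0.0
```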
◦ Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain).
◦ Conditions for stopping the partitioning:
  - There are no remaining attributes for further partitioning; majority voting is employed to label the leaf.
  - There are no samples left.
ID3 Algorithm
1. Compute the entropy of the dataset.
2. For every attribute/feature:
   2.1 Calculate the entropy for each of its categorical values.
   2.2 Take the weighted average information entropy for the current attribute.
   2.3 Calculate the information gain for the current attribute.
3. Pick the attribute with the highest gain.
4. Repeat on each partition until the desired tree is obtained.
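A minimal runnable sketch of steps 1-3, applied to the movie table from earlier (the helper names are my own):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Step 1: entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr):
    """Steps 2.1-2.3: gain from splitting `rows` on attribute `attr`.

    rows: list of (features_dict, label) pairs.
    """
    labels = [label for _, label in rows]
    total, n = entropy(labels), len(rows)
    # Weighted average entropy of the partitions induced by attr
    remainder = 0.0
    for value in {feats[attr] for feats, _ in rows}:
        subset = [label for feats, label in rows if feats[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return total - remainder

# Step 3: pick the attribute with the highest gain
rows = [({"Actor": "x", "Genre": "Action"}, "Yes"),
        ({"Actor": "x", "Genre": "Fiction"}, "Yes"),
        ({"Actor": "x", "Genre": "Romance"}, "No"),
        ({"Actor": "y", "Genre": "Action"}, "No"),
        ({"Actor": "y", "Genre": "Fiction"}, "No"),
        ({"Actor": "y", "Genre": "Romance"}, "Yes")]
best = max(["Actor", "Genre"], key=lambda a: information_gain(rows, a))
print(best)  # -> "Actor" (gain ~0.082 vs 0.0 for Genre)
```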
Pros of ID3 Algorithm
Builds the decision tree in relatively few steps (a greedy search with no backtracking).
Disadvantages:
The decision tree tends to overfit and may have high variance, so it may not perform well on test data.
Ensemble Learning
Ensemble methods combine several decision trees to produce better predictive performance than a single decision tree. The main principle behind the ensemble model is that a group of weak learners come together to form a strong learner.
Ensemble learning refers to machine learning techniques that use the combined output of two or more models (weak learners) to solve a particular computational intelligence problem. E.g., the Random Forest algorithm is an ensemble of multiple decision trees.
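For instance, with scikit-learn's RandomForestClassifier (a sketch on synthetic data, used purely for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic dataset purely for illustration
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# An ensemble of 100 decision trees, each fit on a bootstrap sample
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_tr, y_tr)
print(forest.score(X_te, y_te))  # accuracy on held-out data
```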
Simple Ensemble techniques
1. Max voting
2. Averaging
3. Weighted Averaging
Max Voting
The max voting method is generally used for classification problems. In this technique, multiple models are used to make predictions for each data point. The prediction made by each model counts as a "vote", and the prediction made by the majority of the models is used as the final prediction.
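A minimal sketch of max voting (assuming string class labels; max_vote is my own helper name):

```python
from collections import Counter

def max_vote(predictions):
    """Final prediction = the class predicted by the most models."""
    return Counter(predictions).most_common(1)[0][0]

# Three models vote on one data point (hypothetical outputs)
print(max_vote(["Yes", "No", "Yes"]))  # -> "Yes"
```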
Reference: https://www.analyticsvidhya.com/blog/2021/12/a-detailed-guide-to-ensemble-learning/