
UET

Since 2004

ĐẠI HỌC CÔNG NGHỆ, ĐHQGHN


VNU-University of Engineering and Technology

INT3405 - Machine Learning


Lecture 5: Classification (P2)
Decision Tree

Duc-Trong Le & Viet-Cuong Ta

Hanoi, 09/2023
Recap: Key Issues in Machine Learning
● What are good hypothesis spaces?
○ Which spaces have been useful in practical applications, and why?
● What algorithms can work with these spaces?
○ Are there general design principles for machine learning algorithms?
● How can we find the best hypothesis in an efficient way?
○ How to find the optimal solution efficiently (the “optimization” question)
● How can we optimize accuracy on future data?
○ Known as the “overfitting” problem (i.e., “generalization” theory)
● How can we have confidence in the results?
○ How much training data is required to find an accurate hypothesis? (the “statistical” question)
● Are some learning problems computationally intractable? (the “computational” question)
● How can we formulate application problems as machine learning problems? (the “engineering” question)
Recap: Model Representation
How do we represent the hypothesis h?

● A learning algorithm takes the training set and outputs a hypothesis h.
● Example: h maps the size of a house (input x) to an estimated price (output y).
● Linear regression with one variable: “univariate linear regression”.
● How do we choose the parameters of h?
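In code, the hypothesis and its squared-error cost might look like this (a minimal sketch using the common 1/(2m) convention, which may differ from the slides' exact notation):

import numpy as np

def h(theta0, theta1, x):
    # Hypothesis: estimated house price for input size x
    return theta0 + theta1 * x

def cost(theta0, theta1, x, y):
    # Squared-error cost averaged over the m training examples
    m = len(x)
    return np.sum((h(theta0, theta1, x) - y) ** 2) / (2 * m)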


Recap: Gradient Descent for Optimization



Recap: Gradient Descent Example

(For fixed parameters, the hypothesis is a function of x; the cost is a function of the parameters.)

How fast does gradient descent converge to the global optimum?
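A minimal batch gradient-descent sketch for the univariate model (the learning rate alpha and the iteration count are illustrative choices, not values from the slides):

import numpy as np

def gradient_descent(x, y, alpha=0.01, iters=1000):
    # Simultaneously update (theta0, theta1) along the negative cost gradient
    theta0, theta1 = 0.0, 0.0
    m = len(x)
    for _ in range(iters):
        err = theta0 + theta1 * x - y           # prediction errors, shape (m,)
        theta0 -= alpha * err.sum() / m         # step along dJ/dtheta0
        theta1 -= alpha * (err * x).sum() / m   # step along dJ/dtheta1
    return theta0, theta1

Convergence speed depends on alpha: too small is slow, too large can overshoot and diverge.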



Normal Equation (3)
● Matrix-vector formulation: minimize J(θ) = ‖Xθ − y‖², where X is the m×n design matrix

● Analytical solution: θ = (XᵀX)⁻¹ Xᵀ y
○ Takes O(mn² + n³) time: O(mn²) to form XᵀX and O(n³) to solve the resulting system
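A sketch of the analytical solution in NumPy; solving the linear system is preferred over forming an explicit inverse for numerical stability:

import numpy as np

def normal_equation(X, y):
    # Solve (X^T X) theta = X^T y for theta without explicitly inverting X^T X
    return np.linalg.solve(X.T @ X, X.T @ y)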



Outline
● Decision Tree Examples
● Decision Tree Algorithms
● Methods for Expressing Test Conditions
● Measures of Node Impurity
○ Gini index
○ Entropy
○ Gain Ratio
○ Classification Error



Supervised Learning Workflow



Example of a Decision Tree

Source: https://regenerativetoday.com/simple-explanation-on-how-decision-tree-algorithm-makes-decisions/





Example of Model Prediction (1)

Example of Model Prediction (2)

Example of Model Prediction (3)

Example of Model Prediction (4)

Example of Model Prediction (5)

Example of Model Prediction (6)
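The traversal these slides walk through can be sketched as a recursive lookup (the node dictionary layout and field names are illustrative assumptions, not the lecture's notation):

def predict(node, record):
    # Follow the record's attribute values down the tree until a leaf is reached
    if node["leaf"]:
        return node["label"]
    value = record[node["attribute"]]     # attribute tested at this internal node
    return predict(node["children"][value], record)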


Decision Tree - Another Solution

There could be more than one tree that fits the same data!



Decision Tree - Classification Task



Decision Tree Induction
● Various Algorithms
○ Hunt’s Algorithm (one of the earliest)
○ CART
○ ID3, C4.5
○ SLIQ, SPRINT



General Structure of Hunt’s Algorithm
● Let Dt be the set of training records that reach a node t
● General Procedure:
○ If Dt contains records that all belong to the same class yt, then t is a leaf node labeled as yt
○ If Dt contains records that belong to more than one class, use an attribute test to split the data into smaller subsets
○ Recursively apply the procedure to each subset (a sketch follows below)
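A minimal recursive sketch of this procedure; choose_best_attribute and partition_by are hypothetical helpers standing in for the test-selection and splitting steps covered later in the lecture:

def hunt(records, labels, attributes):
    # attributes is a set of attribute names still available for testing
    # Case 1: all records belong to one class -> leaf labeled with that class
    if len(set(labels)) == 1:
        return {"leaf": True, "label": labels[0]}
    # Case 2: no attributes left to test -> majority-class leaf
    if not attributes:
        return {"leaf": True, "label": max(set(labels), key=labels.count)}
    # Case 3: apply an attribute test and recurse on each resulting subset
    attr = choose_best_attribute(records, labels, attributes)   # hypothetical helper
    node = {"leaf": False, "attribute": attr, "children": {}}
    for value, (recs, labs) in partition_by(records, labels, attr).items():  # hypothetical helper
        node["children"][value] = hunt(recs, labs, attributes - {attr})
    return node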



Hunt’s Algorithm (worked example)


Design Issues of Decision Tree Induction
● How should training records be split?
○ Method for expressing test condition
■ depending on attribute types
○ Measure for evaluating the goodness of a test condition
● How should the splitting procedure stop?
○ Stop splitting if all the records belong to the same class or
have identical attribute values
○ Early termination



Methods for Expressing Test Conditions
● Depends on attribute types
○ Binary
○ Nominal
○ Ordinal
○ Continuous



Test Conditions for Nominal Attributes
● Multi-way split:
○ Use as many partitions as distinct values.

● Binary split:
○ Divides values into two subsets





Test Conditions for Ordinal Attributes
● Multi-way split:
○ Use as many partitions as distinct values.
● Binary split:
○ Divides values into two subsets
○ Must preserve the order property among attribute values: a grouping that mixes non-adjacent values violates the order property



Test Conditions for Continuous Attributes



Splitting based on Continuous Attributes
● Different ways of handling
○ Discretization to form an ordinal categorical attribute. Ranges
can be found by equal interval bucketing, equal frequency
bucketing (percentiles), or clustering.
■ Static – discretize once at the beginning
■ Dynamic – repeat at each node
○ Binary Decision: (A < v) or (A ≥ v)
■ considers all possible split positions and finds the best cut v
■ can be more compute-intensive (see the sketch below)
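A sketch of the binary-decision approach for one numeric attribute; values and labels are NumPy arrays, and impurity is any node-impurity function (e.g., the Gini helper defined later in the lecture):

import numpy as np

def candidate_thresholds(values):
    # Midpoints between consecutive distinct sorted values of attribute A
    v = np.unique(values)
    return (v[:-1] + v[1:]) / 2

def best_binary_split(values, labels, impurity):
    # Evaluate every test (A < v) vs. (A >= v); keep the lowest weighted impurity
    best_v, best_m = None, float("inf")
    for v in candidate_thresholds(values):
        left, right = labels[values < v], labels[values >= v]
        m = (len(left) * impurity(left) + len(right) * impurity(right)) / len(labels)
        if m < best_m:
            best_v, best_m = v, m
    return best_v, best_m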
How to determine the Best Split

Before splitting: 10 records of class 0, 10 records of class 1.

Which test condition is the best?


How to determine the Best Split
● Greedy approach:
– Nodes with purer class distribution are preferred

● Need a measure of node impurity:

(Example nodes range from a high degree of impurity to a low degree of impurity.)



Measures of Node Impurity
● Gini Index: GINI(t) = 1 − Σi [pi(t)]²

● Entropy: Entropy(t) = − Σi pi(t) log2 pi(t)

● Misclassification error: Error(t) = 1 − maxi pi(t)

(pi(t) denotes the relative frequency of class i at node t)



Find the Best Split
● Compute impurity measure (P) before splitting
● Compute impurity measure (M) after splitting
○ Compute impurity measure of each child node
○ M is the weighted impurity of child nodes
● Choose the attribute test condition that produces the highest gain
Gain = P - M
● or equivalently, the lowest impurity measure after splitting (M); a sketch of this computation follows below
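The Gain = P − M computation as a sketch with a pluggable impurity function; parent_labels is the list of labels at the node, children_labels a list of label lists, one per child:

def gain(parent_labels, children_labels, impurity):
    # P: impurity before the split; M: weighted impurity of the child nodes
    n = len(parent_labels)
    P = impurity(parent_labels)
    M = sum(len(c) / n * impurity(c) for c in children_labels)
    return P - M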





Measure of Impurity: Gini (index)
● Gini Index for a given node t: GINI(t) = 1 − Σi [pi(t)]², where pi(t) is the relative frequency of class i at node t

● For a 2-class problem (p, 1 − p):

◆ GINI = 1 − p² − (1 − p)² = 2p(1 − p)



Computing Gini Index of a Single Node

P(C1) = 0/6 = 0   P(C2) = 6/6 = 1
Gini = 1 − P(C1)² − P(C2)² = 1 − 0 − 1 = 0

P(C1) = 1/6   P(C2) = 5/6
Gini = 1 − (1/6)² − (5/6)² = 0.278

P(C1) = 2/6   P(C2) = 4/6
Gini = 1 − (2/6)² − (4/6)² = 0.444
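These values can be reproduced with a small helper (a sketch; labels is any sequence of class labels):

from collections import Counter

def gini(labels):
    # GINI(t) = 1 - sum_i p_i^2 over the class frequencies at the node
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini([1] * 6))                       # 0.0  (pure node)
print(round(gini([0] * 1 + [1] * 5), 3))   # 0.278
print(round(gini([0] * 2 + [1] * 4), 3))   # 0.444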
Computing Gini Index for a Collection of Nodes
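When a parent node is split into k partitions, the quality of the split is the weighted average of the children's impurities: GINI_split = Σ_{j=1..k} (n_j / n) · GINI(j), where n_j is the number of records at child j and n at the parent. This is exactly the M term in Gain = P − M.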



Binary Attributes: Computing GINI Index
● Splits into two partitions (child nodes)
● Effect of weighting partitions:
– Larger and purer partitions are sought



Categorical Attributes: Computing GINI Index
● For each distinct value, gather counts for each class in the dataset
● Use the count matrix to make decisions

Which of these is the best?



Continuous Attributes: Computing Gini Index
● Use Binary Decisions based on one value
● Several Choices for the splitting value
○ Number of possible splitting values
= Number of distinct values
● Each splitting value has a count matrix associated with it
○ Class counts in each of the partitions, A ≤ v and A > v
● Simple method to choose the best v
○ For each v, scan the database to gather the count matrix and compute its Gini index
○ Computationally inefficient! Work is repeated for every candidate (see the refinement below)
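The repetition can be avoided with the standard sort-and-sweep refinement: sort the attribute once, then update the class counts incrementally as each record crosses the threshold, giving O(n log n) per attribute. A sketch, assuming integer class labels 0..n_classes−1 in NumPy arrays:

import numpy as np

def gini_from_counts(counts):
    # Gini from raw class counts: 1 - sum_i p_i^2
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_gini_split_sorted(values, labels, n_classes):
    order = np.argsort(values)
    values, labels = values[order], labels[order]
    left = np.zeros(n_classes)                                   # counts in A < v
    right = np.bincount(labels, minlength=n_classes).astype(float)
    n = len(labels)
    best_v, best_m = None, float("inf")
    for i in range(n - 1):
        left[labels[i]] += 1                                     # record i crosses over
        right[labels[i]] -= 1
        if values[i] == values[i + 1]:
            continue                                             # cut only between distinct values
        m = ((i + 1) * gini_from_counts(left)
             + (n - i - 1) * gini_from_counts(right)) / n
        if m < best_m:
            best_v, best_m = (values[i] + values[i + 1]) / 2, m
    return best_v, best_m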



Measure of Impurity: Entropy
● Entropy for a given node t: Entropy(t) = − Σi pi(t) log2 pi(t)

P(C1) = 0/6 = 0   P(C2) = 6/6 = 1
Entropy = − 0 log2 0 − 1 log2 1 = − 0 − 0 = 0

P(C1) = 1/6   P(C2) = 5/6
Entropy = − (1/6) log2 (1/6) − (5/6) log2 (5/6) = 0.65

P(C1) = 2/6   P(C2) = 4/6
Entropy = − (2/6) log2 (2/6) − (4/6) log2 (4/6) = 0.92
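A matching helper for checking these values (a sketch; Counter only yields positive counts, so the 0·log2 0 = 0 convention is handled implicitly):

import math
from collections import Counter

def entropy(labels):
    # Entropy(t) = -sum_i p_i log2 p_i over the class frequencies at the node
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

print(round(entropy([0] * 1 + [1] * 5), 2))   # 0.65
print(round(entropy([0] * 2 + [1] * 4), 2))   # 0.92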



Computing Information Gain After Splitting
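The quantity computed here is the entropy instance of Gain = P − M from the earlier slide; a sketch reusing the entropy helper above:

def information_gain(parent_labels, children_labels):
    # Gain_split = Entropy(parent) - sum_j (n_j / n) * Entropy(child j)
    n = len(parent_labels)
    return entropy(parent_labels) - sum(len(c) / n * entropy(c) for c in children_labels)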



Problem with large number of partitions
● Node impurity measures tend to prefer splits that result in a large number of partitions, each being small but pure
– Customer ID has the highest information gain because the entropy of all its children is zero, yet it is useless for predicting new records



Gain Ratio



Gain Ratio

SplitINFO = 1.52 SplitINFO = 0.72 SplitINFO = 0.97
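For reference, the C4.5-style definition behind these numbers: GainRatio = Gain_split / SplitINFO, where SplitINFO = − Σ_{j=1..k} (n_j / n) log2 (n_j / n) penalizes splits into many small partitions. A sketch reusing the helpers above:

def split_info(children_labels, n):
    # SplitINFO = -sum_j (n_j / n) log2 (n_j / n) over the k partitions
    return -sum(len(c) / n * math.log2(len(c) / n) for c in children_labels if len(c) > 0)

def gain_ratio(parent_labels, children_labels):
    n = len(parent_labels)
    return information_gain(parent_labels, children_labels) / split_info(children_labels, n)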



Measure of Impurity: Classification Error
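The formula on this slide is Error(t) = 1 − maxi pi(t): the fraction of records misclassified by predicting the node's majority class. A sketch, checked against the same three nodes used in the Gini and entropy examples:

from collections import Counter

def classification_error(labels):
    # Error(t) = 1 - max_i p_i(t)
    return 1.0 - max(Counter(labels).values()) / len(labels)

print(classification_error([1] * 6))                       # 0.0
print(round(classification_error([0] * 1 + [1] * 5), 3))   # 0.167
print(round(classification_error([0] * 2 + [1] * 4), 3))   # 0.333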



Computing Error of a Single Node





Comparison among Impurity Measures
For a 2-class problem:
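All three measures are 0 for a pure node (p = 0 or p = 1) and peak at p = 0.5, where Entropy = 1, Gini = 0.5, and Error = 0.5. Entropy and Gini are smooth and strictly concave in p, while classification error is piecewise linear, making it less sensitive to changes in the class distribution.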



Decision Tree Classification
● Advantages:
○ Relatively inexpensive to construct
○ Extremely fast at classifying unknown records
○ Easy to interpret for small-sized trees
○ Robust to noise
○ Can easily handle redundant attributes, irrelevant attributes
● Disadvantages:
○ Due to the greedy nature of the splitting criterion, interacting attributes (those that can distinguish between classes together but not individually) may be passed over in favor of other attributes that are less discriminating
○ Each decision boundary involves only a single attribute
Summary
● Decision Tree Examples
● Decision Tree Algorithms
● Methods for Expressing Test Conditions
● Measures of Node Impurity
○ Gini index
○ Entropy
○ Gain Ratio
○ Classification Error



UET
Since 2004

ĐẠI HỌC CÔNG NGHỆ, ĐHQGHN


VNU-University of Engineering and Technology

Thank you
Email me
[email protected]
