
UNIVERSITY OF ECONOMICS HO CHI MINH CITY

SCHOOL OF ECONOMIC MATHEMATICS AND STATISTICS

INTRODUCTION TO DATA SCIENCE AND APPLICATIONS

2023

Instructor: TRAN THI TUAN ANH


3. CLASSIFICATION (cont)- DECISION TREE




3.2 Decision tree


a. What is a decision tree?

A technique that classifies observations into classes by sorting them down the tree from the root to some leaf node.
Some concepts of a decision tree:
Node/Decision node: an attribute/variable/feature of the data.
Branch/Sub-tree: a tree formed by splitting the tree.
Root node: the node from which the decision tree starts.
Leaf node: a terminal node that carries a final output.
Splitting: the process of dividing a decision node/root node into sub-nodes according to the given conditions.
Pruning: the process of removing unwanted branches from the tree.

Figure: a decision tree (Source: Internet)

Classification trees
- Output is qualitative.
- Use measures like the Gini index, entropy, or classification error to find the best attribute to split the data on.
- Predict by the majority category of the target variable in the leaf node.

Regression trees
- Output is quantitative.
- Use variance reduction, mean squared error, or other similar metrics to find the best attribute to split the data on.
- Predict by the mean/median of the target variable in the leaf node.


Example decision tree: a candidate who has a job offer and wants to decide whether he should accept the offer or not.


Example decision tree: low risk or high risk of heart attack.

b. How to build a decision tree?

1. Start from an empty decision tree.
2. Split on the next best attribute.
3. Recurse.


Types of algorithms used to build decision trees:


ID3 (Iterative Dichotomiser 3)
C4.5 (successor of ID3)
CART (Classification and Regression Tree)
CHAID (Chi-square Automatic Interaction Detector)
...
One of the core algorithms for building decision trees is ID3


ID3 (Iterative Dichotomiser 3):


One of the earliest and simplest decision tree algorithms.
Uses entropy and information gain to decide how nodes are split.
Works well with categorical data but does not handle numerical data.
May lead to overfitting and can create biased trees.


C4.5 (Successor of ID3):


An improvement over ID3, developed by Ross Quinlan.
Can be used for both classification and regression tasks.
Uses gain ratio instead of information gain to reduce the bias towards multi-valued attributes.
Reduces overfitting and can handle missing data.


CART (Classification and Regression Tree):


Can be used for both classification and regression tasks.
Uses Gini impurity as the criterion for classification trees and mean squared error for regression trees.
Able to handle large datasets.

c. How to measure the purity of a leaf node?

The purity of a leaf node can be measured by:


Classification error
Gini impurity
Entropy and Information gain


Classification error:
E_m = 1 - \max_i(p_i)
where p_i represents the proportion of instances of class i in the node.
A lower classification error suggests a more pure or homogeneous leaf
node. Example: if you have a leaf node with
Class A: 16 obs
Class B: 13 obs
Class C: 1 obs
What is the classification error of this node?
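For reference, a worked answer with the counts above (16 + 13 + 1 = 30 observations):
E_m = 1 - \max(16/30, 13/30, 1/30) = 1 - 16/30 = 14/30 \approx 0.47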


Gini impurity:
Gini = \sum_{i=1}^{K} p_i (1 - p_i) = 1 - \sum_{i=1}^{K} p_i^2
where
p_i represents the proportion of instances of class i in the node;
1 - p_i is the probability of selecting an element not from class i.
A lower Gini impurity suggests a more pure or homogeneous leaf node.
Example: if you have a leaf node with
Class A: 16 obs
Class B: 13 obs
Class C: 1 obs
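For reference, a worked computation with the counts above (30 observations in total):
Gini = 1 - [(16/30)^2 + (13/30)^2 + (1/30)^2] \approx 1 - 0.473 = 0.527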

Entropy:
Entropy = - \sum_{i=1}^{K} p_i \log_2(p_i)
where
p_i represents the proportion of instances of class i in the node.
A lower entropy value indicates a more pure or homogeneous node.
Example: if you have a leaf node with
Class A: 16 obs
Class B: 13 obs
Class C: 1 obs
What is the Entropy of this node?
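For reference, a worked computation with the counts above (30 observations in total):
Entropy = -[(16/30)\log_2(16/30) + (13/30)\log_2(13/30) + (1/30)\log_2(1/30)] \approx 0.48 + 0.52 + 0.16 \approx 1.17 bits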
d. Building a decision tree

Entropy and Information gain - Some rules should be followed:


A branch with an entropy of 0 is a leaf node.
A branch with an entropy greater than 0 needs further splitting.
If zero entropy cannot be achieved in the leaf nodes, the decision is made by a simple majority.


Entropy for decision tree: Example


Information gain
Information gain is based on the decrease in entropy after a dataset is split on an attribute.
Constructing a decision tree is all about finding the attribute that returns the highest information gain.
Note: more uncertainty means more entropy!
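Written out as a formula (the standard formulation, stated here for reference since the slide gives it only in words), the information gain from splitting a set S on an attribute A is
IG(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} Entropy(S_v)
where S_v is the subset of S for which attribute A takes the value v.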
When to stop?
when all records in the current data subset have the same output;
or all records have exactly the same set of input attributes;
or a minimum number of observations per leaf is reached;
or a maximum depth (the length of the longest root-to-leaf path) is reached.

More detailed steps to build a decision tree (a Python sketch follows the list):

1. Compute the entropy for the dataset.
2. For every attribute/feature:
   - Calculate the entropy for all categorical values.
   - Take the average information entropy for the current attribute.
   - Calculate the gain for the current attribute.
3. Pick the attribute with the highest gain.
4. Repeat until we get the desired tree.
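A minimal Python sketch of steps 1-3, assuming a pandas DataFrame with categorical feature columns and a categorical target; the column names and toy data are illustrative, not taken from the slides:

import numpy as np
import pandas as pd

def entropy(labels):
    # Entropy of a label series: -sum(p_i * log2(p_i))
    probs = labels.value_counts(normalize=True)
    return -np.sum(probs * np.log2(probs))

def information_gain(df, attribute, target):
    # Entropy of the whole set minus the weighted entropy after splitting on `attribute`
    total_entropy = entropy(df[target])
    weighted = sum(
        (len(subset) / len(df)) * entropy(subset[target])
        for _, subset in df.groupby(attribute)
    )
    return total_entropy - weighted

# Toy "play tennis"-style data (made-up values, for illustration only)
data = pd.DataFrame({
    "Outlook": ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Overcast"],
    "Windy":   ["No", "Yes", "No", "No", "Yes", "Yes"],
    "Play":    ["No", "No", "Yes", "Yes", "No", "Yes"],
})
gains = {col: information_gain(data, col, "Play") for col in ["Outlook", "Windy"]}
best = max(gains, key=gains.get)  # attribute with the highest information gain
print(gains, "-> split on:", best)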


Example of building a decision tree:


Decision tree implementation using Python


Example 3.2
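A minimal scikit-learn sketch of the kind of decision tree code Example 3.2 illustrates; the file name and column names below are assumptions, not the original listing:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Illustrative dataset: a CSV with feature columns and a categorical target (names assumed)
df = pd.read_csv("data.csv")
X = df.drop(columns=["target"])
y = df["target"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# criterion="entropy" mirrors the ID3-style information gain splitting discussed above;
# criterion="gini" would use Gini impurity instead
model = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))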



Result visualization:
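A common way to produce such a visualization, continuing the sketch above (it assumes the fitted model and feature matrix X from that sketch):

import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

# Draw the fitted tree with colored (filled) nodes
plt.figure(figsize=(12, 6))
plot_tree(model, feature_names=list(X.columns),
          class_names=[str(c) for c in model.classes_],
          filled=True, rounded=True)
plt.show()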



Example 3.3: Decision tree with Iris data (Results)
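A sketch of code that produces results of this kind on the Iris data; the exact listing and parameter choices in the slides may differ:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report, confusion_matrix

# Iris dataset bundled with scikit-learn: 150 flowers, 4 features, 3 classes
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42)

clf = DecisionTreeClassifier(criterion="gini", random_state=42).fit(X_train, y_train)
y_pred = clf.predict(X_test)

print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=iris.target_names))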



Example 3.3: Decision tree with Iris data (Tree)
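Besides plotting the tree, a quick way to inspect its structure is a text export (continuing the Iris sketch above; assumes clf and iris from it):

from sklearn.tree import export_text
# Prints the fitted tree as indented if/else rules
print(export_text(clf, feature_names=list(iris.feature_names)))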



Example 3.4: Another Python code for decision tree with Iris data
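One possible variant (an assumption of what an alternative listing might do): evaluate the tree with k-fold cross-validation instead of a single train/test split:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# 5-fold cross-validation gives a more stable accuracy estimate than one split
X, y = load_iris(return_X_y=True)
scores = cross_val_score(DecisionTreeClassifier(max_depth=3, random_state=0), X, y, cv=5)
print("Fold accuracies:", scores, "mean:", scores.mean())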

3.3 Random forests

What is a random forest?


A random forest is an ensemble learning algorithm.
It builds many small, weak decision trees in parallel and then combines them into a single, strong learner by averaging or taking the majority vote.
There is a direct relationship between the number of trees in the forest and the results it can get: the larger the number of trees, the more accurate the result.
In a random forest, the processes of finding the root node and splitting the feature nodes run randomly.




Why the Random Forest algorithm?


If there are enough trees in the forest, the classifier won't overfit the model.
The Random Forest classifier can handle missing values.
There is a direct relationship between the number of trees in the forest and the results it can get: the larger the number of trees, the more accurate the result.
The Random Forest classifier can be modeled for categorical values.


How does the Random Forest algorithm work? It includes two stages:

The first stage is random forest creation; the second is making predictions with the random forest classifier created in the first stage.
First stage:
1. Randomly select k features from the total of m features, where k << m.
2. Among the k features, calculate the node d using the best split point.
3. Split the node into daughter nodes using the best split.
4. Repeat steps 1 to 3 until the desired number of nodes has been reached.
5. Build the forest by repeating steps 1 to 4 n times to create n trees.
Second stage: with the random forest classifier created, we make the prediction.
1. Take the test features and use the rules of each randomly created decision tree to predict the outcome, and store each predicted outcome (target).
2. Calculate the votes for each predicted target.
3. Take the most-voted predicted target as the final prediction from the random forest algorithm.


Example 3.5: Python code for random forests
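A minimal scikit-learn sketch of a random forest classifier on the Iris data; the actual listing in Example 3.5 may differ, and the parameter values here are illustrative:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42)

# 100 trees, each grown on a bootstrap sample with a random subset of features per split
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

print("Accuracy:", accuracy_score(y_test, rf.predict(X_test)))
print("Feature importances:", dict(zip(iris.feature_names, rf.feature_importances_)))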

Exercise for group discussion

Discuss with your group and submit your answers:


List some other extensions of decision trees (besides random forests).
List as many potential applications of classification algorithms in business or in the real world as possible.
Link to submit your group’s answer:
https://docs.google.com/forms/d/1OEPTearh8DaM8I4O8Mb8iUUWfmCpxy4l3q5EiM


THE END

THANK YOU FOR LISTENING
