
UNIVERSITY OF ECONOMICS HO CHI MINH CITY

SCHOOL OF ECONOMIC MATHEMATICS AND STATISTICS

INTRODUCTION TO DATA SCIENCE AND APPLICATIONS

2023

Instructor: TRAN THI TUAN ANH


3. CLASSIFICATION (cont)- DECISION TREE




3.2 Decision tree


a. What is a decision tree?

A technique that classifies observations into classes by sorting them down the tree from the root to some leaf node.
Some concepts of a decision tree:
Node/Decision node: an attribute/variable/feature of the data.
Branch/Sub-tree: a tree formed by splitting the tree.
Root node: the node from which the decision tree starts.
Leaf node: a terminal node that carries a final output.
Splitting: the process of dividing a decision node/root node into sub-nodes according to the given conditions.
Pruning: the process of removing unwanted branches from the tree.

Figure: a decision tree (Source: Internet)

Classification trees
- Output is qualitative.
- Use measures like the Gini index, entropy, or classification error to find the best attribute to split the data on.
- Predict by the majority category of the target variable in the leaf node.

Regression trees
- Output is quantitative.
- Use variance reduction, mean squared error, or other similar metrics to find the best attribute to split the data on.
- Predict by the mean/median of the target variable in the leaf node.


Example decision tree: a candidate who has a job offer and wants to decide whether he should accept the offer or not.


Example decision tree: low risk or high risk of heart attack.

b. How to build a decision tree?

1. Start from an empty decision tree.
2. Split on the next best attribute.
3. Recurse.


Types of algorithms used to build decision trees:


ID3 (Iterative Dichotomiser 3)
C4.5 (successor of ID3)
CART (Classification and Regression Tree)
CHAID (Chi-square Automatic Interaction Detector)
...
One of the core algorithms for building decision trees is ID3


ID3 (Iterative Dichotomiser 3):


One of the earliest and simplest decision tree algorithms.
Uses entropy and information gain to decide how nodes are split.
Works well with categorical data but does not handle numerical data.
May lead to overfitting and can create biased trees.


C4.5 (Successor of ID3):


An improvement over ID3, developed by Ross Quinlan.
Can be used for both classification and regression tasks.
Uses gain ratio instead of information gain to reduce the bias towards multi-valued attributes.
Reduces overfitting and can handle missing data.


CART (Classification and Regression Tree):


Can be used for both classification and regression tasks.
Uses Gini impurity as the criterion for classification trees and mean squared error for regression trees.
Able to handle large datasets.

c. How to measure the purity of a leaf node?

The purity of a leaf node can be measured by:


Classification error
Gini impurity
Entropy and Information gain


Classification error:
E_m = 1 - \max_i(p_i)
where p_i represents the proportion of instances of class i in the node.
A lower classification error suggests a more pure or homogeneous leaf
node. Example: if you have a leaf node with
Class A: 16 obs
Class B: 13 obs
Class C: 1 obs
What is the classification error of this node?
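For reference, a worked answer with the counts above (16 + 13 + 1 = 30 observations):
E_m = 1 - \max(16/30, 13/30, 1/30) = 1 - 16/30 = 14/30 \approx 0.47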


Gini impurity:
Gini = \sum_{i=1}^{K} p_i (1 - p_i) = 1 - \sum_{i=1}^{K} p_i^2
where
p_i represents the proportion of instances of class i in the node;
1 - p_i is the probability of selecting an element not from class i.
A lower Gini impurity suggests a more pure or homogeneous leaf node.
Example: if you have a leaf node with
Class A: 16 obs
Class B: 13 obs
Class C: 1 obs
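For reference, a worked computation with the counts above (30 observations in total):
Gini = 1 - [(16/30)^2 + (13/30)^2 + (1/30)^2] \approx 1 - 0.473 = 0.527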

Entropy:
Entropy = - \sum_{i=1}^{K} p_i \log_2(p_i)
where
p_i represents the proportion of instances of class i in the node.
A lower entropy value indicates a more pure or homogeneous node.
Example: if you have a leaf node with
Class A: 16 obs
Class B: 13 obs
Class C: 1 obs
What is the Entropy of this node?
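For reference, a worked computation with the counts above (30 observations in total):
Entropy = -[(16/30)\log_2(16/30) + (13/30)\log_2(13/30) + (1/30)\log_2(1/30)] \approx 0.48 + 0.52 + 0.16 \approx 1.17 bits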
d. Building a decision tree

Entropy and Information gain - Some rules should be followed:


A branch with an entropy of 0 is a leaf node.
A branch with an entropy greater than 0 needs further splitting.
If zero entropy cannot be achieved in the leaf nodes, the decision is made by a simple majority.


Entropy for decision tree: Example


Information gain
Information gain is based on the decrease in entropy after a dataset is split on an attribute.
Constructing a decision tree is all about finding the attribute that returns the highest information gain.
Note: more uncertainty means more entropy!
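Written out as a formula (the standard formulation, stated here for reference since the slide gives it only in words), the information gain from splitting a set S on an attribute A is
IG(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} Entropy(S_v)
where S_v is the subset of S for which attribute A takes the value v.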
When to stop?
when all records in the current data subset have the same output;
or all records have exactly the same set of input attributes;
or a minimum number of observations per leaf is reached;
or a maximum depth (the length of the longest root-to-leaf path) is reached.

More detailed steps to build a decision tree (a Python sketch follows the list):

1. Compute the entropy for the dataset.
2. For every attribute/feature:
   - Calculate the entropy for all categorical values.
   - Take the average information entropy for the current attribute.
   - Calculate the gain for the current attribute.
3. Pick the attribute with the highest gain.
4. Repeat until we get the desired tree.
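A minimal Python sketch of steps 1-3, assuming a pandas DataFrame with categorical feature columns and a categorical target; the column names and toy data are illustrative, not taken from the slides:

import numpy as np
import pandas as pd

def entropy(labels):
    # Entropy of a label series: -sum(p_i * log2(p_i))
    probs = labels.value_counts(normalize=True)
    return -np.sum(probs * np.log2(probs))

def information_gain(df, attribute, target):
    # Entropy of the whole set minus the weighted entropy after splitting on `attribute`
    total_entropy = entropy(df[target])
    weighted = sum(
        (len(subset) / len(df)) * entropy(subset[target])
        for _, subset in df.groupby(attribute)
    )
    return total_entropy - weighted

# Toy "play tennis"-style data (made-up values, for illustration only)
data = pd.DataFrame({
    "Outlook": ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Overcast"],
    "Windy":   ["No", "Yes", "No", "No", "Yes", "Yes"],
    "Play":    ["No", "No", "Yes", "Yes", "No", "Yes"],
})
gains = {col: information_gain(data, col, "Play") for col in ["Outlook", "Windy"]}
best = max(gains, key=gains.get)  # attribute with the highest information gain
print(gains, "-> split on:", best)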


Example of building a decision tree:


Decision tree implementation using Python


Example 3.2
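A minimal scikit-learn sketch of the kind of decision tree code Example 3.2 illustrates; the file name and column names below are assumptions, not the original listing:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Illustrative dataset: a CSV with feature columns and a categorical target (names assumed)
df = pd.read_csv("data.csv")
X = df.drop(columns=["target"])
y = df["target"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# criterion="entropy" mirrors the ID3-style information gain splitting discussed above;
# criterion="gini" would use Gini impurity instead
model = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))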



Result visualization:
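A common way to produce such a visualization, continuing the sketch above (it assumes the fitted model and feature matrix X from that sketch):

import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

# Draw the fitted tree with colored (filled) nodes
plt.figure(figsize=(12, 6))
plot_tree(model, feature_names=list(X.columns),
          class_names=[str(c) for c in model.classes_],
          filled=True, rounded=True)
plt.show()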



Example 3.3: Decision tree with Iris data (Results)
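A sketch of code that produces results of this kind on the Iris data; the exact listing and parameter choices in the slides may differ:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report, confusion_matrix

# Iris dataset bundled with scikit-learn: 150 flowers, 4 features, 3 classes
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42)

clf = DecisionTreeClassifier(criterion="gini", random_state=42).fit(X_train, y_train)
y_pred = clf.predict(X_test)

print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=iris.target_names))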



Example 3.3: Decision tree with Iris data (Tree)
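Besides plotting the tree, a quick way to inspect its structure is a text export (continuing the Iris sketch above; assumes clf and iris from it):

from sklearn.tree import export_text
# Prints the fitted tree as indented if/else rules
print(export_text(clf, feature_names=list(iris.feature_names)))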



Example 3.4: Another Python code for decision tree with Iris data
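One possible variant (an assumption of what an alternative listing might do): evaluate the tree with k-fold cross-validation instead of a single train/test split:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# 5-fold cross-validation gives a more stable accuracy estimate than one split
X, y = load_iris(return_X_y=True)
scores = cross_val_score(DecisionTreeClassifier(max_depth=3, random_state=0), X, y, cv=5)
print("Fold accuracies:", scores, "mean:", scores.mean())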

3.3 Random forests

What is a random forest?


A random forest is an ensemble learning algorithm.
It builds many small, weak decision trees in parallel and then combines them into a single, strong learner by averaging or taking the majority vote.
There is a direct relationship between the number of trees in the forest and the results it can get: the larger the number of trees, the more accurate the result.
In a random forest, the processes of finding the root node and splitting the feature nodes run randomly.




Why the Random Forest algorithm?


If there are enough trees in the forest, the classifier won't overfit the model.
The Random Forest classifier can handle missing values.
There is a direct relationship between the number of trees in the forest and the results it can get: the larger the number of trees, the more accurate the result.
The Random Forest classifier can be modeled for categorical values.


How does the Random Forest algorithm work? It includes two stages:

The first stage is random forest creation; the second is making predictions with the random forest classifier created in the first stage.
First stage:
1. Randomly select k features from the total of m features, where k << m.
2. Among the k features, calculate the node d using the best split point.
3. Split the node into daughter nodes using the best split.
4. Repeat steps 1 to 3 until the desired number of nodes has been reached.
5. Build the forest by repeating steps 1 to 4 n times to create n trees.
Second stage: with the random forest classifier created, we make the prediction.
1. Take the test features and use the rules of each randomly created decision tree to predict the outcome, and store each predicted outcome (target).
2. Calculate the votes for each predicted target.
3. Take the most-voted predicted target as the final prediction from the random forest algorithm.


Example 3.5: Python code for random forests
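A minimal scikit-learn sketch of a random forest classifier on the Iris data; the actual listing in Example 3.5 may differ, and the parameter values here are illustrative:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42)

# 100 trees, each grown on a bootstrap sample with a random subset of features per split
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

print("Accuracy:", accuracy_score(y_test, rf.predict(X_test)))
print("Feature importances:", dict(zip(iris.feature_names, rf.feature_importances_)))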

Exercise for group discussion

Discuss with your group and submit your answers:


List some other extensions of decision trees (besides random forests).
List as many potential applications of classification algorithms in business or in the real world as possible.
Link to submit your group’s answer:
https://docs.google.com/forms/d/1OEPTearh8DaM8I4O8Mb8iUUWfmCpxy4l3q5EiM


THE END

THANK YOU FOR LISTENING
