
Supervised Learning

Lecture 2: Decision Tree and Random Forest

Tho Quan
[email protected]
Agenda

• Inductive learning
• Decision Tree: ID3 and C4.5
• From Decision Tree to Random Forest
Five Tribes of Machine Learning
Inference Mechanisms

• Deduction: cause + rule → effect
• Abduction: effect + rule → cause
• Induction: cause + effect → rule

• Inference must be made under the closed-world assumption
  • All propositions must be TRUE or FALSE
  • Unknown propositions → FALSE

• IF temperature high AND NOT (water level low) THEN pressure high
• IF transducer output low THEN water level low
Deduction and Induction

Rule 1: If Travel cost/km is expensive then mode = car
Rule 2: If Travel cost/km is standard then mode = train
Rule 3: If Travel cost/km is cheap and gender is male then mode = bus
Rule 4: If Travel cost/km is cheap and gender is female and she owns no car then mode = bus
Rule 5: If Travel cost/km is cheap and gender is female and she owns 1 car then mode = train

Query instance:
• Gender: Male
• Car Ownership: 1
• Travel Cost/Km: Standard
• Income Level: High
• Transportation Mode?
Decision Tree

• A Decision Tree is a hierarchical tree structure used to classify instances
  based on a series of questions (or rules) about their attributes
• Decision tree representation:
  • Each internal node tests an attribute
  • Each branch corresponds to an attribute value
  • Each leaf node assigns a classification
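
To make this representation concrete, here is a minimal sketch (my own illustration, not code from the slides) of a tree node structure and a classification routine, with the transportation example encoded by hand:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    attribute: Optional[str] = None                  # attribute tested at an internal node
    children: dict = field(default_factory=dict)     # attribute value -> child Node
    label: Optional[str] = None                      # class assigned at a leaf node

def classify(node, instance):
    """Follow the branch matching the instance's attribute value until a leaf is reached."""
    while node.label is None:
        node = node.children[instance[node.attribute]]
    return node.label

# Hand-built tree for the transportation-mode example
leaf = lambda c: Node(label=c)
tree = Node(attribute="travel_cost", children={
    "expensive": leaf("car"),
    "standard": leaf("train"),
    "cheap": Node(attribute="gender", children={
        "male": leaf("bus"),
        "female": Node(attribute="car_ownership",
                       children={"0": leaf("bus"), "1": leaf("train")}),
    }),
})

print(classify(tree, {"travel_cost": "standard", "gender": "male", "car_ownership": "1"}))  # train
```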
Generate a Decision Tree

Training Data → Decision Tree → Make predictions on unseen data

- Choose the best attribute
- Split the data set
- Recurse until each data item is classified correctly
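
A minimal Python sketch of this recursive procedure (illustration only, not the lecture's code), using information gain to choose the best attribute:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy of the parent minus the weighted entropy of the children."""
    parent, n, remainder = entropy(labels), len(labels), 0.0
    for value in set(r[attr] for r in rows):
        subset = [lab for r, lab in zip(rows, labels) if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return parent - remainder

def build_tree(rows, labels, attributes):
    if len(set(labels)) == 1 or not attributes:       # pure node, or no attributes left
        return Counter(labels).most_common(1)[0][0]   # leaf: majority class
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    tree = {best: {}}
    for value in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        tree[best][value] = build_tree([rows[i] for i in idx],
                                       [labels[i] for i in idx],
                                       [a for a in attributes if a != best])
    return tree
```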
Generate a Decision Tree

• Measure impurity:
  Entropy = − Σi pi log2(pi)        Gini = 1 − Σi pi^2
  (pi is the proportion of class i in the node)
• Information Gain:
  Gain(S, A) = Entropy(S) − Σv (|Sv| / |S|) · Entropy(Sv)
  (Sv is the subset of S with value v for attribute A)
Generate a Decision Tree

• P(Bus) = 4/10
• P(Car) = 3/10
• P(Train) = 3/10
• Entropy = – 0.4 log2(0.4) – 0.3 log2(0.3) – 0.3 log2(0.3) = 1.571
• Gini Index = 1 – (0.4^2 + 0.3^2 + 0.3^2) = 0.660
How to Use a Decision Tree

Data → Decision Tree → Make predictions on unseen data
(the tree is read off as a set of decision rules)
How to Use a Decision Tree

• Gender: Male
• Car Ownership: 1
• Travel Cost/Km: Standard
• Income Level: High
• Transportation Mode?

• Rule 1: If Travel cost/km is expensive then mode = car
• Rule 2: If Travel cost/km is standard then mode = train
• Rule 3: If Travel cost/km is cheap and gender is male then mode = bus
• Rule 4: If Travel cost/km is cheap and gender is female and she owns no car then mode = bus
• Rule 5: If Travel cost/km is cheap and gender is female and she owns 1 car then mode = train
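
A small sketch (illustration only) of these rules applied to the instance above:

```python
def predict_mode(travel_cost, gender, cars):
    """Apply the five decision rules read off the tree."""
    if travel_cost == "expensive":
        return "car"                              # Rule 1
    if travel_cost == "standard":
        return "train"                            # Rule 2
    if gender == "male":
        return "bus"                              # Rule 3
    return "bus" if cars == 0 else "train"        # Rules 4 and 5

print(predict_mode(travel_cost="standard", gender="male", cars=1))  # train
```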
From ID3 to C4.5

• Ross Quinlan started with:
  • ID3 (Quinlan, 1979)
  • C4.5 (Quinlan, 1993)
• Some assumptions in the basic algorithm:
  • All attributes are nominal
  • We do not have unknown values
C4.5 algorithm

• Avoid overfitting
• Deal with continuous attributes
• Deal with missing data
Pruning

1. Pre-prune: Stop growing a branch when information becomes unreliable

2. Post-prune: Take a fully-grown decision tree and discard unreliable parts
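
As a hedged illustration: scikit-learn's DecisionTreeClassifier supports post-pruning through cost-complexity pruning (ccp_alpha) and pre-pruning through limits such as max_depth; the dataset and alpha value below are placeholders.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fully grown tree vs. a post-pruned tree (larger ccp_alpha => more pruning)
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.02).fit(X_train, y_train)

print(full.tree_.node_count, pruned.tree_.node_count)      # pruned tree is smaller
print(full.score(X_test, y_test), pruned.score(X_test, y_test))
```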


Dealing with continuous attributes
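
The slide's figure is not reproduced here. As a rough sketch of the standard approach (assumed, not taken from the slide): a continuous attribute is split by trying candidate thresholds between sorted values and keeping the split with the lowest impurity. Made-up data below.

```python
def gini(labels):
    n = len(labels)
    return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_threshold(values, labels):
    """Try midpoints between consecutive sorted values; keep the split with the lowest weighted impurity."""
    pairs = sorted(zip(values, labels))
    best_t, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_t, best_score = t, score
    return best_t

# Made-up continuous attribute (e.g. travel cost per km) with class labels
print(best_threshold([1.2, 1.5, 2.0, 2.2, 3.5, 3.8, 4.1],
                     ["bus", "bus", "bus", "train", "train", "car", "car"]))
```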
Intuition about Margin

[Figure: a man/woman classifier queried on ambiguous cases such as an infant or an elderly person]

Problem with Margin-based Discriminative Classifiers

• It might be very misleading to return a high confidence.
From Decision Tree to Random Forest

• Ensemble Learning:
  • Average out biases
  • Reduce the variance
  • Unlikely to overfit
Bias-variance Decomposition

• For any learning scheme,
  o Bias = expected error of the combined classifier on new data
  o Variance = expected error due to the particular training set used
• Total expected error ~ bias + variance
Ensemble Methods

• Bagging (Breiman 1994, …)
• Boosting (Freund and Schapire 1995, Friedman et al. 1998, …)
• Random forests (Breiman 2001, …)

Predict the class label for unseen data by aggregating a set of predictions (classifiers learned from the training data).
Overview
Bagging

• Bootstrap data sets:
  • Original data set: X = {x1, x2, ..., xN}
  • Creation of a new data set XB: draw N points at random from X, with replacement, so that some points in X may be replicated in XB (whereas other points may be absent from XB)

Bagging

• Train the same model on each of the M bootstrap data sets.
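
A minimal sketch of bagging by bootstrap resampling (illustration only; the data set and base model are placeholders):

```python
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                       # placeholder data set of N = 100 points
y = (X[:, 0] + X[:, 1] > 0).astype(int)

M = 25                                              # number of bootstrap data sets
models = []
for _ in range(M):
    idx = rng.integers(0, len(X), size=len(X))      # draw N points with replacement
    models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

def bagged_predict(x):
    """Aggregate the M classifiers by majority vote."""
    votes = [m.predict(x.reshape(1, -1))[0] for m in models]
    return Counter(votes).most_common(1)[0][0]

print(bagged_predict(X[0]), y[0])
```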
When does Bagging work?

• A learning algorithm is unstable if small changes to the training set cause large changes in the learned classifier.
• If the learning algorithm is unstable, then Bagging almost always improves performance.
Why does Bagging work?

Random Forest
Probability Behind

• The probability of the ensemble (deciding by majority vote) getting the correct answer follows a binomial distribution:

  P(correct) = Σ_{k = ⌊T/2⌋ + 1 .. T} C(T, k) · p^k · (1 − p)^(T − k)

  where p is the success rate of each base classifier, and T is the number of base classifiers.
Random Forest Power

• The power of ensemble learning: if p > 0.5 then the correctness probability approaches 1 as T → ∞.
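
A quick numeric check of this claim (sketch; the values of p and T are arbitrary):

```python
from math import comb

def p_majority_correct(p, T):
    """Probability that more than half of T independent classifiers (each with success rate p) are correct."""
    return sum(comb(T, k) * p**k * (1 - p)**(T - k) for k in range(T // 2 + 1, T + 1))

for T in (1, 11, 101, 1001):
    print(T, round(p_majority_correct(0.6, T), 4))
# With p = 0.6, the majority-vote accuracy climbs toward 1 as T grows
```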


Random Forest - Pros

• Requires almost no input preparation
• Performs implicit feature selection
• Very quick to train
• Pretty tough to beat
• It’s really hard to build a bad Random Forest!

Random Forest - Drawbacks?

• Model size
• Black boxes
Explainable Random Forest

Get back to the case study of flight delay prediction: what caused the flight delay?

Explainable Random Forest

- Input: a selected future flight (a flight from Changi to Tan Son Nhat within the next 48 hours, by Singapore Airlines)
- Output: delay prediction (Y/N)

Explanation (feature importances):
• Arrival hour: 0.25467993054
• Airline: 0.253308988692
• Origin: 0.158077791536
• Departure time: 0.1364141321
• Destination: 0.105243518586
• Duration: 0.0660441127126
• Type: 0.0219200955523
• Arrival DoW: 0.00245074824256
• Departure DoW: 0.00186059074095
• Operation Type: 9.12956600922e-08
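
A hedged sketch of how such an explanation could be produced with scikit-learn's impurity-based feature importances (the feature names match the slide, but the data here are random placeholders, not the actual case-study pipeline):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

feature_names = ["Arrival hour", "Airline", "Origin", "Departure time", "Destination",
                 "Duration", "Type", "Arrival DoW", "Departure DoW", "Operation Type"]

# Placeholder data standing in for the encoded flight records
rng = np.random.default_rng(0)
X = rng.normal(size=(500, len(feature_names)))
y = rng.integers(0, 2, size=500)                    # delayed (1) / not delayed (0)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based feature importances, sorted as on the slide
for name, score in sorted(zip(feature_names, forest.feature_importances_),
                          key=lambda t: t[1], reverse=True):
    print(f"{name}: {score:.4f}")
```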
