
Supervised Learning

Lecture 2: Decision Tree and Random Forest

Tho Quan
[email protected]
Agenda

• Inductive learning
• Decision Tree: ID3 and C4.5
• From Decision Tree to Random Forest
Five Tribes of Machine Learning
Inference Mechanisms

• Deduction: cause + rule → effect
• Abduction: effect + rule → cause
• Induction: cause + effect → rule

• Inference must be made under the closed-world assumption
  • All propositions must be TRUE or FALSE
  • Unknown propositions → FALSE

• IF temperature high AND NOT (water level low) THEN pressure high
• IF transducer output low THEN water level low
Deduction and Induction

Rule 1: If Travel cost/km is expensive then mode = car
Rule 2: If Travel cost/km is standard then mode = train
Rule 3: If Travel cost/km is cheap and gender is male then mode = bus
Rule 4: If Travel cost/km is cheap and gender is female and she owns no car then mode = bus
Rule 5: If Travel cost/km is cheap and gender is female and she owns 1 car then mode = train

Query instance:
• Gender: Male
• Car Ownership: 1
• Travel Cost/Km: Standard
• Income Level: High
• Transportation Mode?
Decision Tree

• A Decision Tree is a hierarchical tree structure used to classify instances
  based on a series of questions (or rules) about their attributes
• Decision tree representation:
  • Each internal node tests an attribute
  • Each branch corresponds to an attribute value
  • Each leaf node assigns a classification
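
To make this representation concrete, here is a minimal sketch (my own illustration, not code from the slides) of a tree node structure and a classification routine, with the transportation example encoded by hand:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    attribute: Optional[str] = None                  # attribute tested at an internal node
    children: dict = field(default_factory=dict)     # attribute value -> child Node
    label: Optional[str] = None                      # class assigned at a leaf node

def classify(node, instance):
    """Follow the branch matching the instance's attribute value until a leaf is reached."""
    while node.label is None:
        node = node.children[instance[node.attribute]]
    return node.label

# Hand-built tree for the transportation-mode example
leaf = lambda c: Node(label=c)
tree = Node(attribute="travel_cost", children={
    "expensive": leaf("car"),
    "standard": leaf("train"),
    "cheap": Node(attribute="gender", children={
        "male": leaf("bus"),
        "female": Node(attribute="car_ownership",
                       children={"0": leaf("bus"), "1": leaf("train")}),
    }),
})

print(classify(tree, {"travel_cost": "standard", "gender": "male", "car_ownership": "1"}))  # train
```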
Generate a Decision Tree

Training Data → Decision Tree → Make predictions on unseen data

- Choose the best attribute
- Split the data set
- Recurse until each data item is classified correctly
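
A minimal Python sketch of this recursive procedure (illustration only, not the lecture's code), using information gain to choose the best attribute:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy of the parent minus the weighted entropy of the children."""
    parent, n, remainder = entropy(labels), len(labels), 0.0
    for value in set(r[attr] for r in rows):
        subset = [lab for r, lab in zip(rows, labels) if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return parent - remainder

def build_tree(rows, labels, attributes):
    if len(set(labels)) == 1 or not attributes:       # pure node, or no attributes left
        return Counter(labels).most_common(1)[0][0]   # leaf: majority class
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    tree = {best: {}}
    for value in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        tree[best][value] = build_tree([rows[i] for i in idx],
                                       [labels[i] for i in idx],
                                       [a for a in attributes if a != best])
    return tree
```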
Generate a Decision Tree

• Measure impurity:
  Entropy = − Σi pi log2(pi)        Gini = 1 − Σi pi^2
  (pi is the proportion of class i in the node)
• Information Gain:
  Gain(S, A) = Entropy(S) − Σv (|Sv| / |S|) · Entropy(Sv)
  (Sv is the subset of S with value v for attribute A)
Generate a Decision Tree

• P(Bus) = 4/10
• P(Car) = 3/10
• P(Train) = 3/10
• Entropy = – 0.4 log2(0.4) – 0.3 log2(0.3) – 0.3 log2(0.3) = 1.571
• Gini Index = 1 – (0.4^2 + 0.3^2 + 0.3^2) = 0.660
How to Use a Decision Tree

Data → Decision Tree → Make predictions on unseen data
(the tree is read off as a set of decision rules)
How to Use a Decision Tree

• Gender: Male
• Car Ownership: 1
• Travel Cost/Km: Standard
• Income Level: High
• Transportation Mode?

• Rule 1: If Travel cost/km is expensive then mode = car
• Rule 2: If Travel cost/km is standard then mode = train
• Rule 3: If Travel cost/km is cheap and gender is male then mode = bus
• Rule 4: If Travel cost/km is cheap and gender is female and she owns no car then mode = bus
• Rule 5: If Travel cost/km is cheap and gender is female and she owns 1 car then mode = train
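
A small sketch (illustration only) of these rules applied to the instance above:

```python
def predict_mode(travel_cost, gender, cars):
    """Apply the five decision rules read off the tree."""
    if travel_cost == "expensive":
        return "car"                              # Rule 1
    if travel_cost == "standard":
        return "train"                            # Rule 2
    if gender == "male":
        return "bus"                              # Rule 3
    return "bus" if cars == 0 else "train"        # Rules 4 and 5

print(predict_mode(travel_cost="standard", gender="male", cars=1))  # train
```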
From ID3 to C4.5

• Ross Quinlan started with:
  • ID3 (Quinlan, 1979)
  • C4.5 (Quinlan, 1993)
• Some assumptions in the basic algorithm:
  • All attributes are nominal
  • We do not have unknown values
C4.5 algorithm

• Avoid overfitting
• Deal with continuous attributes
• Deal with missing data
Pruning

1. Pre-prune: Stop growing a branch when information becomes unreliable

2. Post-prune: Take a fully-grown decision tree and discard unreliable parts
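
As a hedged illustration: scikit-learn's DecisionTreeClassifier supports post-pruning through cost-complexity pruning (ccp_alpha) and pre-pruning through limits such as max_depth; the dataset and alpha value below are placeholders.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fully grown tree vs. a post-pruned tree (larger ccp_alpha => more pruning)
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.02).fit(X_train, y_train)

print(full.tree_.node_count, pruned.tree_.node_count)      # pruned tree is smaller
print(full.score(X_test, y_test), pruned.score(X_test, y_test))
```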


Dealing with continuous attributes
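
The slide's figure is not reproduced here. As a rough sketch of the standard approach (assumed, not taken from the slide): a continuous attribute is split by trying candidate thresholds between sorted values and keeping the split with the lowest impurity. Made-up data below.

```python
def gini(labels):
    n = len(labels)
    return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_threshold(values, labels):
    """Try midpoints between consecutive sorted values; keep the split with the lowest weighted impurity."""
    pairs = sorted(zip(values, labels))
    best_t, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_t, best_score = t, score
    return best_t

# Made-up continuous attribute (e.g. travel cost per km) with class labels
print(best_threshold([1.2, 1.5, 2.0, 2.2, 3.5, 3.8, 4.1],
                     ["bus", "bus", "bus", "train", "train", "car", "car"]))
```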
Intuition about Margin

[Figure: a man/woman classifier queried on ambiguous cases such as an infant or an elderly person]

Problem with Margin-based Discriminative Classifiers

• It might be very misleading to return a high confidence.
From Decision Tree to Random Forest

• Ensemble Learning:
  • Average out biases
  • Reduce the variance
  • Unlikely to overfit
Bias-variance Decomposition

• For any learning scheme,
  o Bias = expected error of the combined classifier on new data
  o Variance = expected error due to the particular training set used
• Total expected error ~ bias + variance
Ensemble Methods

• Bagging (Breiman 1994, …)
• Boosting (Freund and Schapire 1995, Friedman et al. 1998, …)
• Random forests (Breiman 2001, …)

Predict the class label for unseen data by aggregating a set of predictions (classifiers learned from the training data).
Overview
Bagging

• Bootstrap data sets:
  • Original data set: X = {x1, x2, ..., xN}
  • Creation of a new data set XB: draw N points at random from X, with replacement, so that some points in X may be replicated in XB (whereas other points may be absent from XB)

Bagging

• Train the same model on each of the M bootstrap data sets.
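
A minimal sketch of bagging by bootstrap resampling (illustration only; the data set and base model are placeholders):

```python
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                       # placeholder data set of N = 100 points
y = (X[:, 0] + X[:, 1] > 0).astype(int)

M = 25                                              # number of bootstrap data sets
models = []
for _ in range(M):
    idx = rng.integers(0, len(X), size=len(X))      # draw N points with replacement
    models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

def bagged_predict(x):
    """Aggregate the M classifiers by majority vote."""
    votes = [m.predict(x.reshape(1, -1))[0] for m in models]
    return Counter(votes).most_common(1)[0][0]

print(bagged_predict(X[0]), y[0])
```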
When does Bagging work?

• A learning algorithm is unstable if small changes to the training set cause large changes in the learned classifier.
• If the learning algorithm is unstable, then Bagging almost always improves performance.
Why does Bagging work?

Random Forest
Probability Behind

• The probability of the ensemble (deciding by majority vote) getting the correct answer follows a binomial distribution:

  P(correct) = Σ_{k = ⌊T/2⌋ + 1 .. T} C(T, k) · p^k · (1 − p)^(T − k)

  where p is the success rate of each base classifier, and T is the number of base classifiers.
Random Forest Power

• The power of ensemble learning: if p > 0.5 then the correctness probability approaches 1 as T → ∞.
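
A quick numeric check of this claim (sketch; the values of p and T are arbitrary):

```python
from math import comb

def p_majority_correct(p, T):
    """Probability that more than half of T independent classifiers (each with success rate p) are correct."""
    return sum(comb(T, k) * p**k * (1 - p)**(T - k) for k in range(T // 2 + 1, T + 1))

for T in (1, 11, 101, 1001):
    print(T, round(p_majority_correct(0.6, T), 4))
# With p = 0.6, the majority-vote accuracy climbs toward 1 as T grows
```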


Random Forest - Pros

• Requires almost no input preparation
• Performs implicit feature selection
• Very quick to train
• Pretty tough to beat
• It’s really hard to build a bad Random Forest!

Random Forest - Drawbacks?

• Model size
• Black boxes
Explainable Random Forest

Get back to the case study of flight delay prediction: what caused the flight delay?

Explainable Random Forest

- Input: a selected future flight (a flight from Changi to Tan Son Nhat within the next 48 hours, by Singapore Airlines)
- Output: delay prediction (Y/N)

Explanation (feature importances):
• Arrival hour: 0.25467993054
• Airline: 0.253308988692
• Origin: 0.158077791536
• Departure time: 0.1364141321
• Destination: 0.105243518586
• Duration: 0.0660441127126
• Type: 0.0219200955523
• Arrival DoW: 0.00245074824256
• Departure DoW: 0.00186059074095
• Operation Type: 9.12956600922e-08
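
A hedged sketch of how such an explanation could be produced with scikit-learn's impurity-based feature importances (the feature names match the slide, but the data here are random placeholders, not the actual case-study pipeline):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

feature_names = ["Arrival hour", "Airline", "Origin", "Departure time", "Destination",
                 "Duration", "Type", "Arrival DoW", "Departure DoW", "Operation Type"]

# Placeholder data standing in for the encoded flight records
rng = np.random.default_rng(0)
X = rng.normal(size=(500, len(feature_names)))
y = rng.integers(0, 2, size=500)                    # delayed (1) / not delayed (0)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based feature importances, sorted as on the slide
for name, score in sorted(zip(feature_names, forest.feature_importances_),
                          key=lambda t: t[1], reverse=True):
    print(f"{name}: {score:.4f}")
```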
