Module 3 - Ensemble Learning
Outline
➢ Decision Tree: Review
➢ Random Forest
➢ Example
➢ Summary
Decision Tree Review
Example: predict a vaccine's effectiveness from the dose injected (e.g., "inject 5 units").

Unit    Effect (%)
10      98
20      0
35      100
5       44
…       …

[Figure: effectiveness (%) vs. units of vaccine over the 10-40 unit range]
Decision Tree Review
[Figure: regression tree with splits at 14.5, 23.5, and 29 units; training data, test data, and the resulting error on the effectiveness vs. units plot]
Decision Tree Review
[Figure: the same tree with splits at 14.5, 23.5, and 29 units, with training and test error]

Note: if we want to prune the tree more, we could remove the last two leaves and replace the split with a leaf that is the average of all of its observations.
Decision Tree Review
[Figure: pruned regression tree with splits at 14.5 and 29 units; training data, test data, and the resulting error]
Tree Complexity Penalty
For now, let's set α = 10,000 and calculate the tree score for each tree.
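The tree score used here is the standard cost-complexity pruning criterion; a sketch, assuming the usual definition:

$$\text{Tree Score} = \text{SSR} + \alpha T$$

where SSR is the sum of squared residuals over the tree's leaves and T is the number of leaves. Lower scores win, so a larger α favors smaller trees.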
How to Select α
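In practice α is chosen by cross-validation. A minimal sketch using scikit-learn's cost-complexity pruning API (the synthetic dataset is just for illustration):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=4, noise=10, random_state=0)

# Candidate alphas come from the tree's own pruning path
path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X, y)

best_alpha, best_score = 0.0, -np.inf
for alpha in path.ccp_alphas:
    tree = DecisionTreeRegressor(ccp_alpha=alpha, random_state=0)
    score = cross_val_score(tree, X, y, cv=5).mean()  # mean R^2 over 5 folds
    if score > best_score:
        best_alpha, best_score = alpha, score

print(f"selected alpha = {best_alpha:.4f}")
```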
Decision Trees: Advantages and Disadvantages

Advantages:
✓ Very easy to explain (do you think they are easier to understand than linear regression?)
✓ More closely mirror human decision-making (what do you think about this?)
✓ Can easily handle qualitative predictors without the need to create dummy variables (what is a dummy variable?)

Disadvantages:
✓ Do not have the same level of predictive accuracy as some other regression and classification methods
✓ Small changes in the data can cause a large change in the final estimated tree
✓ Are less effective when the main goal is to predict the outcome of a continuous variable
Dummy Variable
https://twitter.com/gsutters/status/1281001812577976329
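A dummy variable encodes a categorical predictor as 0/1 indicator columns. A quick sketch with pandas (the column and values are made up):

```python
import pandas as pd

df = pd.DataFrame({"color": ["blue", "green", "blue", "red"]})

# One 0/1 indicator column per category: these are the dummy variables
dummies = pd.get_dummies(df["color"], prefix="color")
print(dummies)
```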
Bias-Variance Trade-off
[Figure: three fits of the same data: high bias, low variance (underfitting; a typical weak learner); low bias, low variance (just right); low bias, high variance (overfitting)]
Bias-Variance Trade-off
We need to develop an ML algorithm that captures the true relationship in the data.

[Figure: height vs. weight scatter of the real dataset with the true relationship curve; the data is split into a training set and a testing set]
Bias-Variance Trade-off
Linear regression will never capture the true relationship between weight and height: the straight line underfits the training set (high bias), but it gives similar fits across datasets (low variance), so its SSR on the test set stays close to its SSR on the training set.

[Figure: linear regression fit to the training set, and its SSR evaluated on the test set]

Prediction Errors: Bias and Variance
Bias is the error rate on the training data; the difference in fits between datasets is called variance.
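A tiny sketch of the trade-off on synthetic data (all names illustrative): the linear model shows similar train/test error (high bias, low variance), while an unpruned tree fits the training set almost perfectly but degrades on the test set (low bias, high variance):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
weight = rng.uniform(40, 120, 300)
height = 1.2 + 0.6 * np.log(weight) + rng.normal(0, 0.05, 300)  # curved "true" relationship
X = weight.reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(X, height, random_state=0)
for model in (LinearRegression(), DecisionTreeRegressor(random_state=0)):
    model.fit(X_tr, y_tr)
    print(type(model).__name__,
          "train MSE:", round(mean_squared_error(y_tr, model.predict(X_tr)), 5),
          "test MSE:", round(mean_squared_error(y_te, model.predict(X_te)), 5))
```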
Random Forest: Motivation
[Cartoon: a shopper asks friends for ideas; several reply "Search: Cucci", "Go with Cucci!!", "Cucci!!", and the combined answer is "Just buy Gucci!!!"]

Aggregating many individual opinions gives a better final decision; this is the intuition behind ensemble learning.
ENSEMBLE LEARNING
What is Ensemble Learning?
Homogeneous Approach
Heterogeneous Approach
Ensemble Learning Techniques
[Diagram: ensemble learning techniques, popular in AI competitions, with Random Forest as the bagging-based branch]

ENSEMBLE LEARNING
Boosting-Based Method
Stacking-Based Method
Decision Tree vs. Random Forest
https://commons.wikimedia.org/wiki/File:Decision_Tree_vs._Random_Forest.png
Outline
➢ Decision Tree: Review
➢ Random Forest
➢ Example
➢ Summary
Random Forest is a Solution
Steps to Build a Random Forest
Create a bootstrapped dataset by sampling rows from the original dataset with replacement (the slide shows both side by side; note the repeated 167 row):

CHEST PAIN | GOOD BLOOD CIRCULATION | BLOCKED ARTERIES | WEIGHT | HEART DISEASE
YES | YES | YES | 180 | YES
NO | NO | NO | 125 | NO
YES | NO | YES | 167 | YES
YES | NO | YES | 167 | YES
Randomly select 2 features (columns) as split candidates. Suppose Chest Pain is the optimal node:

CHEST PAIN | GOOD BLOOD CIRCULATION | BLOCKED ARTERIES | WEIGHT | HEART DISEASE
YES | YES | YES | 180 | YES
NO | NO | NO | 125 | NO
YES | NO | YES | 167 | YES
YES | NO | YES | 167 | YES

[Tree: root "Chest Pain?" with two unresolved children]

Then remove Chest Pain from the candidate features for the next split.
Again choose 2 features at random; suppose Weight gives the optimal next split:

[Tree: root "Chest Pain?" whose child node splits on "Weight?"]
Create N Trees
Generate N bootstrapped datasets and build one tree from each:
  1st bootstrapped dataset → 1st tree
  …
  nth bootstrapped dataset → nth tree
The resulting collection of trees is the Random Forest.

How to Predict a New Sample
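A from-scratch sketch of these steps (bootstrap the rows, let each tree consider a random feature subset per split, then take a majority vote; assumes binary 0/1 labels):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def build_forest(X, y, n_trees=100, seed=0):
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), len(X))   # bootstrap: sample rows with replacement
        tree = DecisionTreeClassifier(max_features="sqrt")  # random feature subset per split
        trees.append(tree.fit(X[idx], y[idx]))
    return trees

def forest_predict(trees, X_new):
    votes = np.stack([t.predict(X_new) for t in trees])  # one row of votes per tree
    return (votes.mean(axis=0) >= 0.5).astype(int)       # majority vote per sample
```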
NEW PATIENT ("Could I have the disease?"):
  Chest Pain: No
  Good Blood Circulation: No
  Blocked Arteries: No
  Weight: 125
  Heart Disease: ?
Bagging Technique
[Diagram: dataset → bootstrapped dataset (sampling with replacement)]
Bootstrapping the data and AGGregating the trees' predictions is what gives Bagging its name.
How to Predict a New Sample
Run the new patient (Chest Pain: No, Good Blood Circulation: No, Blocked Arteries: No, Weight: 125; "Could I have the disease?") down every tree:

  1st tree → predict
  2nd tree → predict
  …
  nth tree → predict

Tally of Heart Disease votes: Yes = 7, No = 2, so the forest predicts Yes ("Unfortunately, you have the disease!").
Review
ORIGINAL DATA → (randomly select samples with replacement) → BOOTSTRAPPED DATASET
Part of the original dataset may not appear in the bootstrapped dataset.
Out-of-Bag Dataset
OUT-OF-BAG ERROR
We can use the out-of-bag dataset to measure the accuracy of the Random Forest.
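A quick sketch with scikit-learn, on a generic synthetic classification dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# oob_score=True scores each tree on the samples left out of its bootstrap
forest = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
forest.fit(X, y)
print("out-of-bag accuracy:", forest.oob_score_)
print("out-of-bag error:", 1 - forest.oob_score_)
```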
Outline
➢ Decision Tree: Review
➢ Random Forest
➢ Example
➢ Summary
Random Forest with Missing Data
Types of Missing Data
[Table row with a missing entry: NO | NO | NO | 125 | NO]
A missing value can be text (categorical) or numeric.
How to Fill in Missing Data
DATA WITH MISSING VALUES → GUESS THE DATA → REFINE THE GUESSES
Guessing the Data
CHEST PAIN | GOOD BLOOD CIRCULATION | BLOCKED ARTERIES | WEIGHT | HEART DISEASE
NO | NO | NO | 125 | NO
YES | NO | No (initial guess) | 167.5 (initial guess) | NO

Initial guesses: the most common value for a missing categorical entry, and the median for a missing numeric entry.
Each row of the proximity matrix represents one sample; for each tree, entry (i, j) gets +1 whenever samples i and j end up in the same leaf.

After the 1st tree, samples 3 and 4 share a leaf, so entries (3, 4) and (4, 3) become 1. The 2nd tree's leaf assignments are then added on top, accumulating the counts.
Proximity Matrix of N Trees
       1      2      3      4
1      .      2      1      1
2      2      .      1      1
3      1      1      .      8
4      1      1      8      .
Proximity Matrix of N Trees
Normalization: assume we have 10 trees, and divide each entry by 10.

       1      2      3      4
1      .      0.2    0.1    0.1
2      0.2    .      0.1    0.1
3      0.1    0.1    .      0.8
4      0.1    0.1    0.8    .
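A sketch of how such a matrix can be computed with scikit-learn: apply() returns each sample's leaf index in every tree, and the (normalized) proximity is the fraction of trees in which two samples share a leaf (our illustration, not the slides' code):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=6, n_features=4, n_informative=2,
                           n_redundant=0, random_state=0)
forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

leaves = forest.apply(X)  # shape (n_samples, n_trees): leaf id of each sample per tree
n = len(X)
proximity = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        proximity[i, j] = np.mean(leaves[i] == leaves[j])  # fraction of shared leaves
print(proximity.round(1))
```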
Predict the missing categorical value from the proximity-weighted frequencies:
Weighted frequency of No = 2/3 × 0.9 = 0.6
Weighted frequency of Yes = 1/3 × 0.1 = 0.03
Since 0.6 > 0.03, fill the missing value with No.

Fill in the Missing Values
Proximity Matrix

       1      2      3      4
1      .      0.2    0.1    0.1
2      0.2    .      0.1    0.1
3      0.1    0.1    .      0.8
4      0.1    0.1    0.8    .

The missing numeric value is filled with a proximity-weighted average of the other samples' weights:

weight of s1 = 0.1 / (0.1 + 0.8 + 0.1) = 0.1
weight of s2 = 0.1 / (0.1 + 0.8 + 0.1) = 0.1
weight of s3 = 0.8 / (0.1 + 0.8 + 0.1) = 0.8

Summation: the contribution of s1 is 125 × 0.1 = 12.5; adding the weighted contributions of s2 and s3 gives the filled-in value 198.5.
Outline
➢ Decision Tree: Review
➢ Random Forest
➢ Example
➢ Summary
Time Series vs. Supervised Learning
Time series data is a collection of data points over time.
Time Series vs. Supervised Learning
A time series is a sequence of numbers ordered by a time index; it can be thought of as a list or column of ordered values.

A supervised learning problem is comprised of input patterns (X) and output patterns (y), such that an algorithm can learn how to predict the output patterns from the input patterns.
Time Series vs. Supervised Learning
Time series data can be phrased as a supervised learning problem.
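A sketch of the standard sliding-window reframing (the function name is ours):

```python
import numpy as np

def series_to_supervised(series, n_lags):
    # Turn [x0, x1, ...] into rows of n_lags inputs and a one-step-ahead target
    X, y = [], []
    for i in range(n_lags, len(series)):
        X.append(series[i - n_lags:i])  # previous n_lags values as inputs
        y.append(series[i])             # next value as the output to predict
    return np.array(X), np.array(y)

series = [10, 20, 30, 40, 50, 60]
X, y = series_to_supervised(series, n_lags=2)
# X = [[10 20], [20 30], [30 40], [40 50]], y = [30 40 50 60]
```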
Outline
➢ Decision Tree: Review
➢ Random Forest
➢ Example
➢ Summary
Example
The daily female births dataset: the number of female births recorded each day.
We will use only the previous six time steps as input to the model.
78 Example
The daily female births dataset, that is the monthly births across three years
We will use only the previous six time steps as input to the model
79
k-Fold cross-validation
1. Split the dataset into k equal (if possible) parts, called folds.
2. Use one fold for validation and the remaining k − 1 folds for training.
3. Repeat for every fold and average the k validation scores.
Time Series Cross Validation
Expanding Window
With time series data, we cannot shuffle the data!
Time Series Cross Validation
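A sketch with scikit-learn's TimeSeriesSplit, which implements this expanding-window scheme:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(10).reshape(-1, 1)  # 10 time steps, kept in order

# Each split trains on an expanding prefix and validates on the steps right after it
for train_idx, test_idx in TimeSeriesSplit(n_splits=3).split(X):
    print("train:", train_idx, "test:", test_idx)
```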
Outline
➢ Decision Tree: Review
➢ Random Forest
➢ Example
➢ Summary
AdaBoost & Gradient Boost
(Basic and Advanced Concepts and Their Applications)
➢ Summary
Decision Tree and Its Variance
Review: predict a vaccine's effectiveness from the dose injected (e.g., "inject 5 units").

Unit    Effect (%)
10      98
20      0
35      100
5       44
…       …
Last week: ensemble learning (popular in AI competitions), with Random Forest as the bagging-based method.
Boosting-Based Method
Stacking-Based Method
Boosting Technique
https://en.wikipedia.org/wiki/Boosting_%28machine_learning%29#/media/File:Ensemble_Boosting.svg
Stump Definition
A stump is a tree with just one node (a single split) and two leaves.
➢ Summary
AdaBoost: Forest of Stumps
Sample Dataset
[Table: sample patient dataset; one row reads NO | NO | NO | 125 | NO]
STRONG CLASSIFIER
WEAK CLASSIFIER
Random Forest
• Each tree in the random forest has an equal vote (weight) on the final decision.
AdaBoost: Forest of Stumps
[Diagram: stumps with different amounts of influence on the final decision]
Differences Between RF and AdaBoost
• Random Forest builds full-sized trees; AdaBoost builds stumps.
• In a Random Forest every tree gets an equal vote; in AdaBoost some stumps get more say than others.
• Random Forest trees are built independently; in AdaBoost each stump is built to correct the errors of the previous ones.
Gini Index for Chest Pain

CHEST PAIN split:
  Yes branch → Heart Disease: YES = 3, NO = 2
  No branch  → Heart Disease: YES = 1, NO = 2
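For reference, the Gini impurity for this split worked out with the standard formula (our arithmetic):

$$\text{Gini(leaf)} = 1 - P(\text{yes})^2 - P(\text{no})^2$$
$$\text{Gini(Yes branch)} = 1 - (3/5)^2 - (2/5)^2 = 0.48$$
$$\text{Gini(No branch)} = 1 - (1/3)^2 - (2/3)^2 \approx 0.44$$
$$\text{Weighted Gini} = \tfrac{5}{8}(0.48) + \tfrac{3}{8}(0.44) \approx 0.47$$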
Gini Index for Blocked Arteries

BLOCKED ARTERIES split:
  Yes branch → Heart Disease: YES = 3, NO = 3
  No branch  → Heart Disease: YES = 1, NO = 1
Gini Index for Heart Disease

Candidate thresholds for Patient Weight: 140.5, 146.5, 161.5, 170, 188.5, 192.5, 195

Best split: PATIENT WEIGHT > 170
  Yes branch → Heart Disease: YES = 3, NO = 1
  No branch  → Heart Disease: YES = 1, NO = 3
Amount of Say
PATIENT WEIGHT > 170 stump:
  Yes branch → YES = 3, NO = 1
  No branch  → YES = 1, NO = 3
→ Amount of Say
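"Amount of Say" is the standard AdaBoost stump weight, computed from the stump's Total Error (the sum of the weights of the samples it misclassifies):

$$\text{Amount of Say} = \frac{1}{2}\ln\left(\frac{1 - \text{Total Error}}{\text{Total Error}}\right)$$

For this stump, assuming eight equally weighted samples, Total Error = 2/8 = 0.25 (one mistake in each branch), so Amount of Say = ½ ln(3) ≈ 0.55.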
Amount of Say: Patient Weight
$$\text{Odds} = \frac{1 - P(\text{not catching fish})}{P(\text{not catching fish})} = \frac{4/10}{6/10} = 0.67$$

Why?
Amount of Say: Chest Pain
Known: the weights of the misclassified samples are used to compute the Amount of Say of the current stump.
Unknown: how do we use these misclassified samples' weights to build the next stump and correct these mistakes?
Idea: Improved Bootstrapped Dataset
[Diagram: the incorrectly classified samples drive the creation of a new dataset]
Increase the sample weights of samples that were incorrectly classified and decrease the sample weights of samples that were correctly classified. Label {-1, 1}

Increase the sample weights of samples that were incorrectly classified and keep the sample weights of samples that were correctly classified. Label {0, 1}
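The update behind the table below is the standard AdaBoost rule (our arithmetic, with Amount of Say ≈ 0.55 from the weight stump):

$$\text{new weight} = \text{weight} \times e^{\text{Amount of Say}} \quad \text{(incorrectly classified)}$$
$$\text{new weight} = \text{weight} \times e^{-\text{Amount of Say}} \quad \text{(correctly classified)}$$

So 1/8 × e^0.55 ≈ 0.22 for the two misclassified samples and 1/8 × e^(−0.55) ≈ 0.07 for the rest; the new weights are then normalized to sum to 1.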
Chest Pain | Blocked Arteries | Patient Weight | Heart Disease | Sample Weight | New Weight | Normalized Weight
Yes | Yes | 205 | Yes | 1/8 | 0.07 | 0.08
No | Yes | 180 | Yes | 1/8 | 0.07 | 0.08
Yes | No | 210 | Yes | 1/8 | 0.07 | 0.08
Yes | Yes | 167 | Yes | 1/8 | 0.22 | 0.25
No | Yes | 156 | No | 1/8 | 0.07 | 0.08
No | Yes | 125 | No | 1/8 | 0.07 | 0.08
Yes | No | 168 | No | 1/8 | 0.07 | 0.08
Yes | Yes | 172 | No | 1/8 | 0.22 | 0.25
Sum | | | | ~1.0 | 0.86 | ~1.0
Update

New Sample Weights
Chest Pain | Blocked Arteries | Patient Weight | Heart Disease | New Weight
Yes | Yes | 205 | Yes | 0.08
No | Yes | 180 | Yes | 0.08
Yes | No | 210 | Yes | 0.08
Yes | Yes | 167 | Yes | 0.25
No | Yes | 156 | No | 0.08
No | Yes | 125 | No | 0.08
Yes | No | 168 | No | 0.08
Yes | Yes | 172 | No | 0.25
Sum | | | | ~1.0
AdaBoost: Forest of Stumps
[Diagram: stump → IMPROVE ERROR → stump → IMPROVE ERROR → stump → IMPROVE ERROR]
New Dataset
Draw random numbers in [0, 1] and use the normalized weights to select samples (with replacement) from the old dataset into the new dataset, then continue to build the next stump on the new dataset.
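A sketch of that weighted resampling (weights taken from the table above):

```python
import numpy as np

rng = np.random.default_rng(0)
weights = np.array([0.08, 0.08, 0.08, 0.25, 0.08, 0.08, 0.08, 0.25])

# Draw 8 row indices with replacement; heavily weighted (misclassified)
# samples are picked more often, so the next stump focuses on them
rows = rng.choice(len(weights), size=len(weights), p=weights / weights.sum())
print(rows)
```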
How to Classify the Final Result
SAMME: Stagewise Additive Modeling using a Multi-class Exponential loss function
https://hastie.su.domains/Papers/samme.pdf
Behind the Scenes
Exponential loss function, with labels $y \in \{-1, 1\}$ and an additive model of M trees, $f_M(x) = \sum_{m=1}^{M} \alpha_m G_m(x)$:

$$L(y, f(x)) = e^{-y f(x)}$$

Supposing that $f_m(x) = f_{m-1}(x) + \alpha_m G_m(x)$, the total error over the N samples is

$$E = \sum_{i=1}^{N} e^{-y_i f_{m-1}(x_i)}\, e^{-y_i \alpha_m G_m(x_i)} = \sum_{i=1}^{N} w_i^{(m)} e^{-y_i \alpha_m G_m(x_i)},$$

where $w_i^{(m)} = e^{-y_i f_{m-1}(x_i)}$ is not dependent on $\alpha_m$ and $G_m$.

Expand the error function: correctly classified samples contribute $e^{-\alpha_m}$ and misclassified ones $e^{\alpha_m}$:

$$E = e^{-\alpha_m} \sum_{y_i = G_m(x_i)} w_i^{(m)} + e^{\alpha_m} \sum_{y_i \ne G_m(x_i)} w_i^{(m)}$$

Setting $dE/d\alpha_m = 0$ and taking the log on both sides, we have

$$\alpha_m = \frac{1}{2} \ln\left(\frac{1 - \text{err}_m}{\text{err}_m}\right),$$

which is exactly the Amount of Say, with $\text{err}_m$ the weighted error rate of stump $G_m$.
Summary
AdaBoost builds each stump based on the errors made by the previous stumps, improving the error step by step: stump → improve error → stump → improve error → …
Example: Spam Classification
http://archive.ics.uci.edu/dataset/94/spambase
Our implementation
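The slides' own implementation is not reproduced here; a minimal stand-in with scikit-learn on the UCI Spambase data (file path and column layout assumed from the dataset's documentation):

```python
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Spambase: 57 feature columns, last column is the spam (1) / non-spam (0) label
data = pd.read_csv("spambase.data", header=None)
X, y = data.iloc[:, :-1], data.iloc[:, -1]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# AdaBoost over decision stumps (depth-1 trees are the default base estimator)
model = AdaBoostClassifier(n_estimators=200, random_state=0)
model.fit(X_tr, y_tr)
print("test accuracy:", model.score(X_te, y_te))
```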
➢ Summary
Gradient Boost for Regression
[Diagram: input features → predicted output]
Tree-Based Gradient Boost
• Step 1: Build the 1st tree
  • Calculate the average of the weights

Height | Favorite Color | Gender | Weight
1.6 | Blue | Male | 88
1.6 | Green | Female | 76
1.5 | Blue | Female | 56
1.8 | Red | Male | 73
1.5 | Green | Male | 77
1.4 | Blue | Female | 57

1st tree: Average of weights = 71.2
2nd tree, fit to the residuals, splits on "Gender is Female", "Height < 1.6", and "Color is not Blue". Fitting the residuals exactly like this risks OVERFITTING.
Prediction

Prediction = 71.2 (the 1st tree's average of weights) + learning rate × (the 2nd tree's predicted residual), with learning rate = 0.2.
Tree-Based Gradient Boost
• Step 3: Build the 3rd tree on the residuals left after the 1st tree (average of weights: 71.2) and the 2nd tree

Height | Favorite Color | Gender | First Tree Residual | Second Tree Residual | Third Tree Residual
1.6 | Blue | Male | 16.8 | 12.44 | ???
1.6 | Green | Female | 4.8 | … | ???
1.5 | Blue | Female | -15.2 | … | ???
1.8 | Red | Male | 1.8 | … | ???
1.5 | Green | Male | 5.8 | … | ???
1.4 | Blue | Female | -14.2 | … | ???
Prediction
• Data: $\{(x_i, y_i)\}_{i=1}^{n}$
• Loss function: $L(y_i, F(x_i)) = \frac{1}{2}(\text{Observed} - \text{Predicted})^2$

$$\frac{dL}{d\,\text{Predicted}} = \frac{2}{2}(\text{Observed} - \text{Predicted}) \times (-1) = -(\text{Observed} - \text{Predicted})$$

so the negative gradient is exactly the residual (a tricky implementation detail here).
Gradient Boost: Behind the Scenes
• Step 1: Initialize a model with a constant value:

$$F_0(x) = \arg\min_{\delta} \sum_{i=1}^{n} L(y_i, \delta)$$

$$\text{SSR} = \frac{1}{2}\left[(88 - \delta)^2 + (76 - \delta)^2 + (56 - \delta)^2\right]$$

$$\frac{d\,\text{SSR}}{d\delta} = -(88 - \delta) - (76 - \delta) - (56 - \delta) = 0$$

$$\delta = \frac{88 + 76 + 56}{3} = 73.3 = \text{average of all samples' weights}$$
Gradient Boost: Behind the Scenes

• Step 1 (result): minimizing the SSR with respect to $\delta$ gives $F_0(x) = \arg\min_{\delta} \sum_{i=1}^{n} L(y_i, \delta) = 73.3$.
Gradient Boost: Behind the Scenes

[Tree: the first regression tree splits the samples into terminal regions R11 and R21]
Step 2 (with $F_{m-1}(x) = F_0(x) = 73.3$):

$$\gamma_{11} = \arg\min_{\gamma} \frac{1}{2}\bigl(y_3 - (F_0(x_3) + \gamma)\bigr)^2 = \arg\min_{\gamma} \frac{1}{2}\bigl(56 - (73.3 + \gamma)\bigr)^2 = \arg\min_{\gamma} \frac{1}{2}(-17.3 - \gamma)^2$$

For $x_i \in R_{11}$: $\gamma_{11} = -17.3$; for $x_i \in R_{21}$: $\gamma_{21} = 8.7$.
Step 2 (cont., with $F_{m-1}(x) = F_0(x) = 73.3$):

$$\gamma_{21} = \arg\min_{\gamma} \left[\frac{1}{2}\bigl(y_1 - (F_0(x_1) + \gamma)\bigr)^2 + \frac{1}{2}\bigl(y_2 - (F_0(x_2) + \gamma)\bigr)^2\right]$$
$$= \arg\min_{\gamma} \left[\frac{1}{2}\bigl(88 - (73.3 + \gamma)\bigr)^2 + \frac{1}{2}\bigl(76 - (73.3 + \gamma)\bigr)^2\right]$$
$$= \arg\min_{\gamma} \left[\frac{1}{2}(14.7 - \gamma)^2 + \frac{1}{2}(2.7 - \gamma)^2\right]$$

This gives $\gamma_{21} = 8.7$, the average of the two samples' residuals. For $x_i \in R_{11}$: $\gamma_{11} = -17.3$; for $x_i \in R_{21}$: $\gamma_{21} = 8.7$.
Step 2 (summary, with $F_{m-1}(x) = F_0(x) = 73.3$): $\gamma_{11} = -17.3$ for $x_i \in R_{11}$ and $\gamma_{21} = 8.7$ for $x_i \in R_{21}$.
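The model update that follows is the standard gradient-boosting step (ν is the learning rate; our notation):

$$F_m(x) = F_{m-1}(x) + \nu \sum_{j} \gamma_{jm}\, I(x \in R_{jm})$$

For example, a sample falling in $R_{21}$ is predicted as $73.3 + \nu \times 8.7$.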
Repeat for the next (m + 1) iteration
Summary
Outline
➢ Boosting Techniques
➢ Summary
Time Series Forecasting
We will focus on the energy consumption problem, where given a sufficiently large dataset of the daily energy
consumption of different households in a city, we are tasked to predict as accurately as possible the future energy
demands.
Preprocessing
Time Series Forecasting
MAE: 0.7535964848932999
MSE: 1.3449409757830804
MAPE: 0.1951895064391348
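For reference, metrics like the ones above come out of a pipeline roughly like this (a sketch; the file path, column names, and model choice are our assumptions, not the slides' actual code):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Daily energy consumption series (illustrative path and column names)
df = pd.read_csv("energy.csv", parse_dates=["date"], index_col="date")
series = df["consumption"].to_numpy()

# Lag features: previous 6 days as inputs, next day as the target
n_lags = 6
X = np.array([series[i - n_lags:i] for i in range(n_lags, len(series))])
y = series[n_lags:]

split = int(len(X) * 0.8)  # keep time order: no shuffling
model = GradientBoostingRegressor(random_state=0).fit(X[:split], y[:split])
pred = model.predict(X[split:])
print("MAE:", mean_absolute_error(y[split:], pred))
print("MSE:", mean_squared_error(y[split:], pred))
```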