Module 3 - Ensemble Learning

Random Forest

(Basic, Advanced Concepts and Its Applications)

Vinh Dinh Nguyen


PhD in Computer Science
2
Outline
➢ Decision Tree: Review

➢ Random Forest

➢ Fill in missing data with Random Forest

➢ Time Series vs. Supervised Learning

➢ Example

➢ Summary
3
Decision Tree Review

Example: when a new vaccine is released, we want to predict how effective it is (%) for each dosage given to a patient.

Unit (dosage) | Effect (%)
10            | 98
20            | 0
35            | 100
5             | 44
…             | …

[Figure: a patient receives 5 units of vaccine; vaccine effectiveness: 44%.]
4
Decision Tree Review
[Figure: scatter plot of vaccine effectiveness (%) versus dosage (units), with split thresholds at 14.5, 23.5, and 29 units, and the fitted regression tree: Unit < 14.5 → predict 4.2; 14.5 ≤ Unit < 23.5 → predict 100; 23.5 ≤ Unit < 29 → predict 52.8; Unit ≥ 29 → predict 2.5.]
5
Decision Tree Review
[Figure: the same fitted tree (thresholds 14.5, 23.5, 29) shown against both the training data and the test data; the fully grown tree follows the training observations closely but produces noticeable errors on the test data.]
6
Decision Tree Review
[Figure: the two leaves under the "Unit < 23.5" split are deleted (pruned), and the pruned tree is evaluated on the training and test data.]

Note: if we want to prune the tree further, we could remove the last two leaves and replace their split with a single leaf whose value is the average of all of its observations.
7
Decision Tree Review
[Figure: the pruned tree with thresholds at 14.5 and 29 units: Unit < 14.5 → predict 4.2; 14.5 ≤ Unit < 29 → predict 73.8 (the average of the merged leaves); Unit ≥ 29 → predict 2.5. The pruned tree is shown against the training and test data and its error.]
8
Tree Complexity Penalty

The tree complexity penalty compensates for the difference in the number of leaves between candidate subtrees:

Tree Score = sum of squared residuals (SSR) + αT

α (alpha) is a tuning parameter that we find using cross-validation.
T is the total number of terminal nodes (leaves) in the tree.

For now, let α = 10,000 and calculate the tree score for each candidate tree.
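As an illustrative sketch (not from the slides), the tree score can be computed for every subtree along scikit-learn's cost-complexity pruning path; the dosage/effect values below are assumed toy data.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.array([[5], [10], [20], [35]])   # dosage in units (assumed toy values)
y = np.array([44, 98, 0, 100])          # effectiveness in %

full_tree = DecisionTreeRegressor(random_state=0).fit(X, y)
path = full_tree.cost_complexity_pruning_path(X, y)   # candidate pruning alphas

alpha = 10_000
for ccp_alpha in path.ccp_alphas:
    tree = DecisionTreeRegressor(random_state=0, ccp_alpha=ccp_alpha).fit(X, y)
    ssr = np.sum((y - tree.predict(X)) ** 2)           # sum of squared residuals
    T = tree.get_n_leaves()                            # number of terminal nodes
    print(f"leaves={T}  tree score = SSR + alpha*T = {ssr + alpha * T:.1f}")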
9
How to Select α

            α = 0     α = 10,000   α = 15,000   α = 20,000
Split 1     …         …            …            …
Split 2     …         …            …            …
Split 3     …         …            …            …
Split 4     …         …            …            …
Split 5     …         …            …            …
Average     50,000    5,000        11,000       30,000

In this case, the trees built with α = 10,000 had, on average, the lowest sum of squared residuals on the held-out splits, so α = 10,000 is our
final value.
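A minimal sketch of selecting α by cross-validation with scikit-learn (the grid of α values and the synthetic data are assumptions, not taken from the slides):

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Assumed synthetic data standing in for the dosage/effectiveness table.
rng = np.random.default_rng(0)
X = rng.uniform(0, 40, size=(200, 1))                # dosage in units
y = 100 * np.exp(-0.5 * ((X[:, 0] - 20) / 6) ** 2)   # bell-shaped effectiveness
y = y + rng.normal(0, 5, size=200)                   # plus noise

# Try several complexity penalties (scikit-learn calls the penalty ccp_alpha)
# and keep the one with the lowest cross-validated error, mirroring the table above.
search = GridSearchCV(DecisionTreeRegressor(random_state=0),
                      {"ccp_alpha": [0.0, 1.0, 5.0, 10.0, 50.0]},
                      scoring="neg_mean_squared_error", cv=5)
search.fit(X, y)
print("selected alpha:", search.best_params_["ccp_alpha"])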
10
Decision Tree Review

Advantages:
✓ Very easy to explain (do you think it is easier to understand than linear regression?)
✓ More closely mirror human decision-making (what do you think about this?)
✓ Can easily handle qualitative predictors without the need to create dummy variables (what is a dummy variable?)

Disadvantages:
✓ Do not have the same level of predictive accuracy as some other regression and classification methods
✓ Small changes in the data can cause a large change in the estimated tree
✓ Are less effective when the main goal is to predict the outcome of a continuous variable

https://twitter.com/gsutters/status/1281001812577976329
11
Dummy Variable

https://twitter.com/gsutters/status/1281001812577976329
12
Bias-Variance Trade-off
[Figure: three fits of the same data, in the context of weak learners: high bias, low variance (underfitting); low bias, low variance (just right); high variance, low bias (overfitting).]
13
Bias-Variance Trade-off
[Figure: height vs. weight. The real dataset follows a curved true relationship; we need to develop an ML algorithm to capture it. The data are split into a training set and a testing set.]
14
Bias-Variance Trade-off
Linear Regression will never capture the true relationship between weight and height. The inability of a machine learning method to capture the true relationship is called bias; Linear Regression's SSR on the training data is >> 0, so it has a high bias.

Polynomial Regression can capture the true relationship between weight and height; its SSR on the training data is ≈ 0, so Polynomial Regression has a low bias.
15
Bias-Variance Trade-off
Linear Regression has a high bias because it cannot capture the true relationship, but it has low variance because its SSR is very similar across different datasets. The difference in fits between datasets is called variance.

Polynomial Regression has a low bias because it fits the training data almost perfectly, but it has high variance because its SSR differs hugely between the training dataset and the test dataset (SSR >> 0 on the test data).
16
Bias-Variance Trade-off
[Diagram: the DATASET is split into a training set and a test set and fed to our model (Linear Regression). A low SSR on the training set indicates low bias and a high SSR indicates high bias; a low SSR on the test set indicates low variance and a high SSR indicates high variance.]
17 Prediction errors (Bias and Variance)

Bias can be viewed as the error rate on the training data; the difference in fits between datasets is called variance.
18 Random Forest: Motivation

You want to buy a perfume for your girlfriend. What would you do? Ask for ideas!

[Figure: you consult five sources. Friends 1 and 2 say "Go with Chanel!"; friends 3 and 4 say "Get Gucci!"; source 5 is an online search for Gucci.]
19 Random Forest: Motivation

You want to buy a perfume for your girlfriend. What would you do?

[Figure: the same five sources, two suggesting Chanel and three suggesting Gucci, so you decide: "Just buy Gucci!" The majority suggestion wins.]
21 Random Forest: Motivation

ENSEMBLE LEARNING
22 What is Ensemble Learning?
23 Homogeneous Approach
24 Heterogeneous Approach
25 Ensemble Learning Techniques

Ensemble Learning (commonly used in AI competitions)

• Bagging: homogeneous weak learners (e.g., Random Forest)
• Boosting: homogeneous weak learners
• Stacking: heterogeneous weak learners
26 Bagging-based Method

Random Forest

ENSEMBLE LEARNING
27 Boosting-Based Method
28 Stacking-Based Method
29 Decision Tree vs Random Forest

https://commons.wikimedia.org/wiki/File:Decision_Tree_vs._Random_Forest.png
30
Outline
➢ Decision Tree: Review

➢ Random Forest

➢ Fill in missing data with Random Forest

➢ Time series Data

➢ Example

➢ Summary
31 Random Forest is a Solution
32 Steps to Build a Random Forest

CHEST PAIN | GOOD BLOOD CIRCULATION | BLOCKED ARTERIES | WEIGHT | HEART DISEASE
NO         | NO                     | NO               | 125    | NO
YES        | YES                    | YES              | 180    | YES
YES        | YES                    | NO               | 210    | NO
YES        | NO                     | YES              | 167    | YES


33 1st Step: Create a New Dataset
Original DATA:
CHEST PAIN | GOOD BLOOD CIRCULATION | BLOCKED ARTERIES | WEIGHT | HEART DISEASE
NO         | NO                     | NO               | 125    | NO
YES        | YES                    | YES              | 180    | YES
YES        | YES                    | NO               | 210    | NO
YES        | NO                     | YES              | 167    | YES

New DATA (rows selected at random, with replacement, from the original dataset):
CHEST PAIN | GOOD BLOOD CIRCULATION | BLOCKED ARTERIES | WEIGHT | HEART DISEASE
YES        | YES                    | YES              | 180    | YES
NO         | NO                     | NO               | 125    | NO
YES        | NO                     | YES              | 167    | YES
YES        | NO                     | YES              | 167    | YES
34 1st Step: Create a New Dataset
Original DATA:
CHEST PAIN | GOOD BLOOD CIRCULATION | BLOCKED ARTERIES | WEIGHT | HEART DISEASE
NO         | NO                     | NO               | 125    | NO
YES        | YES                    | YES              | 180    | YES
YES        | YES                    | NO               | 210    | NO
YES        | NO                     | YES              | 167    | YES

Bootstrapped Dataset (rows selected at random, with replacement, from the original dataset):
CHEST PAIN | GOOD BLOOD CIRCULATION | BLOCKED ARTERIES | WEIGHT | HEART DISEASE
YES        | YES                    | YES              | 180    | YES
NO         | NO                     | NO               | 125    | NO
YES        | NO                     | YES              | 167    | YES
YES        | NO                     | YES              | 167    | YES
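A minimal sketch of this bootstrapping step with pandas (the column names follow the table above; the random seed is an assumption):

import pandas as pd

original = pd.DataFrame({
    "chest_pain":       ["NO", "YES", "YES", "YES"],
    "good_circulation": ["NO", "YES", "YES", "NO"],
    "blocked_arteries": ["NO", "YES", "NO", "YES"],
    "weight":           [125, 180, 210, 167],
    "heart_disease":    ["NO", "YES", "NO", "YES"],
})

# Sample the same number of rows WITH replacement: some rows can appear more
# than once, and some may not appear at all (those become out-of-bag samples).
bootstrapped = original.sample(n=len(original), replace=True, random_state=1)
print(bootstrapped)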
35 2nd Step: Generate a Decision Tree from the Bootstrapped Dataset
GENERATE A DECISION TREE FROM THE BOOTSTRAPPED DATASET, BUT AT EACH SPLIT CONSIDER ONLY A RANDOM SUBSET OF 2 ATTRIBUTES (2 COLUMNS).

Bootstrapped dataset:
CHEST PAIN | GOOD BLOOD CIRCULATION | BLOCKED ARTERIES | WEIGHT | HEART DISEASE
YES        | YES                    | YES              | 180    | YES
NO         | NO                     | NO               | 125    | NO
YES        | NO                     | YES              | 167    | YES
YES        | NO                     | YES              | 167    | YES

A traditional tree considers every column at every split; a tree built with this predefined condition considers only the randomly chosen subset.
36

Randomly select 2 features (columns). Suppose GOOD BLOOD CIRCULATION gives the best split, so it becomes the root node; its two child nodes (???) are still to be determined.

Bootstrapped dataset:
CHEST PAIN | GOOD BLOOD CIRCULATION | BLOCKED ARTERIES | WEIGHT | HEART DISEASE
YES        | YES                    | YES              | 180    | YES
NO         | NO                     | NO               | 125    | NO
YES        | NO                     | YES              | 167    | YES
YES        | NO                     | YES              | 167    | YES

For the next split, randomly select 2 features (columns) again, after removing GOOD BLOOD CIRCULATION from the candidate features.
37

Randomly select 2 features (columns). Suppose CHEST PAIN gives the best split, so it becomes the next node under GOOD BLOOD:

GOOD BLOOD → Chest Pain → ??? / ???

Then randomly select 2 features again, after removing CHEST PAIN from the candidate features.

Bootstrapped dataset:
CHEST PAIN | GOOD BLOOD CIRCULATION | BLOCKED ARTERIES | WEIGHT | HEART DISEASE
YES        | YES                    | YES              | 180    | YES
NO         | NO                     | NO               | 125    | NO
YES        | NO                     | YES              | 167    | YES
YES        | NO                     | YES              | 167    | YES
38

Randomly select 2 features (columns). Suppose WEIGHT gives the best split and is added to the tree; then remove WEIGHT from the candidate features for the remaining splits.

GOOD BLOOD → Chest Pain → Weight → ???  (remaining candidate: Blocked Arteries)

Bootstrapped dataset:
CHEST PAIN | GOOD BLOOD CIRCULATION | BLOCKED ARTERIES | WEIGHT | HEART DISEASE
YES        | YES                    | YES              | 180    | YES
NO         | NO                     | NO               | 125    | NO
YES        | NO                     | YES              | 167    | YES
YES        | NO                     | YES              | 167    | YES
39 1st Decision Tree
CHEST PAIN | GOOD BLOOD CIRCULATION | BLOCKED ARTERIES | WEIGHT | HEART DISEASE
YES        | YES                    | YES              | 180    | YES
NO         | NO                     | NO               | 125    | NO
YES        | NO                     | YES              | 167    | YES
YES        | NO                     | YES              | 167    | YES
40 Create N Trees

Generate: 1st bootstrapped dataset → 1st tree; 2nd bootstrapped dataset → 2nd tree; …; nth bootstrapped dataset → nth tree.
41 Create N Trees

Generate: 1st bootstrapped dataset → 1st tree; 2nd bootstrapped dataset → 2nd tree; …; nth bootstrapped dataset → nth tree. Together, these n trees form the Random Forest.
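A minimal sketch of the whole procedure (bootstrap each tree, try only a random feature subset at each split, and take a majority vote) using scikit-learn; the 0/1 encoding and hyperparameter values are assumptions:

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

data = pd.DataFrame({
    "chest_pain":       [0, 1, 1, 1],
    "good_circulation": [0, 1, 1, 0],
    "blocked_arteries": [0, 1, 0, 1],
    "weight":           [125, 180, 210, 167],
    "heart_disease":    [0, 1, 0, 1],     # 1 = YES, 0 = NO
})
X, y = data.drop(columns="heart_disease"), data["heart_disease"]

# n_estimators = number of bootstrapped datasets/trees (bootstrap=True by default),
# max_features = size of the random feature subset considered at each split.
forest = RandomForestClassifier(n_estimators=100, max_features=2, random_state=0)
forest.fit(X, y)

# New patient: chest pain = No, good circulation = No, blocked arteries = No, weight = 125.
new_patient = pd.DataFrame([[0, 0, 0, 125]], columns=X.columns)
print(forest.predict(new_patient))     # each tree votes; the majority wins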
42 How to Predict New Sample

NEW PATIENT:
Chest Pain: No | Good Blood Circulation: No | Blocked Arteries: No | Weight: 125

"Could I have the disease?" (we want to predict Heart Disease for this patient)
43 Bagging Technique

Dataset

Bootstrapped
Dataset
44 How to Predict New Sample

Run the new patient (Chest Pain: No, Good Blood Circulation: No, Blocked Arteries: No, Weight: 125) through every tree, from the 1st to the nth, and record each tree's prediction for Heart Disease.

Votes: Yes = 7, No = 2, so the forest predicts Heart Disease = Yes ("Unfortunately, you have the disease!").
45 Review
ORIGINAL DATA

BOOTSTRAPPED DATASET
RANDOMLY SELECT DATA

ALLOW DUPLICATED VALUES


46 Review

Original Dataset:
CHEST PAIN | GOOD BLOOD CIRCULATION | BLOCKED ARTERIES | WEIGHT | HEART DISEASE
NO         | NO                     | NO               | 125    | NO
YES        | YES                    | YES              | 180    | YES
YES        | YES                    | NO               | 210    | NO
YES        | NO                     | YES              | 167    | YES

Bootstrapped Dataset:
CHEST PAIN | GOOD BLOOD CIRCULATION | BLOCKED ARTERIES | WEIGHT | HEART DISEASE
YES        | YES                    | YES              | 180    | YES
NO         | NO                     | NO               | 125    | NO
YES        | NO                     | YES              | 167    | YES
YES        | NO                     | YES              | 167    | YES

Part of the original dataset may not appear in the bootstrapped dataset at all.
47 Out-of-bag Dataset

OUT-OF-BAG ERROR

We can use the out-of-bag samples (the rows that were never drawn into a tree's bootstrapped dataset) to measure the accuracy of the Random Forest.
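A minimal sketch of measuring out-of-bag accuracy with scikit-learn (reusing X and y from the forest sketch above; in practice you would use a much larger dataset):

from sklearn.ensemble import RandomForestClassifier

# oob_score=True evaluates every sample only on the trees that never saw it
# (its out-of-bag trees) and aggregates the results into one accuracy estimate.
forest = RandomForestClassifier(n_estimators=100, max_features=2,
                                bootstrap=True, oob_score=True, random_state=0)
forest.fit(X, y)
print("out-of-bag accuracy:", forest.oob_score_)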
48
Outline
➢ Decision Tree: Review

➢ Random Forest

➢ Fill in missing data with Random Forest

➢ Time series Data

➢ Example

➢ Summary
49 Random Forest with Missing Data
50 Types of Missing Data

CHEST PAIN | GOOD BLOOD CIRCULATION | BLOCKED ARTERIES | WEIGHT | HEART DISEASE
NO         | NO                     | NO               | 125    | NO
YES        | YES                    | YES              | 180    | YES
YES        | YES                    | NO               | 210    | NO
YES        | NO                     | N/A              | N/A    | NO

Missing values can be text (categorical) or numeric.
51 How to Fill in Missing Data

DATA WITH MISSING VALUES → GUESS THE MISSING VALUES → REFINE THE GUESSES
52 Guessing the Data
CHEST PAIN | GOOD BLOOD CIRCULATION | BLOCKED ARTERIES    | WEIGHT                | HEART DISEASE
NO         | NO                     | NO                  | 125                   | NO
YES        | YES                    | YES                 | 180                   | YES
YES        | YES                    | NO                  | 210                   | NO
YES        | NO                     | No (initial guess)  | 167.5 (initial guess) | NO

Idea: fill in an initial guess, then gradually refine it so that it gets better and better.
53 Refine the Guesses

BUILD A RANDOM FOREST → RUN ALL OF THE DATA THROUGH ALL OF THE TREES
54 Proximity Matrix

In the 1st tree, sample 3 and sample 4 reach the same decision (both rows return "No"), so we add 1 to their shared entry in the proximity matrix.

Proximity matrix (each row and each column represents one sample):
     1  2  3  4
1
2
3             1
4          1


55 Proximity Matrix

In the 2nd tree, sample 3 and sample 4 again reach the same decision, so their shared entry grows to 2; any other pair of samples that lands in the same leaf gets 1 added to its entry.
56 Proximity Matrix Of N Trees

Proximity matrix accumulated over all N trees:

     1  2  3  4
1       2  1  1
2    2     1  1
3    1  1     8
4    1  1  8
57 Proximity Matrix Of N Trees

Normalization: assume we have 10 trees, so divide every entry by 10:

     1    2    3    4
1         0.2  0.1  0.1
2    0.2       0.1  0.1
3    0.1  0.1       0.8
4    0.1  0.1  0.8
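A minimal sketch (not from the slides) of computing such a proximity matrix from a fitted scikit-learn forest: forest.apply(X) returns the leaf index each sample lands in for every tree, and the proximity of two samples is the fraction of trees in which they share a leaf.

import numpy as np

def proximity_matrix(forest, X):
    leaves = forest.apply(X)                 # shape: (n_samples, n_trees)
    n = leaves.shape[0]
    prox = np.zeros((n, n))
    for tree_leaves in leaves.T:             # one column per tree
        prox += np.equal.outer(tree_leaves, tree_leaves)
    prox /= leaves.shape[1]                  # normalize by the number of trees
    np.fill_diagonal(prox, 0.0)
    return prox

# Example: prox = proximity_matrix(forest, X)   # forest, X from the earlier sketches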
58 Fill in the Missing Values

For the missing BLOCKED ARTERIES value of sample 4:

Frequency of "Yes" among the other samples: 1/3
Weight for "Yes" = (proximity of the samples that answered "Yes") / (all of sample 4's proximities)
                 = 0.1 / (0.1 + 0.1 + 0.8) = 0.1 / 1.0 = 0.1
Weighted frequency of "Yes" = Frequency of "Yes" * Weight for "Yes" = 1/3 * 0.1 ≈ 0.03

Proximity matrix:
     1    2    3    4
1         0.2  0.1  0.1
2    0.2       0.1  0.1
3    0.1  0.1       0.8
4    0.1  0.1  0.8
59 Fill in the Missing Values

Frequency of "No" among the other samples: 2/3
Weight for "No" = (proximity of the samples that answered "No") / (all of sample 4's proximities)
                = (0.1 + 0.8) / (0.1 + 0.1 + 0.8) = 0.9 / 1.0 = 0.9
Weighted frequency of "No" = Frequency of "No" * Weight for "No" = 2/3 * 0.9 = 0.6

Proximity matrix:
     1    2    3    4
1         0.2  0.1  0.1
2    0.2       0.1  0.1
3    0.1  0.1       0.8
4    0.1  0.1  0.8
60 Fill in the Missing Values

Since the weighted frequency of "No" (2/3 * 0.9 = 0.6) is larger than the weighted frequency of "Yes" (1/3 * 0.1 ≈ 0.03), we fill in the missing BLOCKED ARTERIES value with "No".
61 Fill in the Missing Values

For the missing WEIGHT value of sample 4, compute a proximity-weighted average of the other samples' weights.

Sample 1's weight = 125
Sample 1's proximity weight = 0.1 / (0.1 + 0.1 + 0.8) = 0.1
Contribution of sample 1 = 125 * 0.1 = 12.5
62 Fill in the Missing Values

Sample 2's weight = 180
Sample 2's proximity weight = 0.1 / (0.1 + 0.1 + 0.8) = 0.1
Contribution of sample 2 = 180 * 0.1 = 18.0
63 Fill in the Missing Values

Sample 3's weight = 210
Sample 3's proximity weight = 0.8 / (0.1 + 0.1 + 0.8) = 0.8
Contribution of sample 3 = 210 * 0.8 = 168.0
64 Fill in the Missing Values
Summation:
Contribution of sample 1 = 125 * 0.1 = 12.5
Contribution of sample 2 = 180 * 0.1 = 18.0
Contribution of sample 3 = 210 * 0.8 = 168.0
Filled-in WEIGHT = 12.5 + 18.0 + 168.0 = 198.5
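A minimal sketch of the two weighted-average calculations above (the values are taken from the proximity matrix on the previous slides):

import numpy as np

prox_to_s4 = np.array([0.1, 0.1, 0.8])   # proximities of samples 1, 2, 3 to sample 4
weights    = np.array([125, 180, 210])   # known WEIGHT values of samples 1, 2, 3
blocked    = np.array([0, 1, 0])         # BLOCKED ARTERIES: 1 = Yes, 0 = No

w = prox_to_s4 / prox_to_s4.sum()        # proximity weights: [0.1, 0.1, 0.8]

# Numeric column: proximity-weighted average of the observed values.
print("imputed weight:", np.dot(w, weights))           # 12.5 + 18.0 + 168.0 = 198.5

# Categorical column: weighted frequency of each category; pick the larger one.
freq_yes, freq_no = blocked.mean(), 1 - blocked.mean()         # 1/3 and 2/3
score_yes = freq_yes * w[blocked == 1].sum()                   # 1/3 * 0.1 ≈ 0.03
score_no  = freq_no  * w[blocked == 0].sum()                   # 2/3 * 0.9 = 0.6
print("imputed blocked arteries:", "Yes" if score_yes > score_no else "No")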
65
Outline
➢ Decision Tree: Review

➢ Random Forest

➢ Fill in missing data with Random Forest

➢ Time Series vs. Supervised Learning

➢ Example

➢ Summary
66 Time Series vs Supervised Learning
Time series data is a collection of data points over time.
67 Time Series vs Supervised Learning
A time series is a sequence of numbers A supervised learning problem is comprised of input
that are ordered by a time index. This can patterns (X) and output patterns (y), such that an algorithm
be thought of as a list or column of can learn how to predict the output patterns from the input
ordered values. patterns.
68 Time Series vs Supervised Learning
Time series data can be phrased as supervised learning

Time Series data Supervised learning

Sliding Window For Time Series Data

Sliding Window With Multivariate


Time Series Data
69 Time Series vs Supervised Learning
A key function to help transform time series data into a supervised learning problem is the Pandas shift() function.

Time Series data Supervised learning

Sliding Window For Time Series Data

Sliding Window With Multivariate


Time Series Data
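A minimal sketch of this sliding-window transformation with the Pandas shift() function (the series values are assumed toy data):

import pandas as pd

series = pd.DataFrame({"t": [10, 20, 30, 40, 50]})

# shift(1) pushes the values down one step, so each row pairs the previous
# observation (input X = t-1) with the current observation (output y = t).
supervised = pd.DataFrame({
    "X (t-1)": series["t"].shift(1),
    "y (t)":   series["t"],
}).dropna()          # the first row has no previous value, so drop it
print(supervised)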
70 Time Series vs Supervised Learning
A key function to help transform time series data into a supervised learning problem is the Pandas shift() function.
71 Time Series vs Supervised Learning
A key function to help transform time series data into a supervised learning problem is the Pandas shift() function.
72 Time Series vs Supervised Learning
A key function to help transform time series data into a supervised learning problem is the Pandas shift() function.
73 Time Series vs Supervised Learning
One-Step Univariate Forecasting
74 Time Series vs Supervised Learning
Multi-Step or Sequence Forecasting
75 Time Series vs Supervised Learning
Multivariate Forecasting
76
Outline
➢ Decision Tree: Review

➢ Random Forest

➢ Fill in missing data with Random Forest

➢ Time Series vs. Supervised Learning

➢ Example

➢ Summary
77 Example
The daily female births dataset, i.e., the number of female births recorded each day across three years.

We will use only the previous six time steps as input to the model.
78 Example
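A minimal sketch of framing this as supervised learning and fitting a Random Forest (the CSV URL and column name follow the commonly used public copy of the dataset and are assumptions):

import pandas as pd
from sklearn.ensemble import RandomForestRegressor

url = ("https://raw.githubusercontent.com/jbrownlee/Datasets/master/"
       "daily-total-female-births.csv")
births = pd.read_csv(url)["Births"]

# Build the supervised table: six lag columns as inputs, the current value as target.
cols = [births.shift(i).rename(f"t-{i}") for i in range(6, 0, -1)] + [births.rename("t")]
frame = pd.concat(cols, axis=1).dropna()
X, y = frame.iloc[:, :-1], frame.iloc[:, -1]

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X[:-30], y[:-30])                      # hold out the last 30 days
print("one-step forecasts:", model.predict(X[-30:])[:5])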
79
k-Fold cross-validation
1. Split the dataset into k equal (if possible) parts, called folds.

2. Choose k − 1 folds as the training set; the remaining fold is the test set.

3. Train the model on the training set. On each iteration of cross-validation, train a new model, independently of the model trained on the previous iteration.

4. Validate the model on the test set.

5. Save the result of the validation.

6. Repeat steps 2–5 k times, each time using a different fold as the test set. In the end, the model has been validated on every fold.

7. To get the final score, average the results obtained in step 5 (see the sketch below).
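A minimal sketch of k-fold cross-validation with scikit-learn (the model and k = 5 are assumptions):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# cross_val_score performs steps 2-6: for each of the k folds it trains a fresh
# model on the other folds and evaluates it on the held-out fold.
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y,
                         cv=KFold(n_splits=5, shuffle=True, random_state=0))
print("per-fold accuracy:", scores)
print("final score (average):", scores.mean())    # step 7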
80 Time Series Cross Validation
With time series data, we cannot shuffle the data!
Rolling Window

Expanding Window
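A minimal sketch of expanding-window cross-validation with scikit-learn's TimeSeriesSplit (a rolling window can be obtained by also passing max_train_size); the series length is an assumed toy value:

import numpy as np
from sklearn.model_selection import TimeSeriesSplit

t = np.arange(12)                       # 12 time steps of toy data

# Each split trains only on the past and tests on the block that follows,
# so the data are never shuffled across time.
for train_idx, test_idx in TimeSeriesSplit(n_splits=3).split(t):
    print("train:", train_idx, "-> test:", test_idx)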
81 Time Series Cross Validation
With time series data, we cannot shuffle the data!
82 Time Series Cross Validation
83
Outline
➢ Decision Tree: Review

➢ Random Forest

➢ Fill in missing data with Random Forest

➢ Time Series vs. Supervised Learning

➢ Example

➢ Summary
84

AdaBoost & Gradient Boost
(Basic, Advanced Concepts and Its Applications)

Vinh Dinh Nguyen


PhD in Computer Science
2
Outline
➢ Boosting Techniques

➢ AdaBoost Clearly Explain

➢ Gradient Boost Clearly Explain

➢ Time Series Data: Predicting Energy Consumption

➢ Summary
3
Decision Tree and Its Variance

Example: when a new vaccine is released, we want to predict how effective it is (%) for each dosage given to a patient.

Unit (dosage) | Effect (%)
10            | 98
20            | 0
35            | 100
5             | 44
…             | …

[Figure: a patient receives 5 units of vaccine; vaccine effectiveness: 44%.]
4
Decision Tree and Its Variance
5 What is an Ensemble Learning?
6 Homogeneous Approach
7 Heterogeneous Approach
8 Ensemble Learning Techniques

Ensemble Learning (commonly used in AI competitions)

• Bagging: homogeneous weak learners
• Boosting: homogeneous weak learners
• Stacking: heterogeneous weak learners
9 Bagging-based Method

Random
Forest

Last Week
10 Boosting-Based Method
11 Stacking-Based Method
12 Stacking-Based Method
13 Boosting Technique

Boosting is an ensemble modelling technique that attempts to build a strong classifier from a number of weak classifiers.

https://en.wikipedia.org/wiki/Boosting_%28machine_learning%29#/media/File:Ensemble_Boosting.svg
14 Boosting Technique

Boosting is an ensemble modelling technique that attempts to build a strong classifier from a number of weak classifiers.

https://en.wikipedia.org/wiki/Boosting_%28machine_learning%29#/media/File:Ensemble_Boosting.svg
15 Stump Definition

A tree consisting of just one node with two leaves is known as a stump.
16
Outline
➢ Boosting Techniques

➢ AdaBoost Clearly Explain

➢ Gradient Boost Clearly Explain

➢ Time Series Data: Predicting Energy Consumption

➢ Summary
17 AdaBoost: Forest of Stump
18 AdaBoost: Forest of Stump
19 Sample Dataset

CHEST PAIN | GOOD BLOOD CIRCULATION | BLOCKED ARTERIES | WEIGHT | HEART DISEASE
NO         | NO                     | NO               | 125    | NO
YES        | YES                    | YES              | 180    | YES
YES        | YES                    | NO               | 210    | NO
YES        | NO                     | YES              | 167    | YES


20 Sample Dataset

STRONG CLASSIFIER

WEAK CLASSIFIER
21 Random Forest
• Each tree in the random forest has an equal vote (weight) on the final decision.
22 AdaBoost: FOREST OF STUMP

• Stumps are not equally weighted in the final decision.

• Stumps that make more errors contribute less to the final decision.
23 Random Forest

Trees are created independently of one another.
24 AdaBoost: Forest of Stump
[Figure: stumps are built in order (1 → 2 → 3 → 4); the errors of each stump influence how the next stump is built.]
25 Differences Between RF and AdaBoost

A weak learner is a ___                         AdaBoost combines a lot of ___

Stumps have different ___ on the final result   Each stump is created by considering ___
26 Heart Disease Dataset

Chest Pain Blocked Arteries Patient Weight Heart Disease


Yes Yes 205 Yes
No Yes 180 Yes
Yes No 210 Yes
Yes Yes 167 Yes
No Yes 156 No
No Yes 125 No
Yes No 168 No
Yes Yes 172 No

Importance of each sample = sample weight = 1 / (number of samples) = 1/8


27 1st Stump in the Tree
28 Compute Gini Index For Chest Pain

Gini index = 5/8 * (1 – (3/5)^2 – (2/5)^2) + 3/8 * (1 – (1/3)^2 – (2/3)^2) ≈ 0.47

CHEST PAIN = Yes → Heart Disease: 3 Yes, 2 No
CHEST PAIN = No  → Heart Disease: 1 Yes, 2 No
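A minimal sketch (not from the slides) of the weighted Gini computation used on these slides:

def gini(yes, no):
    total = yes + no
    return 1 - (yes / total) ** 2 - (no / total) ** 2

def weighted_gini(left, right):
    n = sum(left) + sum(right)
    return sum(left) / n * gini(*left) + sum(right) / n * gini(*right)

print(weighted_gini((3, 2), (1, 2)))   # Chest Pain        -> ~0.47
print(weighted_gini((3, 3), (1, 1)))   # Blocked Arteries  -> 0.5
print(weighted_gini((3, 1), (1, 3)))   # Weight > 170      -> 0.375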
29 Gini Index for Blocked Arteries

Gini index = 6/8 * (1 – (3/6)^2 – (3/6)^2) + 2/8 * (1 – (1/2)^2 – (1/2)^2) = 0.5

BLOCKED ARTERIES = Yes → Heart Disease: 3 Yes, 3 No
BLOCKED ARTERIES = No  → Heart Disease: 1 Yes, 1 No
30 Gini Index for Patient Weight

Gini index = 4/8 * (1 – (1/4)^2 – (3/4)^2) + 4/8 * (1 – (1/4)^2 – (3/4)^2) = 0.375

(Candidate thresholds from adjacent sorted weights: 140.5, 146.5, 161.5, 170, 188.5, 192.5, 195; the best split is PATIENT WEIGHT > 170.)

PATIENT WEIGHT > 170 = Yes → Heart Disease: 3 Yes, 1 No
PATIENT WEIGHT > 170 = No  → Heart Disease: 1 Yes, 3 No
31 Amount of Say

How much does this stump contribute to the final decision (classification)?

PATIENT WEIGHT > 170 = Yes → Heart Disease: 3 Yes, 1 No
PATIENT WEIGHT > 170 = No  → Heart Disease: 1 Yes, 3 No

We quantify this with the stump's Amount of Say.
32 Amount of Say: Patient Weight

• Total Error = the sum of the sample weights of the incorrectly classified samples = 1/8 + 1/8 = 2/8
• Amount of Say = 1/2 * log((1 – Total Error) / Total Error) = 1/2 * log((1 – 2/8) / (2/8)) ≈ 0.55  (natural log)
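A minimal sketch of the amount-of-say calculation for the three candidate stumps (natural log assumed, matching the numbers on these slides):

import numpy as np

def amount_of_say(total_error, eps=1e-10):
    total_error = np.clip(total_error, eps, 1 - eps)   # avoid division by zero
    return 0.5 * np.log((1 - total_error) / total_error)

print(amount_of_say(2 / 8))   # Patient Weight > 170 -> ~0.55
print(amount_of_say(3 / 8))   # Chest Pain           -> ~0.255
print(amount_of_say(4 / 8))   # Blocked Arteries     -> 0.0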
33 Amount of Say: Weight of The Tree

Probability vs. Odds

Your friend went fishing 10 times in a month:
• Caught a fish 4 times
• Failed to catch a fish 6 times
What are the probability and the odds of getting a fish for lunch?

Probability = (times a fish was caught) / (total attempts) = 4/10 = 0.4

Odds = (times a fish was caught) / (times no fish was caught) = 4/6 ≈ 0.67

Odds = (probability of catching a fish) / (probability of not catching a fish) = (4/10) / (6/10) ≈ 0.67

Odds = (1 – probability of not catching a fish) / (probability of not catching a fish) = (4/10) / (6/10) ≈ 0.67
34 Amount of Say: Weight of The Tree

PATIENT WEIGHT > 170 = Yes → Heart Disease: 3 Yes, 1 No
PATIENT WEIGHT > 170 = No  → Heart Disease: 1 Yes, 3 No

Odds = (1 – probability of an incorrect prediction) / (probability of an incorrect prediction)

Should the Amount of Say be the Odds themselves, or Amount of Say = 1/2 * log(Odds)?

Why?
35 Amount of Say: Chest Pain

• Total Error = the sum of the sample weights of the incorrectly classified samples = 3/8
• Amount of Say = 1/2 * log((1 – 3/8) / (3/8)) ≈ 0.25

log(Odds) = log((1 – probability of an incorrect prediction) / (probability of an incorrect prediction))
36 Amount of Say: Blocked Arteries

• Total Error = the sum of the sample weights of the incorrectly classified samples = 4/8
• Amount of Say = 1/2 * log((1 – 4/8) / (4/8)) = 0
37 Assumptions

Known: the weights of the misclassified samples are used to compute the Amount of Say of the current stump.

Unknown: how do we use the weights of these misclassified samples to build the next stump so that it corrects these wrong predictions?
38 Idea: Improved Bootstrapped Dataset

[Figure: the samples the current stump classified incorrectly are emphasized, and a new dataset is created in which they appear more often, so that the next stump can handle the incorrect classifications.]
39 How to Build Next Stump

Increase the sample weights of the samples that were incorrectly classified and decrease the sample weights of the samples that were correctly classified. (Labels in {-1, 1}.)

New sample weight (incorrect) = old weight * e^(Amount of Say) = 1/8 * e^0.55 ≈ 0.22
40 How to Build Next Stump

Increase the sample weights of the samples that were incorrectly classified and decrease the sample weights of the samples that were correctly classified. (Labels in {-1, 1}.)

New sample weight (correct) = old weight * e^(-Amount of Say) = 1/8 * e^(-0.55) ≈ 0.07
41 How to Build Next Stump

Alternative with labels in {0, 1}: increase the sample weights of the incorrectly classified samples and keep the sample weights of the correctly classified samples unchanged.

New sample weight (incorrect) = 1/8 * e^(0.55 * 1) ≈ 0.22
42 How to Build Next Stump

With labels in {0, 1}, the correctly classified samples keep their weights unchanged.

New sample weight (correct) = 1/8 * e^(0.55 * 0) = 0.125
43 New Sample Weight

Chest Pain | Blocked Arteries | Patient Weight | Heart Disease | Sample Weight | New Weight | Normalized Weight
Yes        | Yes              | 205            | Yes           | 1/8           | 0.07       | 0.08
No         | Yes              | 180            | Yes           | 1/8           | 0.07       | 0.08
Yes        | No               | 210            | Yes           | 1/8           | 0.07       | 0.08
Yes        | Yes              | 167            | Yes           | 1/8           | 0.22       | 0.25
No         | Yes              | 156            | No            | 1/8           | 0.07       | 0.08
No         | Yes              | 125            | No            | 1/8           | 0.07       | 0.08
Yes        | No               | 168            | No            | 1/8           | 0.07       | 0.08
Yes        | Yes              | 172            | No            | 1/8           | 0.22       | 0.25
Sum        |                  |                |               | ~1.0          | 0.86       | ~1.0

Update: divide each new weight by the sum (0.86) so that the normalized weights again add up to 1.
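A minimal sketch of this weight update and normalization (0.55 is the amount of say of the Patient Weight stump; the two samples it misclassified are flagged below):

import numpy as np

n = 8
amount_of_say = 0.55
misclassified = np.array([0, 0, 0, 1, 0, 0, 0, 1])       # 1 where the stump was wrong

weights = np.full(n, 1 / n)
weights *= np.exp(np.where(misclassified == 1, amount_of_say, -amount_of_say))
print(np.round(weights, 2))        # [0.07 0.07 0.07 0.22 0.07 0.07 0.07 0.22]

weights /= weights.sum()           # normalize so the weights sum to 1
print(np.round(weights, 2))        # [0.08 0.08 0.08 0.25 0.08 0.08 0.08 0.25]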
44 New Sample Weight

Chest Pain | Blocked Arteries | Patient Weight | Heart Disease | New Weight
Yes        | Yes              | 205            | Yes           | 0.08
No         | Yes              | 180            | Yes           | 0.08
Yes        | No               | 210            | Yes           | 0.08
Yes        | Yes              | 167            | Yes           | 0.25
No         | Yes              | 156            | No            | 0.08
No         | Yes              | 125            | No            | 0.08
Yes        | No               | 168            | No            | 0.08
Yes        | Yes              | 172            | No            | 0.25
Sum        |                  |                |               | ~1.0
45 AdaBoost: FOREST OF STUMPS

[Figure: stumps are built sequentially; each new stump tries to improve on the errors of the previous ones.]
46 New Dataset
47 New Dataset

Random [0,1] to select samples


48 New Dataset

Chest Pain | Blocked Arteries | Patient Weight | Heart Disease | Normalized Weight | Range
Yes        | Yes              | 205            | Yes           | 0.08              | [0, 0.08]
No         | Yes              | 180            | Yes           | 0.08              | (0.08, 0.16]
Yes        | No               | 210            | Yes           | 0.08              | (0.16, 0.24]
Yes        | Yes              | 167            | Yes           | 0.25              | (0.24, 0.495]
No         | Yes              | 156            | No            | 0.08              | (0.495, 0.575]
No         | Yes              | 125            | No            | 0.08              | (0.575, 0.655]
Yes        | No               | 168            | No            | 0.08              | (0.655, 0.735]
Yes        | Yes              | 172            | No            | 0.25              | (0.735, 1.0]
Sum        |                  |                |               | ~1.0              |
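A minimal sketch of drawing the new dataset: repeatedly pick a random number in [0, 1] and take the row whose cumulative-weight range contains it, which is equivalent to sampling with replacement using the normalized weights as probabilities.

import numpy as np

rng = np.random.default_rng(0)
norm_weights = np.array([0.08, 0.08, 0.08, 0.25, 0.08, 0.08, 0.08, 0.25])
norm_weights = norm_weights / norm_weights.sum()     # make them sum exactly to 1

# Draw 8 row indices; heavily weighted (misclassified) rows are picked more often.
indices = rng.choice(len(norm_weights), size=8, replace=True, p=norm_weights)
print(indices)          # rows 3 and 7 tend to appear several times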
49 New Dataset

Pick random numbers in [0, 1] to select samples from the old dataset into the new dataset, then CONTINUE TO BUILD THE NEXT STUMP on the new dataset.

Idea: samples that were classified incorrectly are included in the new dataset more often.
50 How to Classify The Final Result

Group the stumps by their prediction: these stumps predict "heart disease", those stumps predict "no heart disease". The final classification is the group whose total Amount of Say is larger.
51 How to Classify The Final Result

https://hastie.su.domains/Papers/samm
52 How to Classify The Final Result

SAMME — Stagewise
Additive Modeling using a
Multi-class Exponential
loss function

https://hastie.su.domains/Papers/samme.pdf
53 Behind The Scene

Setup: M trees (stumps), N samples, labels in {-1, 1}.

The hypothesis function is an additive model: f(x) = Σₘ αₘ Gₘ(x).

Exponential loss function: L(y, f(x)) = exp(−y · f(x)).

Supposing that f(x) = f_{m−1}(x) + αₘ Gₘ(x), expand the error function:
E = Σᵢ exp(−yᵢ f_{m−1}(xᵢ)) · exp(−yᵢ αₘ Gₘ(xᵢ)) = Σᵢ wᵢ⁽ᵐ⁾ · exp(−yᵢ αₘ Gₘ(xᵢ)),
where wᵢ⁽ᵐ⁾ = exp(−yᵢ f_{m−1}(xᵢ)) is not dependent on αₘ and Gₘ.
54 Behind The Scene

Setup: M trees, N samples, labels in {-1, 1}. Expand the error function:
E = Σᵢ wᵢ⁽ᵐ⁾ · exp(−yᵢ αₘ Gₘ(xᵢ)).

It is easy to show that the exponent −yᵢ αₘ Gₘ(xᵢ) equals −αₘ if yᵢ = Gₘ(xᵢ) and +αₘ if yᵢ ≠ Gₘ(xᵢ). Splitting the sum accordingly gives
E = e^(−αₘ) · Σ_{yᵢ = Gₘ(xᵢ)} wᵢ⁽ᵐ⁾ + e^(αₘ) · Σ_{yᵢ ≠ Gₘ(xᵢ)} wᵢ⁽ᵐ⁾.

Define the total error weight Ew = Σ_{yᵢ ≠ Gₘ(xᵢ)} wᵢ⁽ᵐ⁾ and the total weight Tw = Σᵢ wᵢ⁽ᵐ⁾; then
E = e^(−αₘ) (Tw − Ew) + e^(αₘ) Ew.
55 Behind The Scene

Minimizing E with respect to αₘ (setting dE/dαₘ = 0) gives e^(2αₘ) = (Tw − Ew) / Ew. Taking the log on both sides, we have

αₘ = 1/2 * log((Tw − Ew) / Ew) = 1/2 * log((1 − errₘ) / errₘ),

which is exactly the Amount of Say formula used above (errₘ = Ew / Tw is the weighted total error).
56 Summary

AdaBoost builds each new stump based on the errors made by the previous stumps, improving the errors step by step.
57 Example: Spam Classification

• Classifying Email as Spam or Non-Spam

http://archive.ics.uci.edu/dataset/94/spambase
58 Example: Spam Classification
59 Example: Spam Classification
Our implementation

Using Sklearn library
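The slide's code is not reproduced here; a minimal sketch with scikit-learn's AdaBoostClassifier on the Spambase data might look like the following (the OpenML dataset name and the hyperparameters are assumptions):

from sklearn.datasets import fetch_openml
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = fetch_openml("spambase", version=1, return_X_y=True, as_frame=False)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# By default each weak learner is a depth-1 decision tree, i.e. a stump.
model = AdaBoostClassifier(n_estimators=200, learning_rate=0.5, random_state=0)
model.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))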


60
Outline
➢ Boosting Techniques

➢ AdaBoost Clearly Explain

➢ Gradient Boost Clearly Explain

➢ Time Series Data: Predicting Energy Consumption

➢ Summary
61 Gradient Boost For Regression

Height Favorite Color Gender Weight


1.6 Blue Male 88
1.6 Green Female 76
1.5 Blue Female 56
1.8 Red Male 73
1.5 Green Male 77
1.4 Blue Female 57

Input Output
62 Tree-based Gradient Boost
• Step 1: Build 1st tree
• Calculate the average of weights
Height Favorite Color Gender Weight
1.6 Blue Male 88
1.6 Green Female 76
1.5 Blue Female 56
1.8 Red Male 73
1.5 Green Male 77
1.4 Blue Female 57

The 1st tree is a single node (leaf) that predicts the average of the weights: 71.17


63 Tree-based Gradient Boost
• Step 2:
➢Build 2nd tree

Average of weights: 71.17


64 Tree-based Gradient Boost
• Step 2:
➢Build 2nd tree

Average of weights: 71.2

Height | Favorite Color | Gender | Weight | Residual Error
1.6    | Blue           | Male   | 88     | 16.8
1.6    | Green          | Female | 76     | 4.8
1.5    | Blue           | Female | 56     | -15.2
1.8    | Red            | Male   | 73     | 1.8
1.5    | Green          | Male   | 77     | 5.8
1.4    | Blue           | Female | 57     | -14.2

(Residual Error = observed Weight − predicted 71.2)
65 Tree-based Gradient Boost

Question: why do we build a tree that predicts the Residual Error?

Height | Favorite Color | Gender | Residual Error
1.6    | Blue           | Male   | 16.8
1.6    | Green          | Female | 4.8
1.5    | Blue           | Female | -15.2
1.8    | Red            | Male   | 1.8
1.5    | Green          | Male   | 5.8
1.4    | Blue           | Female | -14.2

2nd tree (fit to the residuals):
Gender is Female? → yes: Height < 1.6? → yes: {-14.2, -15.2}; no: {4.8}
                  → no:  Color is not Blue? → yes: {1.8, 5.8}; no: {16.8}


66 Tree-based Gradient Boost

Each leaf of the 2nd tree is replaced by the average of its residuals:

Gender is Female? → yes: Height < 1.6? → yes: -14.7 (average of -14.2 and -15.2); no: 4.8
                  → no:  Color is not Blue? → yes: 3.8 (average of 1.8 and 5.8); no: 16.8


67 Tree-based Gradient Boost

1st tree: a single leaf predicting the average of the weights, 71.2.

2nd tree (fit to the residuals):
Gender is Female? → yes: Height < 1.6? → yes: -14.7; no: 4.8
                  → no:  Color is not Blue? → yes: 3.8; no: 16.8


68 Prediction
Height | Favorite Color | Gender | Weight | Prediction (1st tree + 2nd tree)
1.6    | Blue           | Male   | 88     | 88
1.6    | Green          | Female | 76     | 76
1.5    | Blue           | Female | 56     | 56
1.8    | Red            | Male   | 73     | 73
1.5    | Green          | Male   | 77     | 77
1.4    | Blue           | Female | 57     | 57

If we simply add the residual tree at full strength, the model reproduces the training data almost exactly: OVERFITTING.
69 Prediction

To avoid overfitting, the contribution of each new tree is scaled by a Learning Rate:
Prediction = 1st tree (average of weights, 71.2) + Learning Rate × 2nd tree + …
70 Prediction

Height | Favorite Color | Gender | Weight | Prediction
1.6    | Blue           | Male   | 88     | 74.56

With a learning rate of 0.2: Prediction = 71.2 + 0.2 * 16.8 = 74.56
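A minimal from-scratch sketch of this procedure (initial prediction = mean, then repeatedly fit a small tree to the residuals and add it scaled by the learning rate); the one-hot encoding of the categorical columns and the number of rounds are assumptions:

import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

data = pd.DataFrame({
    "height": [1.6, 1.6, 1.5, 1.8, 1.5, 1.4],
    "color":  ["Blue", "Green", "Blue", "Red", "Green", "Blue"],
    "gender": ["Male", "Female", "Female", "Male", "Male", "Female"],
    "weight": [88, 76, 56, 73, 77, 57],
})
X = pd.get_dummies(data.drop(columns="weight"))
y = data["weight"].to_numpy()

learning_rate = 0.2
prediction = np.full(len(y), y.mean())          # 1st "tree": the average, 71.17

for m in range(50):                             # build 50 small residual trees
    residuals = y - prediction
    tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)

print(np.round(prediction, 1))                  # gradually approaches the targets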
71 Tree-based Gradient Boost
• Step 3: build the 3rd tree (the ensemble so far: 1st tree = average of weights 71.2, plus the scaled 2nd tree).

Height | Favorite Color | Gender | Weight | Predicted Weight | Residual Error
1.6    | Blue           | Male   | 88     | 74.56            | 13.44
1.6    | Green          | Female | 76     | …                | …
1.5    | Blue           | Female | 56     | …                | …
1.8    | Red            | Male   | 73     | …                | …
1.5    | Green          | Male   | 77     | …                | …
1.4    | Blue           | Female | 57     | …                | …
72 Tree-based Gradient Boost
• Step 3: build the 3rd tree from the residuals of the current ensemble (1st tree = average of weights 71.2, plus the scaled 2nd tree).

Height | Favorite Color | Gender | 1st Tree Residual | 2nd Tree Residual | 3rd Tree Residual
1.6    | Blue           | Male   | 16.8              | 13.44             | ???
1.6    | Green          | Female | 4.8               | …                 | ???
1.5    | Blue           | Female | -15.2             | …                 | ???
1.8    | Red            | Male   | 1.8               | …                 | ???
1.5    | Green          | Male   | 5.8               | …                 | ???
1.4    | Blue           | Female | -14.2             | …                 | ???
73 Prediction

AVG of weights: 71.2


74 Gradient Boost: Behind The Scenes

Training data: {(xᵢ, yᵢ)}, i = 1…n

Height | Favorite Color | Gender | Weight
1.6    | Blue           | Male   | 88
1.6    | Green          | Female | 76
1.5    | Blue           | Female | 56
1.8    | Red            | Male   | 73
1.5    | Green          | Male   | 77
1.4    | Blue           | Female | 57

Loss function: L(yᵢ, F(x)) = 1/2 * (Observed − Predicted)^2

dL/dPredicted = 2/2 * (Observed − Predicted) * (−1) = −(Observed − Predicted)

(The 1/2 factor is the "tricky implementation" detail: it makes the negative gradient equal exactly the residual, Observed − Predicted.)
75 Gradient Boost: Behind The Scenes

• Step 1: Initialize the model with a constant value:
  F₀(x) = argmin_δ Σᵢ L(yᵢ, δ)

For the first three samples:
SSR(δ) = 1/2 * {(88 − δ)^2 + (76 − δ)^2 + (56 − δ)^2}
dSSR/dδ = −(88 − δ) − (76 − δ) − (56 − δ) = 0
δ = (88 + 76 + 56) / 3 = 73.3 = the average of the samples' weights
76 Gradient Boost: Behind The Scenes

• Step 1: Initialize the model with a constant value:
  F₀(x) = argmin_δ Σᵢ L(yᵢ, δ) = 73.3

[Figure: SSR plotted as a function of δ; the minimum is at δ = 73.3.]
77 Gradient Boost: Behind The Scenes

• M is the number of trees


• n is the number of
samples
78 Gradient Boost: Behind The Scenes

r_im is the residual error of sample i under the m-th tree; R_jm is the j-th leaf (region) of the m-th tree.

Height | Favorite Color | Gender | Weight | r_i1
1.6    | Blue           | Male   | 88     | 14.7
1.6    | Green          | Female | 76     | 2.7
1.5    | Blue           | Female | 56     | -17.3

The first residual tree splits these samples into regions R₁₁ and R₂₁.
79 Step 2 (using Step 1's result)

F_{m−1}(x) = F₀(x) = 73.3

Height | Favorite Color | Gender | Weight | r_i1
1.6    | Blue           | Male   | 88     | 14.7
1.6    | Green          | Female | 76     | 2.7
1.5    | Blue           | Female | 56     | -17.3

For each leaf R_jm, find the output value γ_jm that minimizes the loss over the samples falling into that leaf.

Leaf R₁₁ contains only sample 3 (x₃ ∈ R₁₁):
γ₁₁ = argmin_γ 1/2 * (y₃ − (F₀(x₃) + γ))^2
    = argmin_γ 1/2 * (56 − (73.3 + γ))^2
    = argmin_γ 1/2 * (−17.3 − γ)^2
γ₁₁ = −17.3      (similarly, for leaf R₂₁: γ₂₁ = 8.7)
80 Step 2 (continued)

F_{m−1}(x) = F₀(x) = 73.3

Leaf R₂₁ contains samples 1 and 2 (x₁, x₂ ∈ R₂₁):
γ₂₁ = argmin_γ [ 1/2 * (y₁ − (F₀(x₁) + γ))^2 + 1/2 * (y₂ − (F₀(x₂) + γ))^2 ]
    = argmin_γ [ 1/2 * (88 − (73.3 + γ))^2 + 1/2 * (76 − (73.3 + γ))^2 ]
    = argmin_γ [ 1/2 * (14.7 − γ)^2 + 1/2 * (2.7 − γ)^2 ]
γ₂₁ = (14.7 + 2.7) / 2 = 8.7, the average of the two samples' residuals.

So γ₁₁ = −17.3 for region R₁₁ and γ₂₁ = 8.7 for region R₂₁.
81 Step 2 (continued)

F_{m−1}(x) = F₀(x) = 73.3

Update the model: F₁(x) = F₀(x) + (learning rate) × γ_{j1} for the region R_{j1} that x falls into (learning rate = 0.1 here):

Height | Favorite Color | Gender | Weight | r_i1  | F₁(x)
1.6    | Blue           | Male   | 88     | 14.7  | 74.2
1.6    | Green          | Female | 76     | 2.7   | 74.2
1.5    | Blue           | Female | 56     | -17.3 | 71.6

(x₁, x₂ ∈ R₂₁ with γ₂₁ = 8.7; x₃ ∈ R₁₁ with γ₁₁ = −17.3)
82 Repeat for the next m + 1 iteration
83 Summary
84
Outline
➢ Boosting Techniques

➢ AdaBoost Clearly Explain

➢ Gradient Boost Clearly Explain

➢ Time Series Data: Predicting Energy Consumption

➢ Summary
85 Time Series Forecasting
We will focus on the energy consumption problem, where given a sufficiently large dataset of the daily energy
consumption of different households in a city, we are tasked to predict as accurately as possible the future energy
demands.

Preprocessing
86 Time Series Forecasting
We will focus on the energy consumption problem, where given a sufficiently large dataset of the daily energy
consumption of different households in a city, we are tasked to predict as accurately as possible the future energy
demands.

Consumption changes through the years

During the winter months we observe high demands in


energy, while throughout the summer the consumption is at
the lowest levels.
87 Time Series Forecasting
To visualize the fluctuation within the span of a single year, we can plot one year of the data.
88 Time Series Forecasting
We have only one feature: the full date. We can extract different features based on the full date
such as the day of the week, the day of the year, the month and others
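A minimal sketch of extracting such calendar features with pandas (the column name "date" is an assumption):

import pandas as pd

def add_date_features(df):
    # Derive several calendar features from the single datetime column.
    ts = pd.to_datetime(df["date"])
    df["day_of_week"] = ts.dt.dayofweek
    df["day_of_year"] = ts.dt.dayofyear
    df["month"]       = ts.dt.month
    df["quarter"]     = ts.dt.quarter
    df["year"]        = ts.dt.year
    return df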
89 Time Series Forecasting
The dataset contains almost 2.5 years of data, so for the testing set we will use only the last 6 months
90 Time Series Forecasting
Prepare dataset for Training and Testing

Performance Evaluation and


Visualization
91 Time Series Forecasting
Gradient Boosting For Training
92 Time Series Forecasting
Gradient Boosting For Predicting

MAE: 0.7535964848932999
MSE: 1.3449409757830804
MAPE: 0.1951895064391348
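The training and prediction code shown on the slides is not reproduced here; a minimal sketch with scikit-learn might look like the following (the synthetic series, model choice, and hyperparameters are assumptions, and the metric values above come from the original slides, not from this sketch; add_date_features is the helper defined earlier):

import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             mean_absolute_percentage_error)

# Synthetic stand-in for the energy-consumption series (~2.5 years of daily data).
dates = pd.date_range("2020-01-01", periods=900, freq="D")
df = pd.DataFrame({"date": dates})
df["consumption"] = (10 + 3 * np.cos(2 * np.pi * df["date"].dt.dayofyear / 365)
                     + np.random.default_rng(0).normal(0, 0.5, len(df)))
df = add_date_features(df)

features = ["day_of_week", "day_of_year", "month", "quarter", "year"]
train, test = df.iloc[:-180], df.iloc[-180:]     # last ~6 months held out for testing

model = GradientBoostingRegressor(n_estimators=500, learning_rate=0.05, max_depth=3)
model.fit(train[features], train["consumption"])
pred = model.predict(test[features])

print("MAE: ", mean_absolute_error(test["consumption"], pred))
print("MSE: ", mean_squared_error(test["consumption"], pred))
print("MAPE:", mean_absolute_percentage_error(test["consumption"], pred))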
94
