Lecture 10 Ensemble Methods
Forests
Boosting
Dropout
Atsuto Maki
Autumn, 2020
When?
For which questions?
[Figure: S1, S2, ..., S10]
Property of instability
Bagging - Bootstrap Aggregating
S = {(x_1, y_1), ..., (x_m, y_m)}
Iterate: for b = 1, ..., B
1. Sample training examples, with replacement, m′ times from S to create S_b (m′ ≤ m).
2. Use this bootstrap sample S_b to estimate the regression or classification function f_b.
Output: The bagging estimate for
Classification:
$$f_{\mathrm{bag}}(x) = \underset{1 \le k \le K}{\operatorname{argmax}} \sum_{b=1}^{B} \operatorname{Ind}\big(f_b(x) = k\big)$$
Regression:
$$f_{\mathrm{bag}}(x) = \frac{1}{B} \sum_{b=1}^{B} f_b(x)$$
Note: bagging only produces good results for high-variance, low-bias classifiers.
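As an illustration, here is a minimal Python sketch of the bagging procedure above. The choice of scikit-learn decision trees as the (high-variance) base learner and the helper names are assumptions of this sketch, not part of the lecture.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, B=25, m_prime=None, seed=0):
    """Fit f_1, ..., f_B, each on its own bootstrap sample S_b."""
    rng = np.random.default_rng(seed)
    m = len(X)
    m_prime = m if m_prime is None else m_prime   # m' <= m
    ensemble = []
    for _ in range(B):
        idx = rng.integers(0, m, size=m_prime)    # sample with replacement
        ensemble.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return ensemble

def bagging_predict(ensemble, X):
    """Majority vote over the B classifiers (the argmax formula above).
    Class labels are assumed to be integers 0, ..., K-1."""
    votes = np.stack([f_b.predict(X) for f_b in ensemble])   # shape (B, n)
    return np.array([np.bincount(v).argmax() for v in votes.T.astype(int)])
```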
Apply bagging to the original example
Loop:
- Apply learner to weighted samples
- Increase weights of misclassified examples
Example: Ensemble Prediction
Voting of oriented hyperplanes can define convex regions. The green region is the true boundary.
Ensemble Method: Boosting
Input: Training data S = {(x_1, y_1), ..., (x_m, y_m)} of inputs x_i and their labels y_i ∈ {−1, 1} or real values.
H: a family of possible weak classifiers / regression functions.
h_t ∈ H, t = 1, ..., T
α_t: confidence/reliability of h_t
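For orientation, the weak hypotheses and their confidences are combined into a single strong classifier by a confidence-weighted vote; this is the standard AdaBoost combination, not written out explicitly on this slide:

$$H(x) = \operatorname{sign}\!\left(\sum_{t=1}^{T} \alpha_t\, h_t(x)\right)$$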
[Figures: example run of boosting, decision boundary after rounds 1, 2, ..., 11, ..., 21]
AdaBoost Algorithm
S = {(x_1, y_1), ..., (x_m, y_m)}
Initialize: Introduce a weight, w_j^{(1)}, for each training sample. Set w_j^{(1)} = 1/m for each j.
AdaBoost Algorithm (cont.)
Iterate: for t = 1, ..., T
1. Train weak classifier h_t ∈ H using S and w_1^{(t)}, ..., w_m^{(t)}; select the one that minimizes the weighted training error:
$$\varepsilon_t = \sum_{j=1}^{m} w_j^{(t)} \operatorname{Ind}\big(y_j \ne h_t(x_j)\big)$$
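The remaining steps of each iteration (computing α_t and re-weighting the examples) are not reproduced in this extract; the sketch below fills them in with the standard AdaBoost formulae. The `weak_learner` callable and labels in {−1, +1} are assumptions of this sketch.

```python
import numpy as np

def adaboost_fit(X, y, weak_learner, T=50):
    """Sketch of AdaBoost for labels y in {-1, +1}.

    `weak_learner(X, y, w)` is assumed to implement step 1: return a classifier
    h_t with h_t.predict(X) in {-1, +1} that (approximately) minimizes the
    weighted training error under the current weights w.
    The alpha_t and re-weighting rules below are the standard AdaBoost ones.
    """
    m = len(X)
    w = np.full(m, 1.0 / m)                           # w_j^(1) = 1/m
    ensemble = []
    for _ in range(T):
        h_t = weak_learner(X, y, w)                   # step 1: train weak classifier
        pred = h_t.predict(X)
        eps_t = max(np.sum(w * (pred != y)), 1e-12)   # weighted error (guard eps_t = 0)
        if eps_t >= 0.5:                              # no better than chance: stop
            break
        alpha_t = 0.5 * np.log((1 - eps_t) / eps_t)   # confidence of h_t
        w *= np.exp(-alpha_t * y * pred)              # mistakes get larger weights
        w /= w.sum()                                  # normalize
        ensemble.append((alpha_t, h_t))
    return ensemble

def adaboost_predict(ensemble, X):
    """Sign of the confidence-weighted vote over the weak classifiers."""
    return np.sign(sum(a * h.predict(X) for a, h in ensemble))
```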
Input: x → Apply filter: f_j(x) → Output: h(x) = (f_j(x) > θ), i.e. FACE or NON-FACE
For t = 1, ..., T
For each filter type j:
1. Apply filter f_j to each example.
2. Sort examples by their filter responses.
3. Select the best threshold for this classifier: θ_t^j (see the sweep sketched after this list).
4. Keep a record of the error of this classifier: ε_t^j.
Select the filter-threshold combination (weak classifier j*) with minimum error. Then set j_t = j*, ε_t = ε_t^{j*} and θ_t = θ_t^{j*}.
Re-weight the examples according to the AdaBoost formulae.
Note: There are many tricks to make this implementation more efficient.
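A sketch of steps 1-4 for a single filter type j: after sorting the responses, every candidate threshold can be evaluated in one sweep by flipping one example at a time. Function and variable names are illustrative, and only the single polarity h(x) = (f_j(x) > θ) from the slide is considered.

```python
import numpy as np

def best_stump_for_filter(responses, y, w):
    """Pick the threshold theta minimizing the weighted error for one filter.

    responses: f_j(x_i) for every training example
    y: labels in {-1, +1} (+1 = FACE), w: current AdaBoost weights
    """
    order = np.argsort(responses)
    r, y_s, w_s = responses[order], y[order], w[order]

    # Threshold below all responses: every example is predicted FACE (+1),
    # so the error is the total weight of the non-face examples.
    err = np.sum(w_s[y_s == -1])
    best_err, best_theta = err, r[0] - 1e-9

    for i in range(len(r)):
        # Raising the threshold just above r[i] flips example i to NON-FACE:
        # a face becomes a mistake, a non-face becomes correct.
        err += w_s[i] if y_s[i] == 1 else -w_s[i]
        last = (i == len(r) - 1)
        if not last and r[i] == r[i + 1]:
            continue          # only place thresholds between distinct responses
        theta = r[i] + 1e-9 if last else 0.5 * (r[i] + r[i + 1])
        if err < best_err:
            best_err, best_theta = err, theta
    return best_theta, best_err
```

Running this for every filter type and keeping the (j, θ) pair with the smallest returned error gives the weak classifier j* of the outer loop.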
Viola & Jones: Sliding window
Only a tiny proportion of the patches will be faces, and many of them will not look anything like a face.
Exploit this fact: introduce a cascade of increasingly strong classifiers.
[Figure: % Detection, values 99 and 50]
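A minimal sketch of how such a cascade is evaluated on one image patch. The stage interface (`accepts`) is an assumption of this sketch; the point is that any stage can reject a patch and stop the computation, so the cheap early stages discard most non-face windows.

```python
def cascade_predict(stages, x):
    """stages: boosted classifiers ordered from cheap/permissive to expensive/strict.
    Each stage is assumed to expose accepts(x) -> bool.
    A patch is declared FACE only if it survives every stage; any rejection
    terminates the evaluation early."""
    for stage in stages:
        if not stage.accepts(x):
            return "NON-FACE"
    return "FACE"
```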