CSC 3304 Lecture 08 Boosting Ensemble Methods


Boosting

Cynthia Rudin | MIT Sloan School of Management


Boosting Motivation

• Question of Kearns: Can you turn a “weak” learning algorithm (one that is barely better than random guessing) into a “strong” learning algorithm (one whose error rate is arbitrarily close to 0)?

• We could ask the algorithm to create many classifiers and figure out how to combine them… but if we give the algorithm the same input each time, it will produce the same answer, not many different classifiers.
Boosting Motivation

• Schapire and Freund’s answer:
  • Reweight the data in many ways.
  • Use the weak learning algorithm to create a weak classifier for each (reweighted) dataset.
  • Compute a weighted average of the weak classifiers.
AdaBoost

• There are now several boosting algorithms.
• One of the most popular boosting algorithms is AdaBoost.
• AdaBoost was famously used in computer vision by Viola and Jones (2001) for face detection.
Weak classifiers used by Viola and Jones

• Viola and Jones worked on face detection.
• They came up with very simple classifiers that subtract the white areas of a rectangular template from the black ones.
Weak classifiers used by Viola and Jones

• Subtract the white areas from the black ones.
• [Figure: the template slides across the face image. Over regions where the black and white areas have very similar pixel values, it doesn’t detect anything; once it reaches the eyes, it detects.]
Weak classifiers used by Viola and Jones

• Subtract the white areas from the black ones.
• [Figure: two weak classifiers acting as eye detectors.]
  1. The first detector looks for the difference between the dark eyes and the part of the face just below them.
  2. The second detector looks for two black areas separated by a white area, which is the person’s nose.
Weak classifiers used by Viola and Jones

• A single detector by itself is not useful, but many of them put together are very useful.
• Viola and Jones used hundreds of thousands of these weak classifiers at all different scales.
AdaBoost Pseudocode

Assign observation i the weight d_{1,i} = 1/n (equal weights).
For t = 1 to T:
  • Train the weak learning algorithm using the data weighted by d_{t,i}. This produces weak classifier h_t.
  • Choose coefficient α_t.  (y_i h_t(x_i) is 1 if h_t classifies point i correctly, -1 if incorrectly.)
  • Update the weights:
      $$d_{t+1,i} = \frac{d_{t,i}\,\exp(-\alpha_t\, y_i\, h_t(x_i))}{Z_t},$$
    where Z_t is a normalization factor chosen so that the weights add up to 1.
End

Output the final classifier, a weighted sum of the weak classifiers:
      $$H(x) = \mathrm{sign}\!\left(\sum_{t=1}^{T} \alpha_t\, h_t(x)\right).$$
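To make the pseudocode concrete, here is a minimal sketch of AdaBoost in Python, assuming scikit-learn decision stumps as the weak learners, labels in {-1, +1}, and the standard choice α_t = ½ ln((1-ε_t)/ε_t) for the coefficient (the slides leave the choice of α_t open). The function names and T value are illustrative, not from the lecture.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, T=10):
    """Train AdaBoost with depth-1 trees (stumps) as weak learners; y must be in {-1, +1}."""
    n = len(y)
    d = np.full(n, 1.0 / n)                      # d_{1,i} = 1/n: equal initial weights
    stumps, alphas = [], []
    for t in range(T):
        h = DecisionTreeClassifier(max_depth=1)  # weak learner trained on the weighted data
        h.fit(X, y, sample_weight=d)
        pred = h.predict(X)
        eps = np.clip(np.sum(d[pred != y]), 1e-10, 1 - 1e-10)  # weighted error of h_t
        alpha = 0.5 * np.log((1 - eps) / eps)                   # assumed choice of alpha_t
        d = d * np.exp(-alpha * y * pred)        # d_{t+1,i} = d_{t,i} exp(-alpha_t y_i h_t(x_i)) / Z_t
        d /= d.sum()                             # dividing by Z_t makes the weights sum to 1
        stumps.append(h)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    """H(x) = sign( sum_t alpha_t h_t(x) )."""
    return np.sign(sum(a * h.predict(X) for h, a in zip(stumps, alphas)))
```

Misclassified points get their weight multiplied by exp(α_t) and correctly classified points by exp(-α_t), which is exactly the "reweight the data" step from the motivation.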
Boosting Example

In this example, the weak classifiers are only allowed to be horizontal or vertical lines. All observations start with equal weights.

(Credit: Example adapted from Freund and Schapire)
Boosting Example

• Run the weak learning algorithm to get a weak classifier h₁.
• Choose coefficient α₁ = 0.42.
• h₁ classified two points as positive and everything else as negative. Which points were misclassified?

(Credit: Example adapted from Freund and Schapire)
Boosting Example

• Increase the weights on the misclassified points and decrease the weights on the correctly classified points.
• Because the weights on those points are now so high, the weak learning algorithm basically has to get those three points right on the next round.

(Credit: Example adapted from Freund and Schapire)
Boosting Example

• Run the weak learning algorithm to get a weak classifier h₂ for the weighted data.
• Choose coefficient α₂ = 0.66.

(Credit: Example adapted from Freund and Schapire)
Boosting Example

• Increase the weights on the misclassified points and decrease the weights on the correctly classified points.

(Credit: Example adapted from Freund and Schapire)
Boosting Example

• Run the weak learning algorithm on the reweighted data to get a weak classifier h₃.
• Choose coefficient α₃ = 0.93.

(Credit: Example adapted from Freund and Schapire)
Boosting Example

After three rounds of boosting we stop. The final combined classifier is

    H(x) = sign( 0.42·h₁(x) + 0.66·h₂(x) + 0.93·h₃(x) ),

and it classifies all the points correctly.

(Credit: Example adapted from Freund and Schapire)
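As a quick sanity check on the weighted combination (this arithmetic is not on the slides): a point that h₁ and h₂ classify as positive but h₃ classifies as negative still comes out positive, since sign(0.42 + 0.66 − 0.93) = sign(0.15) = +1. In fact, any two of the three classifiers outvote the third here, because each pair of coefficients sums to more than the remaining one.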


Decision Forests

Cynthia Rudin | MIT Sloan School of Management


What’s the difference between Boosted Decision Trees and Decision Forests?

• Decision Forests
  – compute many trees from different subsets of the data and features, and
  – average them (bagging).
• Boosted Decision Trees
  – reweight the data to generate different trees, and
  – combine them so as to minimize training error (coordinate descent on the exponential loss).
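As an illustration of the two families (not part of the slides), scikit-learn provides off-the-shelf versions of both; the hyperparameter values below are arbitrary choices, not recommendations.

```python
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier

# Decision forest: bag the data, consider a random subset of features at each
# split, and take a majority vote over the trees.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt")

# Boosted decision trees: reweight the data each round and combine the
# resulting trees with coefficients (AdaBoost uses decision stumps by default).
boosted = AdaBoostClassifier(n_estimators=100)

# Both expose the usual interface:
# forest.fit(X_train, y_train); boosted.fit(X_train, y_train)
```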
Decision Forests

• A complex and powerful prediction tool
• Black-box
• Uses a similar idea to boosted decision trees: averaging many uncorrelated yet accurate models reduces variance.
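One standard way to make the variance-reduction claim precise (this formula is not on the slides, but follows from basic properties of variance): if each of T identically distributed models has prediction variance σ² and each pair of models has correlation ρ, the variance of their average is

$$\mathrm{Var}\!\left(\frac{1}{T}\sum_{t=1}^{T} f_t(x)\right) = \rho\,\sigma^2 + \frac{1-\rho}{T}\,\sigma^2,$$

which shrinks toward ρσ² as T grows. Making the trees less correlated (smaller ρ) is what drives the variance down.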
Decision Forests

• Example: Will the customer wait for a table at a restaurant?

Features:
• OthOptions: Other options; true if there are other restaurants nearby.
• Weekend: True if it is Friday, Saturday, or Sunday.
• Area: Does the restaurant have a bar or other nice area to wait in?
• Plans: Does the customer have plans right after dinner?
• Price: Either $, $$, $$$, or $$$$.
• Precip: Is it raining or snowing?
• Genre: French, Mexican, Thai, or Pizza.
• Wait: Wait time estimate: 0-5 min, 5-15 min, or 15+ min.
• Crowded: Whether there are other customers (none, some, or full).

(Credit: Adapted from Russell and Norvig)


[Figure: several decision trees from the forest, each splitting on features such as Crowded?, OthOptions?, Plans?, Genre?, Price?, and Wait time, with Yes/No leaves.]

New observation: Mexican, $$, Crowded = Full, Wait = 5-15 min, no plans, no other options.
Each tree classifies the new observation; the majority vote is Yes, the customer will wait.
Decision Forests

• A bootstrap sample of size n: draw n points at random, with replacement, from the training data.
• (So you have some repeated points, and that’s OK.)
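A minimal sketch of drawing a bootstrap sample with NumPy (the toy data here is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = np.arange(10).reshape(5, 2)            # toy training data: 5 points, 2 features
n = len(X_train)

boot_idx = rng.integers(0, n, size=n)            # n draws with replacement: some indices repeat
X_boot = X_train[boot_idx]                       # the bootstrap sample (repeated rows are fine)
oob_idx = np.setdiff1d(np.arange(n), boot_idx)   # points left out ("out-of-bag"), used later
```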


Decision Forests

For t = 1 to T:
• Draw a bootstrap sample of size n from the training data.
• Grow a tree (tree_t) using this splitting and stopping procedure:
  – Choose m features at random (out of p).
  – Evaluate the splitting criterion on all of them.
  – Split on the best feature.
  – If the node has fewer than n_min points, stop splitting.

Output all the trees.

To predict on a new observation x, use the majority vote of the trees on x.
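A minimal sketch of this procedure in Python, assuming scikit-learn decision trees as the base learners and labels in {-1, +1} for the sign-based vote; the function names and defaults (T, m = √p, n_min) are illustrative choices, not part of the lecture.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_forest(X, y, T=100, m=None, n_min=5, rng=np.random.default_rng(0)):
    """Grow T trees, each on a bootstrap sample, splitting on m random features."""
    n, p = X.shape
    m = m or max(1, int(np.sqrt(p)))                  # common default: m = sqrt(p)
    trees, oob_indices = [], []
    for _ in range(T):
        boot = rng.integers(0, n, size=n)             # bootstrap sample of size n (with replacement)
        tree = DecisionTreeClassifier(
            max_features=m,                           # consider m random features per split
            min_samples_split=n_min)                  # stop splitting nodes with fewer than n_min points
        tree.fit(X[boot], y[boot])
        trees.append(tree)
        oob_indices.append(np.setdiff1d(np.arange(n), boot))  # out-of-bag rows for this tree
    return trees, oob_indices

def predict_forest(trees, X):
    """Majority vote of the trees (labels assumed to be in {-1, +1})."""
    votes = np.stack([t.predict(X) for t in trees])
    return np.sign(votes.sum(axis=0))
```

The out-of-bag index lists are returned because they are exactly what the variable-importance procedure below needs.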
Decision Forests

Comparison with ordinary decision trees:
• Bootstrap resamples (makes the trees diverse, which reduces variance).
• Splitting considers only m possible (randomly chosen) features (also makes the trees diverse, reducing variance).
• No pruning (makes the trees fit more tightly, which reduces bias).
• The majority vote of several trees is used to make predictions.

Why does this work? Diverse, unpruned trees have low bias, and averaging many of them brings the variance down.
Decision Forests: Measuring Variable Importance

• Let us measure the “importance” of variable j.
• Take the data not used to construct tree_t. Call it the “out-of-bag” data, OOB_t.
• Compute error_t, using model tree_t on data OOB_t.
• Now randomly permute only the jth feature values (reorder that column of OOB_t). Call the result OOB_{t,permuted}. For example, permuting the second column:

  $$\begin{pmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \end{pmatrix}
  \;\longrightarrow\;
  \begin{pmatrix} x_{11} & x_{32} & x_{13} \\ x_{21} & x_{12} & x_{23} \\ x_{31} & x_{22} & x_{33} \end{pmatrix}$$

• Compute error_{t,permuted}, using model tree_t on data OOB_{t,permuted}.
• The “raw importance” of variable j is then the average over trees of the difference:

  $$\frac{1}{T}\sum_{\text{trees } t}\left(\text{error}_{t,\text{permuted}} - \text{error}_t\right).$$
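A minimal sketch of this permutation-importance computation in Python, assuming a list of fitted trees with their out-of-bag index arrays (as produced by the forest sketch above) and 0-1 classification error; the names are illustrative.

```python
import numpy as np

def raw_importance(trees, oob_indices, X, y, j, rng=np.random.default_rng(0)):
    """Average increase in out-of-bag error when feature j is randomly permuted.

    trees       : list of fitted classifiers, one per bootstrap sample
    oob_indices : oob_indices[t] are the rows NOT used to fit trees[t]
    """
    diffs = []
    for tree, oob in zip(trees, oob_indices):
        X_oob, y_oob = X[oob], y[oob]
        err = np.mean(tree.predict(X_oob) != y_oob)       # error_t on OOB_t
        X_perm = X_oob.copy()
        X_perm[:, j] = rng.permutation(X_perm[:, j])       # permute only column j
        err_perm = np.mean(tree.predict(X_perm) != y_oob)  # error_{t,permuted}
        diffs.append(err_perm - err)
    return np.mean(diffs)                                  # raw importance of variable j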
Decision Forests for Regression

For t = 1 to T:
• Draw a bootstrap sample of size n from the training data.
• Grow a tree (tree_t) using this splitting and stopping procedure:
  – Choose m features at random (out of p).
  – Evaluate the splitting criterion on all of them.
  – Split on the best feature.
  – If the node has fewer than n_min points, stop splitting.

Output all the trees.

To predict on a new observation x, use the average of the trees’ predictions on x (instead of the majority vote).
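For the regression case, only the prediction step changes relative to the classification sketch above; a minimal (assumed) version:

```python
import numpy as np

def predict_forest_regression(trees, X):
    """Average the trees' predictions instead of taking a majority vote."""
    # trees: list of fitted regression trees (e.g. sklearn DecisionTreeRegressor)
    return np.mean(np.stack([t.predict(X) for t in trees]), axis=0)
```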
Decision Forests
Advantages
• Complex and powerful prediction tool, highly nonlinear

Disadvantages
• Black-box
• Tends to overfit unless tuned carefully (not always intuitive with
the R package)
• Slow