
Cornell CS578: Bagging and Boosting

The document discusses the bias-variance tradeoff in machine learning models. It explains that model loss can be decomposed into noise, bias, and variance, and that models can exhibit high bias (underfitting) or high variance (overfitting). Bagging and boosting are ensemble methods that combat these errors: bagging averages predictions from models trained on bootstrap samples of a dataset to reduce variance without increasing bias, while boosting iteratively reweights training examples to focus on those misclassified by previous models and can reduce bias as well. Both methods can improve performance over a single model.


Bias/Variance Tradeoff

Model Loss (Error)

• Squared loss of model on test case i:
  (Learn(x_i, D) − Truth(x_i))^2
• Expected prediction error:
  E_D[ (Learn(x, D) − Truth(x))^2 ]

Bias/Variance Decomposition

E_D[ (L(x, D) − T(x))^2 ] = Noise^2 + Bias^2 + Variance

• Noise^2 = lower bound on performance
• Bias^2 = (expected error due to model mismatch)^2
• Variance = variation due to train sample and randomization
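The decomposition can be estimated empirically by retraining the same learner on many fresh training samples and examining its predictions at a fixed test point. Below is a minimal Python sketch, not from the lecture; the target function, noise level, and polynomial degrees are illustrative assumptions.

# Sketch: estimate bias^2 and variance of a learner at one test point
# by retraining it on many independent training sets. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
noise_sd = 0.3          # noise in the targets
x_test = 0.5            # fixed test point

def truth(x):
    return np.sin(2 * np.pi * x)

def learn_and_predict(degree, n_train=30):
    """Train one polynomial model on a fresh sample D, predict at x_test."""
    x = rng.uniform(0, 1, n_train)
    y = truth(x) + rng.normal(0, noise_sd, n_train)
    coefs = np.polyfit(x, y, degree)
    return np.polyval(coefs, x_test)

for degree in (1, 9):   # low-capacity vs. high-capacity model
    preds = np.array([learn_and_predict(degree) for _ in range(2000)])
    bias2 = (preds.mean() - truth(x_test)) ** 2
    variance = preds.var()
    print(f"degree {degree}: bias^2 = {bias2:.4f}  variance = {variance:.4f}")

The degree-1 fit typically shows the larger bias^2 and the degree-9 fit the larger variance, matching the tradeoff described above.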

Bias^2

• Low bias
  – linear regression applied to linear data
  – 2nd degree polynomial applied to quadratic data
  – ANN with many hidden units trained to completion
• High bias
  – constant function
  – linear regression applied to non-linear data
  – ANN with few hidden units applied to non-linear data

Variance

• Low variance
  – constant function
  – model independent of training data
  – model depends on stable measures of data
    • mean
    • median
• High variance
  – high degree polynomial
  – ANN with many hidden units trained to completion

Sources of Variance in Supervised Learning

• noise in targets or input attributes
• bias (model mismatch)
• training sample
• randomness in learning algorithm
  – neural net weight initialization
• randomized subsetting of train set:
  – cross validation, train and early stopping set

Bias/Variance Tradeoff

• (bias^2 + variance) is what counts for prediction
• Often:
  – low bias => high variance
  – low variance => high bias
• Tradeoff:
  – bias^2 vs. variance

Bias/Variance Tradeoff

[Figure: Duda, Hart, Stork, "Pattern Classification", 2nd edition, 2001]

[Figure: Hastie, Tibshirani, Friedman, "The Elements of Statistical Learning", 2001]

Reduce Variance Without Increasing Bias

• Averaging reduces variance:
  Var(X̄) = Var(X) / N   (X̄ = average of N independent draws of X)
• Average models to reduce model variance
• One problem:
  – only one train set
  – where do multiple models come from?

Bagging: Bootstrap Aggregation

• Leo Breiman (1994)
• Bootstrap Sample:
  – draw sample of size |D| with replacement from D
• Train L_i on BootstrapSample_i(D)
• Regression: L_bagging = average of the L_i
• Classification: L_bagging = Plurality(L_i)
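A minimal sketch of this recipe in Python, assuming scikit-learn decision trees as the base learner L_i (the slides do not prescribe a particular learner) and NumPy arrays for X and y:

# Sketch: bagging by hand -- bootstrap samples, one model per sample,
# average (regression) or plurality vote (classification) to predict.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bagging_fit(X, y, n_models=100, seed=0):
    """Train one model per bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)   # draw |D| examples with replacement
        models.append(DecisionTreeRegressor().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Regression: average the base models. (Classification: plurality vote.)"""
    return np.mean([m.predict(X) for m in models], axis=0)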

Bagging

• Best case:
  Var(Bagging(L(x, D))) = Variance(L(x, D)) / N
• In practice:
  – models are correlated, so reduction is smaller than 1/N
  – variance of models trained on fewer training cases is usually somewhat larger
  – stable learning methods have low variance to begin with, so bagging may not help much
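The first "In practice" caveat can be quantified with a standard identity (not on the slide): for N identically distributed models with variance sigma^2 and average pairwise correlation rho,

Var(average of N models) = rho · sigma^2 + (1 − rho) · sigma^2 / N

so the variance only falls toward sigma^2 / N when the models are nearly uncorrelated, and no amount of averaging removes the rho · sigma^2 floor.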
Bagging Results

[Figure: Breiman, "Bagging Predictors", Berkeley Statistics Department TR#421, 1994]

How Many Bootstrap Samples?

More bagging results

[Figures: Breiman, "Bagging Predictors", Berkeley Statistics Department TR#421, 1994]

Bagging with cross validation

• Train neural networks using 4-fold CV
  – Train on 3 folds, earlystop on the fourth
  – At the end you have 4 neural nets
• How to make predictions on new examples?
  – Train a neural network until the mean earlystopping point
  – Average the predictions from the four neural networks

Can Bagging Hurt?

• Each base classifier is trained on less data
  – Only about 63.2% of the data points are in any bootstrap sample
• However the final model has seen all the data
  – On average a point will be in >50% of the bootstrap samples

Reduce Bias^2 and Decrease Variance?

• Bagging reduces variance by averaging
• Bagging has little effect on bias
• Can we average and reduce bias?
• Yes: Boosting
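The 63.2% figure under "Can Bagging Hurt?" comes from the bootstrap itself: each of the |D| draws misses a fixed point x_i with probability 1 − 1/|D|, so

P(x_i appears in a bootstrap sample) = 1 − (1 − 1/|D|)^|D| ≈ 1 − e^(−1) ≈ 0.632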

Boosting

• Freund & Schapire:
  – theory for "weak learners" in late 80's
• Weak Learner: performance on any train set is slightly better than chance prediction
• intended to answer a theoretical question, not as a practical way to improve learning
• tested in mid 90's using not-so-weak learners
• works anyway!

Boosting

• Weight all training samples equally
• Train model on train set
• Compute error of model on train set
• Increase weights on train cases model gets wrong!
• Train new model on re-weighted train set
• Re-compute errors on weighted train set
• Increase weights again on cases model gets wrong
• Repeat until tired (100+ iterations)
• Final model: weighted prediction of each model
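A minimal sketch of this loop in Python, using decision stumps as the base model and the standard AdaBoost weight and vote formulas (the slides describe the loop but not these exact formulas, so treat them as an assumption):

# Sketch: boosting by reweighting (AdaBoost-style).
# Assumes X, y are NumPy arrays with labels y in {-1, +1}.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost_fit(X, y, n_rounds=100):
    n = len(X)
    w = np.full(n, 1.0 / n)                    # weight all training samples equally
    models, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = w[pred != y].sum()               # weighted error on the train set
        if err == 0 or err >= 0.5:             # perfect fit, or weak-learner assumption violated
            break
        alpha = 0.5 * np.log((1 - err) / err)  # vote weight for this round's model
        w *= np.exp(-alpha * y * pred)         # increase weights on cases the model gets wrong
        w /= w.sum()                           # renormalize the weights
        models.append(stump)
        alphas.append(alpha)
    return models, alphas

def boost_predict(models, alphas, X):
    # Final model: sign of the weighted vote of the individual models
    votes = sum(a * m.predict(X) for a, m in zip(models, alphas))
    return np.sign(votes)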

Boosting

[Equations not reproduced: Initialization, Iteration, Final Model]

Boosting: Initialization

Boosting: Iteration

Boosting: Prediction

Weight updates

• Weights for incorrect instances are multiplied by 1/(2·Error_i)
  – Small train set errors cause weights to grow by several orders of magnitude
• Total weight of misclassified examples is 0.5
• Total weight of correctly classified examples is 0.5

Reweighting vs Resampling

• Example weights might be harder to deal with
  – Some learning methods can't use weights on examples
  – Many common packages don't support weights on the train set
• We can resample instead:
  – Draw a bootstrap sample from the data with the probability of drawing each example proportional to its weight
• Reweighting usually works better but resampling is easier to implement
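To see why both totals on the Weight updates slide are 0.5: if the weighted error at round i is Error_i, the misclassified examples hold total weight Error_i and each is multiplied by 1/(2·Error_i), while (by the usual boosting normalization, implied rather than stated on the slide) each correctly classified weight is multiplied by 1/(2·(1 − Error_i)). Then

misclassified total:         Error_i · 1/(2·Error_i) = 0.5
correctly classified total:  (1 − Error_i) · 1/(2·(1 − Error_i)) = 0.5

This also explains the note about small errors: when Error_i is tiny, the factor 1/(2·Error_i) is huge, so weights can grow by orders of magnitude.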

Boosting Performance

Boosting vs. Bagging

• Bagging doesn't work so well with stable models. Boosting might still help.
• Boosting might hurt performance on noisy datasets. Bagging doesn't have this problem.
• In practice bagging almost always helps.

Boosting vs. Bagging

• On average, boosting helps more than bagging, but it is also more common for boosting to hurt performance.
• The weights grow exponentially. Code must be written carefully (store log of weights, …).
• Bagging is easier to parallelize.

Bagged Decision Trees

 Draw 100 bootstrap samples of data
 Train trees on each sample -> 100 trees
 Un-weighted average prediction of trees

Average prediction:
(0.23 + 0.19 + 0.34 + 0.22 + 0.26 + … + 0.31) / # Trees = 0.24
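For comparison, the same recipe is available as an off-the-shelf library call; a minimal sketch assuming scikit-learn, with a synthetic placeholder dataset:

# Sketch: 100 bagged decision trees, un-weighted average of their predictions.
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

X, y = make_regression(n_samples=500, n_features=10, random_state=0)  # placeholder data
bag = BaggingRegressor(n_estimators=100, random_state=0).fit(X, y)    # trees are the default base model
print(bag.predict(X[:1]))                                             # average over the 100 trees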

Random Forests (Bagged Trees++)

 Draw 1000+ bootstrap samples of data
 Draw sample of available attributes at each split
 Train trees on each sample/attribute set -> 1000+ trees
 Un-weighted average prediction of trees
 Marriage made in heaven. Highly under-rated!

Average prediction:
(0.23 + 0.19 + 0.34 + 0.22 + 0.26 + … + 0.31) / # Trees = 0.24

Model Averaging

• Almost always helps
• Often easy to do
• Models shouldn't be too similar
• Models should all have pretty good performance (not too many lemons)
• When averaging, favor low bias, high variance
• Models can individually overfit
• Not just in ML
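The Random Forests recipe above (bootstrap samples plus a random subset of attributes at each split) maps directly onto a library implementation; a minimal sketch assuming scikit-learn, with placeholder data and a common max_features choice:

# Sketch: random forest = bagged trees + a random subset of features at each split.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=500, n_features=20, random_state=0)  # placeholder data
forest = RandomForestRegressor(
    n_estimators=1000,     # 1000+ bootstrap samples / trees
    max_features="sqrt",   # attributes sampled at each split
    random_state=0,
).fit(X, y)
print(forest.predict(X[:1]))   # un-weighted average prediction of the trees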

Out of Bag Samples

• With bagging, each model trained on about 63% of training sample
• That means each model does not use 37% of data
• Treat these as test points!
– Backfitting in trees
– Pseudo cross validation
– Early stopping sets
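A minimal Python sketch of the idea, building on the hand-rolled bagging above: each point is scored only by the models whose bootstrap sample left it out, giving a test-set-like error estimate for free (the use of decision trees and squared error here is an illustrative assumption):

# Sketch: out-of-bag (OOB) error -- each point is predicted only by the
# trees whose bootstrap sample did not contain it. Assumes NumPy arrays.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def oob_error(X, y, n_models=100, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    oob_sum = np.zeros(n)      # running sum of OOB predictions per point
    oob_cnt = np.zeros(n)      # how many models had this point out of bag
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)          # bootstrap sample (~63% of points)
        out = np.setdiff1d(np.arange(n), idx)     # the other ~37%: free test points
        m = DecisionTreeRegressor().fit(X[idx], y[idx])
        oob_sum[out] += m.predict(X[out])
        oob_cnt[out] += 1
    seen = oob_cnt > 0
    return np.mean((oob_sum[seen] / oob_cnt[seen] - y[seen]) ** 2)   # OOB mean squared error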
