Ensemble methods combine the predictions of many models. For example, in bagging (short for bootstrap aggregation), models are built in parallel on m bootstrapped samples (e.g., m = 50), and the predictions of the m models are averaged to obtain the ensemble's prediction. These notes walk through the basics of three ensemble methods: bagging, random forests, and boosting.


Ensemble learning

Lecture 13

David Sontag
New York University

Slides adapted from Navneet Goyal; Tan, Steinbach, Kumar; and Vibhav Gogate
Ensemble methods
•  Motivating example: a machine learning competition with a $1 million prize (the Netflix Prize) was won using an ensemble of many models
Bias/Variance Tradeoff

Hastie, Tibshirani, and Friedman, “The Elements of Statistical Learning,” 2001



Reduce Variance Without Increasing Bias
•  Averaging reduces variance:

      Var(X̄) = Var(X) / N    (when the N predictions are independent)

•  Average models to reduce model variance
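
To make this concrete, here is a minimal simulation sketch (not from the slides; all names are illustrative) that checks empirically that averaging N independent, identically distributed predictions shrinks the variance by roughly a factor of N:

import numpy as np

rng = np.random.default_rng(0)
N = 50             # number of models being averaged
trials = 100000    # Monte Carlo repetitions

# Each row holds N independent "predictions" with variance 1 around the same target.
preds = rng.normal(loc=0.0, scale=1.0, size=(trials, N))

var_single = preds[:, 0].var()           # variance of one model's prediction (~1.0)
var_average = preds.mean(axis=1).var()   # variance of the ensemble average   (~1/N = 0.02)
print(var_single, var_average)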


One problem: we only have one training set.
Where do multiple models come from?
Bagging: Bootstrap Aggregation
•  Leo Breiman (1994)
•  Take repeated bootstrap samples from the training set D
•  Bootstrap sampling: given a set D containing N training examples,
   create D’ by drawing N examples at random with replacement from D

•  Bagging:
   –  Create k bootstrap samples D1 … Dk.
   –  Train a distinct classifier on each Di.
   –  Classify a new instance by majority vote / average.
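
As an illustration of the procedure above, the following is a minimal bagging sketch (not from the slides; it assumes scikit-learn's DecisionTreeClassifier as the base learner, and all function and variable names are illustrative):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, k=50, seed=0):
    """Train k classifiers, one per bootstrap sample D_i of the training set D."""
    rng = np.random.default_rng(seed)
    N = len(X)
    models = []
    for _ in range(k):
        idx = rng.integers(0, N, size=N)   # draw N examples with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Classify new instances by majority vote over the k classifiers."""
    votes = np.stack([m.predict(X) for m in models])   # shape (k, n_instances)
    majority = []
    for column in votes.T:
        labels, counts = np.unique(column, return_counts=True)
        majority.append(labels[counts.argmax()])
    return np.array(majority)

Usage would be something like models = bagging_fit(X_train, y_train, k=50) followed by bagging_predict(models, X_test); for regression, the majority vote is replaced by an average of the k predictions.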
General Idea
[Figure: the general ensemble scheme: multiple base classifiers are built from the training data and their predictions are combined into a single prediction.]

Bagging
•  Sampling with replacement
   [Figure: table of bootstrap samples drawn from the original training data, indexed by Data ID.]
•  Build a classifier on each bootstrap sample
•  Each data point has probability (1 – 1/n)^n of never being
   selected, and so of being usable as test data
•  The training (bootstrap) sample therefore contains about
   1 – (1 – 1/n)^n of the original data points
The 0.632 bootstrap
•  This method is also called the 0.632 bootstrap
   –  A particular training instance has a probability of
      1 – 1/n of not being picked in a single draw
   –  Thus its probability of ending up in the test data
      (never selected in any of the n draws) is:
      (1 – 1/n)^n ≈ e^(–1) ≈ 0.368
   –  This means the training data will contain
      approximately 63.2% of the distinct instances
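
A quick numeric check of the 0.632 figure (illustrative, not part of the slides):

for n in (10, 100, 1000, 100000):
    p_never_picked = (1 - 1 / n) ** n   # approaches e^(-1) ≈ 0.368 as n grows
    print(n, round(p_never_picked, 4), round(1 - p_never_picked, 4))   # → 0.368 / 0.632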
[Figure slides: bagging applied to decision trees. The base learner is a decision tree
learning algorithm very similar to ID3; shades of blue/red indicate the strength of the
vote for a particular classification.]
Example of Bagging
•  Assume that the training data is a set of points on the line x,
   labeled as follows:
      x <= 0.3:     +1
      0.4 to 0.7:   -1
      x >= 0.8:     +1
•  Goal: find a collection of 10 simple thresholding classifiers that
   collectively can classify the data correctly.
   –  Each simple (or weak) classifier is of the form
      (x <= K  =>  class = +1 or -1, depending on which value yields
      the lowest error), where K is determined by entropy minimization.
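
A sketch of this example in Python (not from the slides): it bags 10 one-level thresholding classifiers ("decision stumps") on the toy data. For brevity the stump picks its threshold K by minimizing training error rather than by entropy minimization, and all names are illustrative.

import numpy as np

X = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
y = np.array([+1, +1, +1, -1, -1, -1, -1, +1, +1, +1])

def fit_stump(x, t):
    """Pick the threshold K and the sign assigned to x <= K that minimize training error."""
    best = None
    for K in np.unique(x):
        for sign in (+1, -1):
            err = np.mean(np.where(x <= K, sign, -sign) != t)
            if best is None or err < best[0]:
                best = (err, K, sign)
    return best[1], best[2]

rng = np.random.default_rng(0)
stumps = []
for _ in range(10):                              # 10 bootstrap rounds
    idx = rng.integers(0, len(X), size=len(X))
    stumps.append(fit_stump(X[idx], y[idx]))

# Ensemble prediction: sign of the summed votes of the 10 stumps.
votes = sum(np.where(X <= K, s, -s) for K, s in stumps)
print(np.sign(votes))   # no single stump can label all points correctly,
                        # but the combined vote typically can (a tie gives 0)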
Random Forests
•  Ensemble method specifically designed for decision tree classifiers
•  Introduces two sources of randomness: “bagging” and “random input vectors”
   –  Bagging method: each tree is grown using a bootstrap sample of the training data
   –  Random vector method: at each node, the best split is chosen from a random
      sample of m attributes instead of all attributes
Random Forests:
Methods for Growing the Trees
•  Fix m <= M (the total number of attributes). At each node:
   –  Method 1 (sketched below):
      •  Choose m attributes randomly, compute their information
         gains, and split on the attribute with the largest gain
   –  Method 2:
      •  (When M is not very large) select L of the attributes
         randomly and compute a linear combination of the L attributes
         using weights drawn randomly from [-1, +1]; that is, the new
         attribute is A = Sum(wi * Ai), i = 1..L
   –  Method 3:
      •  Compute the information gain of all M attributes, select the
         top m attributes by information gain, and randomly pick one of
         those m attributes as the splitting attribute
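
A minimal sketch of Method 1 (not from the slides; it assumes categorical attributes, and the helper names are illustrative):

import numpy as np

def entropy(labels):
    """Entropy (in bits) of an array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(column, labels):
    """Information gain of splitting on one (categorical) attribute column."""
    gain = entropy(labels)
    for value, count in zip(*np.unique(column, return_counts=True)):
        gain -= (count / len(labels)) * entropy(labels[column == value])
    return gain

def choose_split_method1(X, y, m, rng):
    """Method 1: pick m of the M attributes at random, split on the largest gain."""
    M = X.shape[1]
    candidates = rng.choice(M, size=m, replace=False)
    gains = [information_gain(X[:, j], y) for j in candidates]
    return candidates[int(np.argmax(gains))]   # index of the chosen split attribute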
Random Forest Algorithm
[Algorithm figure: the random forest training algorithm, using Method 1 from the previous slide.]
Reduce Bias^2 and Decrease Variance?
•  Bagging reduces variance by averaging
•  Bagging has little effect on bias
•  Can we average and reduce bias?
•  Yes: Boosting
