Ensemble methods combine the predictions of many models. For example, in bagging (short for bootstrap aggregation), models are built in parallel on m bootstrapped samples (e.g., m = 50), and the predictions of the m models are averaged to obtain the ensemble's prediction. These notes walk through the basics of three ensemble methods: bagging, random forests, and boosting.


Ensemble learning

Lecture 13

David Sontag
New York University

Slides adapted from Navneet Goyal; Tan, Steinbach, Kumar; and Vibhav Gogate
Ensemble methods
•  Motivating example: a machine learning competition with a $1 million prize (the Netflix Prize) was won using an ensemble of many models
Bias/Variance Tradeoff

Hastie, Tibshirani, and Friedman, “The Elements of Statistical Learning,” 2001



Reduce Variance Without Increasing Bias
•  Averaging reduces variance:

      Var(X̄) = Var(X) / N    (when the N predictions are independent)

•  Average models to reduce model variance
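
To make this concrete, here is a minimal simulation sketch (not from the slides; all names are illustrative) that checks empirically that averaging N independent, identically distributed predictions shrinks the variance by roughly a factor of N:

import numpy as np

rng = np.random.default_rng(0)
N = 50             # number of models being averaged
trials = 100000    # Monte Carlo repetitions

# Each row holds N independent "predictions" with variance 1 around the same target.
preds = rng.normal(loc=0.0, scale=1.0, size=(trials, N))

var_single = preds[:, 0].var()           # variance of one model's prediction (~1.0)
var_average = preds.mean(axis=1).var()   # variance of the ensemble average   (~1/N = 0.02)
print(var_single, var_average)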


One problem: we only have one training set.
Where do multiple models come from?
Bagging: Bootstrap Aggregation
•  Leo Breiman (1994)
•  Take repeated bootstrap samples from the training set D
•  Bootstrap sampling: given a set D containing N training examples,
   create D’ by drawing N examples at random with replacement from D

•  Bagging:
   –  Create k bootstrap samples D1 … Dk.
   –  Train a distinct classifier on each Di.
   –  Classify a new instance by majority vote / average.
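
As an illustration of the procedure above, the following is a minimal bagging sketch (not from the slides; it assumes scikit-learn's DecisionTreeClassifier as the base learner, and all function and variable names are illustrative):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, k=50, seed=0):
    """Train k classifiers, one per bootstrap sample D_i of the training set D."""
    rng = np.random.default_rng(seed)
    N = len(X)
    models = []
    for _ in range(k):
        idx = rng.integers(0, N, size=N)   # draw N examples with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Classify new instances by majority vote over the k classifiers."""
    votes = np.stack([m.predict(X) for m in models])   # shape (k, n_instances)
    majority = []
    for column in votes.T:
        labels, counts = np.unique(column, return_counts=True)
        majority.append(labels[counts.argmax()])
    return np.array(majority)

Usage would be something like models = bagging_fit(X_train, y_train, k=50) followed by bagging_predict(models, X_test); for regression, the majority vote is replaced by an average of the k predictions.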
General Idea
[Figure: the general ensemble scheme: multiple base classifiers are built from the training data and their predictions are combined into a single prediction.]

Bagging
•  Sampling with replacement
   [Figure: table of bootstrap samples drawn from the original training data, indexed by Data ID.]
•  Build a classifier on each bootstrap sample
•  Each data point has probability (1 – 1/n)^n of never being
   selected, and so of being usable as test data
•  The training (bootstrap) sample therefore contains about
   1 – (1 – 1/n)^n of the original data points
The 0.632 bootstrap
•  This method is also called the 0.632 bootstrap
   –  A particular training instance has a probability of
      1 – 1/n of not being picked in a single draw
   –  Thus its probability of ending up in the test data
      (never selected in any of the n draws) is:
      (1 – 1/n)^n ≈ e^(–1) ≈ 0.368
   –  This means the training data will contain
      approximately 63.2% of the distinct instances
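
A quick numeric check of the 0.632 figure (illustrative, not part of the slides):

for n in (10, 100, 1000, 100000):
    p_never_picked = (1 - 1 / n) ** n   # approaches e^(-1) ≈ 0.368 as n grows
    print(n, round(p_never_picked, 4), round(1 - p_never_picked, 4))   # → 0.368 / 0.632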
[Figure slides: bagging applied to decision trees. The base learner is a decision tree
learning algorithm very similar to ID3; shades of blue/red indicate the strength of the
vote for a particular classification.]
Example of Bagging
•  Assume that the training data is a set of points on the line x,
   labeled as follows:
      x <= 0.3:     +1
      0.4 to 0.7:   -1
      x >= 0.8:     +1
•  Goal: find a collection of 10 simple thresholding classifiers that
   collectively can classify the data correctly.
   –  Each simple (or weak) classifier is of the form
      (x <= K  =>  class = +1 or -1, depending on which value yields
      the lowest error), where K is determined by entropy minimization.
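
A sketch of this example in Python (not from the slides): it bags 10 one-level thresholding classifiers ("decision stumps") on the toy data. For brevity the stump picks its threshold K by minimizing training error rather than by entropy minimization, and all names are illustrative.

import numpy as np

X = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
y = np.array([+1, +1, +1, -1, -1, -1, -1, +1, +1, +1])

def fit_stump(x, t):
    """Pick the threshold K and the sign assigned to x <= K that minimize training error."""
    best = None
    for K in np.unique(x):
        for sign in (+1, -1):
            err = np.mean(np.where(x <= K, sign, -sign) != t)
            if best is None or err < best[0]:
                best = (err, K, sign)
    return best[1], best[2]

rng = np.random.default_rng(0)
stumps = []
for _ in range(10):                              # 10 bootstrap rounds
    idx = rng.integers(0, len(X), size=len(X))
    stumps.append(fit_stump(X[idx], y[idx]))

# Ensemble prediction: sign of the summed votes of the 10 stumps.
votes = sum(np.where(X <= K, s, -s) for K, s in stumps)
print(np.sign(votes))   # no single stump can label all points correctly,
                        # but the combined vote typically can (a tie gives 0)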
Random Forests
•  Ensemble method specifically designed for decision tree classifiers
•  Introduces two sources of randomness: “bagging” and “random input vectors”
   –  Bagging method: each tree is grown using a bootstrap sample of the training data
   –  Random vector method: at each node, the best split is chosen from a random
      sample of m attributes instead of all attributes
Random Forests:
Methods for Growing the Trees
•  Fix m <= M (the total number of attributes). At each node:
   –  Method 1 (sketched below):
      •  Choose m attributes randomly, compute their information
         gains, and split on the attribute with the largest gain
   –  Method 2:
      •  (When M is not very large) select L of the attributes
         randomly and compute a linear combination of the L attributes
         using weights drawn randomly from [-1, +1]; that is, the new
         attribute is A = Sum(wi * Ai), i = 1..L
   –  Method 3:
      •  Compute the information gain of all M attributes, select the
         top m attributes by information gain, and randomly pick one of
         those m attributes as the splitting attribute
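
A minimal sketch of Method 1 (not from the slides; it assumes categorical attributes, and the helper names are illustrative):

import numpy as np

def entropy(labels):
    """Entropy (in bits) of an array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(column, labels):
    """Information gain of splitting on one (categorical) attribute column."""
    gain = entropy(labels)
    for value, count in zip(*np.unique(column, return_counts=True)):
        gain -= (count / len(labels)) * entropy(labels[column == value])
    return gain

def choose_split_method1(X, y, m, rng):
    """Method 1: pick m of the M attributes at random, split on the largest gain."""
    M = X.shape[1]
    candidates = rng.choice(M, size=m, replace=False)
    gains = [information_gain(X[:, j], y) for j in candidates]
    return candidates[int(np.argmax(gains))]   # index of the chosen split attribute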
Random Forest Algorithm
[Algorithm figure: the random forest training algorithm, using Method 1 from the previous slide.]
Reduce Bias^2 and Decrease Variance?
•  Bagging reduces variance by averaging
•  Bagging has little effect on bias
•  Can we average and reduce bias?
•  Yes: Boosting
