Bagging + Boosting + Gradient Boosting
Pulak Ghosh
IIMB
Introduction & Motivation
Suppose that you are a patient with a set of symptoms
Instead of taking the opinion of just one doctor (classifier), you decide to take
the opinions of several doctors!
Is this a good idea? Indeed it is.
Consult many doctors and, based on their combined diagnoses, you can get a fairly
accurate idea of the true diagnosis.
Majority voting - bagging
More weight to the opinions of the good (accurate) doctors - boosting
In bagging, you give equal weight to all classifiers, whereas in boosting
you weight each classifier according to its accuracy.
Ensemble Methods
[Diagram: multiple data sets S1, S2, ..., Sn are used to train multiple classifiers C1, C2, ..., Cn, which are merged into a combined classifier H.]
Build Ensemble Classifiers
Basic idea:
Build different experts, and let them vote
Advantages:
Improve predictive performance
Other types of classifiers can be directly included
Easy to implement
Not much parameter tuning needed
Disadvantages:
The combined classifier is not so transparent (black box)
Not a compact representation
Why does it work?
Suppose the ensemble consists of 25 base classifiers, each with error rate \varepsilon, and suppose their errors are independent. The majority vote is wrong only when at least 13 of the 25 classifiers are wrong:

P(\text{ensemble error}) = \sum_{i=13}^{25} \binom{25}{i} \varepsilon^{i} (1-\varepsilon)^{25-i}

For example, with \varepsilon = 0.35 this probability is about 0.06, far below the error rate of any single classifier.
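A quick numerical check of this sum (a minimal sketch; the base-classifier error rate of 0.35 is the assumed value that reproduces the 0.06 above):

# Worked check of the ensemble-error calculation above.
from math import comb

def ensemble_error(eps, n_classifiers=25):
    # P(majority vote is wrong) for n independent classifiers with error eps
    k = n_classifiers // 2 + 1            # need at least 13 of 25 to be wrong
    return sum(comb(n_classifiers, i) * eps**i * (1 - eps)**(n_classifiers - i)
               for i in range(k, n_classifiers + 1))

print(round(ensemble_error(0.35), 3))     # ~0.06: much better than 0.35
print(round(ensemble_error(0.50), 3))     # 0.5: no gain when classifiers are no better than chance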
Examples of Ensemble Methods
Combine B predictors by
Voting (for classification problem)
Averaging (for estimation problem)
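As a small illustration (a sketch in Python with made-up predictions), the two combination rules:

# Combining B predictors: majority vote for classification, average for estimation.
import numpy as np

# Classification: B = 3 classifiers, 4 test points, labels in {0, 1}
class_preds = np.array([[1, 0, 1, 1],
                        [1, 1, 0, 1],
                        [0, 0, 1, 1]])
# Majority vote across the B rows (one column per test point)
votes = np.apply_along_axis(lambda col: np.bincount(col, minlength=2).argmax(),
                            axis=0, arr=class_preds)
print(votes)                     # [1 0 1 1]

# Estimation (regression): simple average across the B predictors
reg_preds = np.array([[2.1, 0.4], [1.9, 0.6], [2.3, 0.5]])
print(reg_preds.mean(axis=0))    # [2.1 0.5]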
Bagging -- the Idea
[Diagram: bootstrap samples X*1, X*2, ..., X*B are drawn from the training data; each is used to fit a bootstrap estimator 1, 2, ..., B, and the B estimators are then combined.]
Some candidates:
Decision tree, decision stump, regression tree, linear
regression, SVMs
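A minimal bagging sketch, assuming scikit-learn regression trees (one of the candidates above) as the base learner; the function names and settings are illustrative only:

# Bagging sketch: draw B bootstrap samples X*1..X*B, fit one base learner per
# sample, and average their predictions (regression case).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_bagged_trees(X, y, B=50, random_state=0):
    rng = np.random.default_rng(random_state)
    estimators = []
    n = len(X)
    for _ in range(B):
        idx = rng.integers(0, n, size=n)          # sample n rows with replacement
        tree = DecisionTreeRegressor(max_depth=3)
        tree.fit(X[idx], y[idx])
        estimators.append(tree)
    return estimators

def predict_bagged(estimators, X):
    # Average the B bootstrap estimators (voting would replace this for classification)
    return np.mean([est.predict(X) for est in estimators], axis=0)

# Toy usage on a noisy sine curve
X = np.linspace(0, 6, 200).reshape(-1, 1)
y = np.sin(X).ravel() + np.random.default_rng(1).normal(0, 0.3, 200)
bag = fit_bagged_trees(X, y)
print(predict_bagged(bag, X[:5]))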
Bias-variance Decomposition
Used to analyze how much the selection of any specific training set affects
performance. The aggregated (bagged) predictor is the average of the predictor \varphi over training sets S_k drawn from the population P:

\varphi_A(x, P) = E_S[\varphi(x, S_k)]
Why Bagging works?
\varphi_A(x, P) = E_S[\varphi(x, S)]

Direct (unaggregated) error:
e = E_S E_{Y,X}[Y - \varphi(X, S)]^2

Bagging (aggregated) error:
e_A = E_{Y,X}[Y - \varphi_A(X, P)]^2

Jensen's inequality, (E[Z])^2 \le E[Z^2], applied with Z = \varphi(X, S), gives

e = E[Y^2] - 2 E[Y \varphi_A] + E_{Y,X} E_S[\varphi^2(X, S)]
  \ge E[Y^2] - 2 E[Y \varphi_A] + E_{Y,X}[\varphi_A^2(X)]
  = E(Y - \varphi_A)^2 = e_A
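A small simulation makes the inequality e >= e_A concrete; the toy target (Y = X^2 + noise) and the straight-line base predictor below are assumptions chosen only for illustration:

# Simulation of the argument above: the aggregated predictor phi_A has squared
# error no larger than the average error of the individual predictors.
import numpy as np

rng = np.random.default_rng(0)

def draw_training_set(n=30):
    x = rng.uniform(-1, 1, n)
    y = x**2 + rng.normal(0, 0.2, n)
    return x, y

def fit_predictor(x, y):
    # phi(x, S): a straight-line fit, deliberately unstable across training sets
    coef = np.polyfit(x, y, deg=1)
    return lambda x_new: np.polyval(coef, x_new)

x_test = rng.uniform(-1, 1, 2000)
y_test = x_test**2 + rng.normal(0, 0.2, 2000)

# 200 predictors, each trained on its own sample S_k
preds = np.array([fit_predictor(*draw_training_set())(x_test) for _ in range(200)])

e   = np.mean((y_test - preds) ** 2)                 # E_S E_{Y,X}[Y - phi(X,S)]^2
e_A = np.mean((y_test - preds.mean(axis=0)) ** 2)    # error of the aggregated phi_A
print(f"e = {e:.4f}, e_A = {e_A:.4f}")               # e >= e_A, as Jensen guarantees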
Boosting
Algorithms:
AdaBoost: adaptive boosting
Gentle AdaBoost
BrownBoost
Bagging
[Diagram: random samples drawn with replacement from the training data are each fed to the learner (ML), producing models f1, f2, ..., fT, which are combined into f.]
Boosting
[Diagram: the learner (ML) is first trained on the original training sample to give f1; subsequent rounds train on re-weighted samples to give f2, ..., fT, which are combined into f.]
What is Boosting?
Analogy: consult several doctors and base the decision on a combination of weighted diagnoses; the weight
assigned to each doctor depends on the accuracy of their previous diagnoses.
The boosting algorithm can be extended to the prediction of continuous values.
Compared with bagging: boosting tends to achieve greater accuracy, but it also risks
overfitting the model to misclassified data.
Basic Idea?
Suppose examples 2, 3, and 5 are correctly predicted by the current classifier, while examples 1 and 4 are wrongly predicted.
The 2nd round of boosting again samples 5 examples, but now examples 1 and 4 are more likely to be sampled.
Given training data X = (x_1, \ldots, x_n), initialize the distribution D_1(i) = 1/n.
At each round t: calculate the weighted error \varepsilon_t, choose the classifier weight

\alpha_t = \tfrac{1}{2} \ln\!\big((1 - \varepsilon_t)/\varepsilon_t\big)

and update the distribution (D_1 \to D_2 \to \cdots \to D_B) so that misclassified examples receive more weight.
The misclassification rate of a classifier G on the training data is

err = \frac{1}{N} \sum_{i=1}^{N} I\big(y_i \ne G(x_i)\big)
AdaBoost.M1 (Contd)
Sequentially apply the weak classifier to repeatedly
modified versions of the data,
producing a sequence of weak classifiers G_m(x), m = 1, 2, ..., M.
The predictions from all classifiers are combined via a weighted majority
vote to produce the final prediction.
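A compact sketch of this loop, assuming depth-1 scikit-learn trees (decision stumps) as the weak learners G_m and labels coded as +1/-1; it follows the weight formulas on the previous slide rather than any particular library implementation:

# AdaBoost.M1 sketch: weights start uniform, each round fits a weak classifier
# on the weighted data, computes err_m and alpha_m, and up-weights the
# misclassified examples. Final prediction: sign of the alpha-weighted vote.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_m1(X, y, M=50):
    # X: feature array, y: numpy array of labels in {+1, -1}
    n = len(y)
    w = np.full(n, 1.0 / n)                          # D_1(i) = 1/n
    stumps, alphas = [], []
    for _ in range(M):
        G = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = G.predict(X)
        err = np.clip(np.sum(w * (pred != y)), 1e-10, 1 - 1e-10)   # weighted error
        alpha = 0.5 * np.log((1 - err) / err)        # classifier weight alpha_t
        w = w * np.exp(-alpha * y * pred)            # up-weight mistakes
        w = w / w.sum()                              # renormalize the distribution
        stumps.append(G)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    # Weighted majority vote of the M weak classifiers
    agg = sum(a * G.predict(X) for G, a in zip(stumps, alphas))
    return np.sign(agg)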
Adaboost Concept
AdaBoost starts with a uniform
distribution of weights over the training
examples. The weights tell the learning
algorithm the importance of each
example.
(Repeat)
At the end, carefully make a linear
combination of the weak classifiers
obtained at all iterations.
Final Classifier: combine the weak classifiers into a single strong
classifier.
Bagging vs Boosting
Bagging: the construction of complementary base-learners is left
to chance and to the instability of the learning method.
Boosting: actively seeks to generate complementary base-
learners by training the next base-learner on the mistakes of
the previous learners.
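Both strategies are available off the shelf; the comparison below is a sketch using scikit-learn's BaggingClassifier and AdaBoostClassifier on a synthetic dataset (illustrative settings only):

# Off-the-shelf bagging and boosting compared on a toy classification problem.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=0)
boosting = AdaBoostClassifier(n_estimators=100, random_state=0)   # stumps by default

for name, model in [("bagging", bagging), ("boosting", boosting)]:
    print(name, cross_val_score(model, X, y, cv=5).mean().round(3))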