Ensemble Learning

Ensemble Learning: General Idea

Original training data D

Step 1: Create multiple data sets D1, D2, ..., Dt-1, Dt

Step 2: Build multiple classifiers C1, C2, ..., Ct-1, Ct

Step 3: Combine the classifiers into a single ensemble classifier C*
Why does it work?

• Suppose there are 25 base classifiers
– Each classifier has error rate ε = 0.35
– Assume the classifiers are independent
– The ensemble (majority vote) errs only when 13 or more of the 25 base
classifiers err, so the probability that the ensemble classifier makes a
wrong prediction is:

P(\text{ensemble error}) = \sum_{i=13}^{25} \binom{25}{i} \varepsilon^{i} (1-\varepsilon)^{25-i} \approx 0.06

Examples of Ensemble Methods

• How to generate an ensemble of classifiers?
– Bagging
– Boosting
Bagging

• Main Assumption:
– Combining many unstable predictors produces a stable ensemble predictor
– Unstable predictor: small changes in the training data produce large
changes in the model (e.g. neural nets, decision trees)
– Stable predictors: SVMs (sometimes), nearest neighbor

• Hypothesis Space
– Variable size (nonparametric): can model any function if you use an
appropriate base predictor (e.g. trees)

The Bagging Algorithm

Given data: D = {(x_1, y_1), ..., (x_N, y_N)}

For m = 1 to M:
• Obtain a bootstrap sample D_m from the training data D
• Build a model G_m(x) from the bootstrap data D_m

The Bagging Model

• Regression: average the model outputs

\hat{y} = \frac{1}{M} \sum_{m=1}^{M} G_m(x)

• Classification: vote over the classifier outputs G_1(x), ..., G_M(x)

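Putting the bagging algorithm and the combination rule together, here is a minimal sketch for classification. It assumes scikit-learn decision trees as the unstable base predictor and majority voting; the function names (bagging_fit, bagging_predict) and parameters are illustrative, not taken from the slides:

```python
# Minimal bagging sketch: M bootstrap samples, one decision tree per sample,
# majority vote at prediction time. Assumes integer class labels in y.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, M=30, seed=0):
    """Build M trees, each trained on a bootstrap sample D_m of (X, y)."""
    rng = np.random.default_rng(seed)
    N = len(X)
    models = []
    for _ in range(M):
        idx = rng.integers(0, N, size=N)      # draw N indices with replacement
        tree = DecisionTreeClassifier()        # unstable base predictor
        tree.fit(X[idx], y[idx])
        models.append(tree)
    return models

def bagging_predict(models, X):
    """Majority vote over the classifier outputs G_1(x), ..., G_M(x)."""
    votes = np.stack([m.predict(X) for m in models])      # shape (M, n_samples)
    return np.array([np.bincount(col).argmax() for col in votes.T])

# Usage (hypothetical data):
#   models = bagging_fit(X_train, y_train, M=30)
#   y_hat = bagging_predict(models, X_test)
```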
Bagging Details

• A bootstrap sample of N instances is obtained by drawing N examples at
random, with replacement
• On average, each bootstrap sample contains about 63% of the distinct
training instances
– This encourages the predictors to have uncorrelated errors, which is why
bagging works

Bagging Details 2

• Usually set M ≈ 30
– Or use validation data to pick M
• The models G_m(x) need to be unstable
– Usually fully grown (or slightly pruned) decision trees

Bagging

• Sampling with replacement

Original Data      1  2  3  4  5  6  7  8  9 10
Bagging (Round 1)  7  8 10  8  2  5 10 10  5  9
Bagging (Round 2)  1  4  9  1  2  3  2  7  3  2
Bagging (Round 3)  1  8  5 10  5  5  9  6  3  7

• Build a classifier on each bootstrap sample

• Each instance has probability 1 − (1 − 1/n)^n of never being drawn, so it
appears in a given bootstrap sample with probability 1 − (1 − 1/n)^n ≈ 0.632
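To make the selection probability concrete, the snippet below (an illustrative check, assuming n = 10 as in the table above) compares the closed form 1 − (1 − 1/n)^n with the empirical fraction of distinct instances in simulated bootstrap samples:

```python
# Bootstrap inclusion probability: closed form vs. simulation for n = 10.
import numpy as np

n = 10
closed_form = 1 - (1 - 1 / n) ** n      # ≈ 0.651 for n = 10; tends to 1 - 1/e ≈ 0.632
rng = np.random.default_rng(0)
samples = rng.integers(0, n, size=(100_000, n))               # 100,000 bootstrap samples
empirical = np.mean([len(set(row)) / n for row in samples])   # avg. fraction of distinct items
print(closed_form, empirical)
```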
Boosting

• An iterative procedure that adaptively changes the distribution of the
training data by focusing more on previously misclassified records
– Initially, all N records are assigned equal weights
– Unlike bagging, the weights may change at the end of each boosting round
Boosting

• Main Assumption:
– Combining many weak predictors produces an ensemble predictor
– The weak predictors (or classifiers) need to be stable

• Hypothesis Space
– Variable size (nonparametric): can model any function if you use an
appropriate base predictor (e.g. trees)

Commonly Used Weak Predictor (or Classifier)

A Decision Tree Stump (1-R)

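One common way to build a decision stump is a depth-one decision tree; the sketch below assumes scikit-learn and synthetic data purely for illustration:

```python
# A decision stump (1-R): a decision tree restricted to a single split.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
stump = DecisionTreeClassifier(max_depth=1)   # one split only: weak but stable
stump.fit(X, y)
print(stump.score(X, y))                      # better than chance, far from perfect
```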
Boosting

• Each classifier G_m(x) is trained on a weighted sample of the training data

Boosting

• Records that are wrongly classified will have their weights increased
• Records that are classified correctly will have their weights decreased

Original Data       1  2  3  4  5  6  7  8  9 10
Boosting (Round 1)  7  3  2  8  7  9  4 10  6  3
Boosting (Round 2)  5  4  9  4  2  5  1  7  4  2
Boosting (Round 3)  4  4  8 10  4  5  4  6  3  4

• Example 4 is hard to classify
• Its weight is increased, so it is more likely to be chosen again in
subsequent rounds
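The slides do not name a specific boosting algorithm, but the reweighting idea matches AdaBoost. The sketch below is one possible realization under that assumption, using decision stumps as the weak learner and labels in {-1, +1}; all names are illustrative:

```python
# AdaBoost-style sketch: misclassified records get larger weights each round.
# Assumes y takes values in {-1, +1}.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost_fit(X, y, M=10):
    N = len(X)
    w = np.full(N, 1.0 / N)                    # initially, equal weights
    models, alphas = [], []
    for _ in range(M):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)       # weak learner on weighted data
        pred = stump.predict(X)
        err = np.clip(np.sum(w[pred != y]) / np.sum(w), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)  # classifier importance
        w *= np.exp(-alpha * y * pred)         # raise weights of mistakes, lower the rest
        w /= w.sum()
        models.append(stump)
        alphas.append(alpha)
    return models, alphas

def boost_predict(models, alphas, X):
    return np.sign(sum(a * m.predict(X) for m, a in zip(models, alphas)))
```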
Methods for Performance Evaluation

• How to obtain a reliable estimate of performance?

• The performance of a model may depend on other factors besides the learning
algorithm:
– Class distribution
– Cost of misclassification
– Size of training and test sets
Learning Curve

• A learning curve shows how accuracy changes with varying training-sample size
• Requires a sampling schedule for creating the learning curve:
– Arithmetic sampling (Langley et al.)
– Geometric sampling (Provost et al.)

• Effect of small sample size:
– Bias in the estimate
– Variance of the estimate
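As an illustration of a geometric sampling schedule (an assumed setup, not taken from the slides), the sketch below trains a classifier on geometrically growing training subsets and records test accuracy at each size:

```python
# Learning curve with a geometric sampling schedule (training size doubles each step).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

for n in [50, 100, 200, 400, 800, 1600, len(X_tr)]:
    clf = DecisionTreeClassifier(random_state=0).fit(X_tr[:n], y_tr[:n])
    print(f"n = {n:5d}  test accuracy = {clf.score(X_te, y_te):.3f}")
```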
