ML8 Ensembles
Why does it work?
Suppose there are 25 base classifiers
◦ Each classifier has an error rate ε = 0.35
◦ Assume the classifiers are independent
◦ The probability that the ensemble classifier makes a wrong prediction is the probability that 13 or more classifiers err:
$$\sum_{i=13}^{25} \binom{25}{i} \varepsilon^{i} (1-\varepsilon)^{25-i} \approx 0.06$$
This value approaches 0 as the number of classifiers increases. This is just the law of large numbers, but in this context it is sometimes called "Condorcet's Jury Theorem".
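A quick way to check this number is to evaluate the binomial sum directly; a minimal sketch in Python (mine, not from the slides):

```python
from math import comb

def ensemble_error(n=25, eps=0.35):
    """Probability that a majority of n independent base classifiers,
    each with error rate eps, are wrong at the same time."""
    k = n // 2 + 1   # smallest number of wrong votes that flips the majority (13 for n = 25)
    return sum(comb(n, i) * eps**i * (1 - eps)**(n - i) for i in range(k, n + 1))

print(round(ensemble_error(), 3))        # ~0.06
print(round(ensemble_error(n=101), 3))   # shrinks toward 0 as the ensemble grows
```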
Homogeneous Ensembles
Use a single, arbitrary learning algorithm but manipulate the training data so that it learns multiple models
◦ Data1 ≠ Data2 ≠ … ≠ Data m
◦ Learner1 = Learner2 = … = Learner m
Different methods for changing the training data:
◦ Bagging: resample the training data
◦ Boosting: reweight the training data
◦ DECORATE: add additional artificial training data
In WEKA, these are called meta-learners: they take a learning algorithm as an argument (the base learner) and create a new learning algorithm
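scikit-learn follows the same meta-learner pattern as WEKA: an ensemble class takes a base estimator as an argument and itself behaves like a new learning algorithm. A minimal sketch (scikit-learn shown for illustration; the slides use WEKA):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# The meta-learner wraps a base learner (here, a decision tree) and
# exposes the usual fit/predict interface of a learning algorithm.
meta = BaggingClassifier(DecisionTreeClassifier(), n_estimators=25)
meta.fit(X, y)
print(meta.predict(X[:5]))
```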
Bagging
Create ensembles by repeatedly, randomly resampling the training data (Breiman, 1996)
Given a training set of size n, create m samples of size n by drawing n examples from the original data, with replacement
Combine the m resulting models using a simple majority vote
Decreases error by decreasing the variance in the results due to unstable learners: algorithms (like decision trees) whose output can change dramatically when the training data is slightly changed
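A from-scratch sketch of this procedure (my illustration, not code from the slides), using decision trees as the unstable base learner:

```python
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, m=10, seed=0):
    """Fit m trees, each on a bootstrap sample of size n drawn with replacement.
    X and y are NumPy arrays."""
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(m):
        idx = rng.choice(n, size=n, replace=True)            # bootstrap sample indices
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Combine the m models using a simple majority vote."""
    votes = np.array([model.predict(X) for model in models])  # shape (m, n_test)
    return np.array([Counter(col).most_common(1)[0][0] for col in votes.T])
```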
The Problem with Single Decision Trees
(Figure: a single tree's predictions can change dramatically when the training data is changed only slightly.)
Bagging: Bootstrap Aggregating
(Figure sequence: draw bootstrap samples, fit a tree to each, then, in step 3, average the predictions.)
As we add more trees, our average prediction error reduces
Bagging
Sampling with replacement from the training data (table entries are data IDs):
Original Data       1  2  3  4  5  6  7  8  9  10
Bagging (Round 1)   7  8  10 8  2  5  10 10 5  9
Bagging (Round 2)   1  4  9  1  2  3  2  7  3  2
Bagging (Round 3)   1  8  5  10 5  5  9  6  3  7
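Rounds like these can be generated with a single call; a small sketch (illustrative, not the slides' code):

```python
import numpy as np

rng = np.random.default_rng(1)
ids = np.arange(1, 11)                                  # original data IDs 1..10
for r in range(1, 4):
    sample = rng.choice(ids, size=10, replace=True)     # sampling with replacement
    print(f"Bagging (Round {r}):", sample)
```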
The 0.632 bootstrap
This method is also called the 0.632 bootstrap
◦ A particular training instance has a probability of 1 - 1/n of not being picked in a single draw
◦ Thus, its probability of ending up in the test data (never being selected in n draws) is:
$$\left(1 - \frac{1}{n}\right)^{n} \approx e^{-1} \approx 0.368$$
◦ This means each bootstrap training sample will contain approximately 63.2% of the distinct original instances
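A quick empirical check of the 63.2% figure (an illustration, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
sample = rng.choice(n, size=n, replace=True)            # one bootstrap sample
unique_fraction = len(np.unique(sample)) / n
print(f"analytic: {1 - (1 - 1/n)**n:.3f}, empirical: {unique_fraction:.3f}")   # both ~0.632
```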
Bagging Algorithm
Bagging Example
Consider a 1-dimensional data set:
x 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
y 1 1 1 -1 -1 -1 -1 1 1 1
Base classifier: a one-level decision stump that tests x ≤ k, predicting y_left on the True branch and y_right on the False branch
Bagging Example
Bagging Round 1:
x 0.1 0.2 0.2 0.3 0.4 0.4 0.5 0.6 0.9 0.9
y 1 1 1 1 -1 -1 -1 -1 1 1
Bagging Round 2:
x 0.1 0.2 0.3 0.4 0.5 0.5 0.9 1 1 1
y 1 1 1 -1 -1 -1 1 1 1 1
Bagging Round 3:
x 0.1 0.2 0.3 0.4 0.4 0.5 0.7 0.7 0.8 0.9
y 1 1 1 -1 -1 -1 -1 -1 1 1
Bagging Round 4:
x 0.1 0.1 0.2 0.4 0.4 0.5 0.5 0.7 0.8 0.9
y 1 1 1 -1 -1 -1 -1 -1 1 1
Bagging Round 5:
x 0.1 0.1 0.2 0.5 0.6 0.6 0.6 1 1 1
y 1 1 1 -1 -1 -1 -1 1 1 1
Bagging Example
Bagging Round 6:
x 0.2 0.4 0.5 0.6 0.7 0.7 0.7 0.8 0.9 1
y 1 -1 -1 -1 -1 -1 -1 1 1 1
Bagging Round 7:
x 0.1 0.4 0.4 0.6 0.7 0.8 0.9 0.9 0.9 1
y 1 -1 -1 -1 -1 1 1 1 1 1
Bagging Round 8:
x 0.1 0.2 0.5 0.5 0.5 0.7 0.7 0.8 0.9 1
y 1 1 -1 -1 -1 -1 -1 1 1 1
Bagging Round 9:
x 0.1 0.3 0.4 0.4 0.6 0.7 0.7 0.8 1 1
y 1 1 -1 -1 -1 -1 -1 1 1 1
Bagging Example
Summary of Training sets:
Round Split Point Left Class Right Class
1 0.35 1 -1
2 0.7 1 1
3 0.35 1 -1
4 0.3 1 -1
5 0.35 1 -1
6 0.75 -1 1
7 0.75 -1 1
8 0.75 -1 1
9 0.75 -1 1
10 0.05 1 1
Bagging Example
Assume the test set is the same as the original data
Use a majority vote to determine the class predicted by the ensemble classifier
Round x=0.1 x=0.2 x=0.3 x=0.4 x=0.5 x=0.6 x=0.7 x=0.8 x=0.9 x=1.0
1 1 1 1 -1 -1 -1 -1 -1 -1 -1
2 1 1 1 1 1 1 1 1 1 1
3 1 1 1 -1 -1 -1 -1 -1 -1 -1
4 1 1 1 -1 -1 -1 -1 -1 -1 -1
5 1 1 1 -1 -1 -1 -1 -1 -1 -1
6 -1 -1 -1 -1 -1 -1 -1 1 1 1
7 -1 -1 -1 -1 -1 -1 -1 1 1 1
8 -1 -1 -1 -1 -1 -1 -1 1 1 1
9 -1 -1 -1 -1 -1 -1 -1 1 1 1
10 1 1 1 1 1 1 1 1 1 1
Sum 2 2 2 -6 -6 -6 -6 2 2 2
Predicted class (sign) 1 1 1 -1 -1 -1 -1 1 1 1
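The vote can be reproduced directly from the stumps in the summary table; a sketch (illustrative):

```python
import numpy as np

# (split point, left class, right class) for each of the 10 bagging rounds
stumps = [(0.35, 1, -1), (0.7, 1, 1), (0.35, 1, -1), (0.3, 1, -1), (0.35, 1, -1),
          (0.75, -1, 1), (0.75, -1, 1), (0.75, -1, 1), (0.75, -1, 1), (0.05, 1, 1)]

x_test = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])

votes = np.array([np.where(x_test <= split, left, right) for split, left, right in stumps])
print(votes.sum(axis=0))            # 2 2 2 -6 -6 -6 -6 2 2 2
print(np.sign(votes.sum(axis=0)))   # ensemble prediction by majority vote
```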
Boosting
Originally developed by computational learning theorists to guarantee performance improvements on fitting the training data for a weak learner, i.e. one that only needs to generate a hypothesis with a training accuracy greater than 0.5 (Schapire, 1990)
Revised into a practical algorithm, AdaBoost, for building ensembles that empirically improves generalization performance (Freund & Schapire, 1996)
Examples are given weights. At each iteration, a new hypothesis is learned and the examples are reweighted to focus the system on the examples that the most recently learned classifier got wrong
Learning with Weighted Examples
The generic approach is to replicate examples in the training set in proportion to their weights (e.g. 10 replicates of an example with a weight of 0.01 and 100 for one with a weight of 0.1)
Most algorithms can be enhanced to incorporate weights directly and efficiently in the learning algorithm so that the effect is the same (e.g. implement the WeightedInstancesHandler interface in WEKA)
For decision trees, when calculating information gain, count example i by incrementing the corresponding count by wi rather than by 1
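A small illustration of weighted counting for entropy, the quantity behind information gain (my sketch, not the slides'):

```python
import numpy as np
from collections import defaultdict

def weighted_entropy(labels, weights):
    """Entropy where each example contributes its weight w_i instead of a count of 1."""
    totals = defaultdict(float)
    for y, w in zip(labels, weights):
        totals[y] += w
    p = np.array(list(totals.values())) / sum(totals.values())
    return float(-(p * np.log2(p)).sum())

labels  = [1, 1, 1, -1, -1]
uniform = [1.0, 1.0, 1.0, 1.0, 1.0]        # unweighted counting: 3 vs 2
skewed  = [0.05, 0.05, 0.05, 0.4, 0.45]    # boosting has up-weighted the -1 examples
print(weighted_entropy(labels, uniform), weighted_entropy(labels, skewed))
```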
Boosting
Records that are wrongly classified will have their weights increased
Records that are classified correctly will have their weights decreased
The table entries are data IDs; each round samples records in proportion to their current weights:
Original Data       1  2  3  4  5  6  7  8  9  10
Boosting (Round 1)  7  3  2  8  7  9  4  10 6  3
Boosting (Round 2)  5  4  9  4  2  5  1  7  4  2
Boosting (Round 3)  4  4  8  10 4  5  4  6  3  4
Boosting: Basic Algorithm
General Loop:
Set all examples to have equal uniform weights
For t from 1 to T do:
Learn a hypothesis, ht, from the weighted examples
Decrease the weights of examples ht classifies
correctly
Base (weak) learner must focus on correctly classifying
the most highly weighted examples while strongly
avoiding over-fitting
During testing, each of the T hypotheses gets a weighted vote proportional to its accuracy on the training data
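A minimal sketch of this loop (my illustration; AdaBoost, described next, pins down the specific weight and vote formulas used here):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost(X, y, T=10):
    """Generic boosting loop for labels in {-1, +1}; X, y are NumPy arrays."""
    n = len(X)
    w = np.full(n, 1.0 / n)                               # equal uniform weights
    hypotheses, alphas = [], []
    for _ in range(T):
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = h.predict(X)
        err = w[pred != y].sum()                          # weighted training error
        alpha = 0.5 * np.log((1 - err) / (err + 1e-12))   # AdaBoost-style vote weight
        w *= np.exp(-alpha * y * pred)                    # shrink correct, grow mistakes
        w /= w.sum()                                      # renormalize
        hypotheses.append(h)
        alphas.append(alpha)
    return hypotheses, np.array(alphas)

def boost_predict(hypotheses, alphas, X):
    """Weighted vote of the T hypotheses."""
    return np.sign(sum(a * h.predict(X) for h, a in zip(hypotheses, alphas)))
```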
Types of Boosting Algorithms
The underlying engine used for boosting can be almost anything: a decision stump, a margin-maximizing classification algorithm, etc. There are many boosting algorithms, including:
◦ AdaBoost (Adaptive Boosting)
◦ Gradient Tree Boosting
◦ GentleBoost
◦ LPBoost
◦ BrownBoost
◦ XGBoost
◦ CatBoost
◦ LightGBM
AdaBoost (Adaptive Boosting)
It works in the way discussed above: it fits a sequence of weak learners on differently weighted versions of the training data
It starts by giving every observation in the original data set an equal weight. Observations that the first learner predicts incorrectly are then given a higher weight. Being an iterative process, it continues to add learners until a limit on the number of models or on accuracy is reached
Mostly, we use decision stumps with AdaBoost, but any machine learning algorithm can be used as the base learner if it accepts weights on the training data set
We can use AdaBoost for both classification and regression problems
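In scikit-learn (shown for illustration; the slides use WEKA), the AdaBoost meta-learner is typically paired with depth-1 decision trees, i.e. decision stumps:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Decision stumps as the weak base learner, 50 boosting rounds.
stump = DecisionTreeClassifier(max_depth=1)
ada = AdaBoostClassifier(stump, n_estimators=50, random_state=0)
print(cross_val_score(ada, X, y, cv=5).mean())
```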
AdaBoost Algorithm
Example: Error and Classifier Weight in AdaBoost
Base classifiers: C1, C2, …, CT
Importance (vote weight) of classifier Ci, where εi is its weighted training error:
$$\alpha_i = \frac{1}{2}\ln\!\left(\frac{1-\varepsilon_i}{\varepsilon_i}\right)$$
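The vote weight grows rapidly as the weighted error drops below 0.5 (and is 0 at ε = 0.5); a small numerical sketch:

```python
import numpy as np

def classifier_weight(eps):
    """AdaBoost importance: alpha_i = 0.5 * ln((1 - eps_i) / eps_i)."""
    return 0.5 * np.log((1 - eps) / eps)

for eps in [0.45, 0.35, 0.1, 0.01]:
    print(f"eps = {eps:<4}: alpha = {classifier_weight(eps):.3f}")
```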
Example: Data Instance Weight in AdaBoost
Assume: N training examples (xj, yj) in D, T rounds, and Ci, αi are the classifier and its weight in the ith round, respectively
Weight update on all training data in D:
1. $$w_j^{(i+1)} = \frac{w_j^{(i)}}{Z_i} \times \begin{cases} e^{-\alpha_i} & \text{if } C_i(x_j) = y_j \\ e^{\alpha_i} & \text{if } C_i(x_j) \neq y_j \end{cases}$$
   where $Z_i$ is the normalization factor (chosen so that the updated weights sum to 1)
2. Equivalently, with the normalization carried out: $$w_j^{(i+1)} = \begin{cases} \dfrac{w_j^{(i)}}{2(1-\varepsilon_i)} & \text{if } C_i(x_j) = y_j \\[4pt] \dfrac{w_j^{(i)}}{2\,\varepsilon_i} & \text{if } C_i(x_j) \neq y_j \end{cases}$$
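A sketch of one round of this update for labels in {-1, +1}, using round 1 of the AdaBoost example that follows (α = 1.738, errors at x = 0.1, 0.2, 0.3):

```python
import numpy as np

def update_weights(w, y_true, y_pred, alpha):
    """One AdaBoost weight update: shrink correct examples, grow mistakes, renormalize."""
    w_new = w * np.exp(-alpha * y_true * y_pred)   # e^{-alpha} if correct, e^{+alpha} if wrong
    return w_new / w_new.sum()                     # divide by the normalization factor Z_i

w      = np.full(10, 0.1)
y_true = np.array([1, 1, 1, -1, -1, -1, -1, 1, 1, 1])
y_pred = np.array([-1, -1, -1, -1, -1, -1, -1, 1, 1, 1])   # round-1 stump predictions
print(np.round(update_weights(w, y_true, y_pred, alpha=1.738), 3))
# misclassified points get weight ~0.311, the rest ~0.01 (cf. the weights table below)
```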
AdaBoost Example
Consider the same 1-dimensional data set as in the bagging example:
x 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
y 1 1 1 -1 -1 -1 -1 1 1 1
Base classifier: a one-level decision stump that tests x ≤ k, predicting y_left on the True branch and y_right on the False branch
AdaBoost Example
Training sets for the first 3 boosting rounds:
Boosting Round 1:
x 0.1 0.4 0.5 0.6 0.6 0.7 0.7 0.7 0.8 1
y 1 -1 -1 -1 -1 -1 -1 -1 1 1
Boosting Round 2:
x 0.1 0.1 0.2 0.2 0.2 0.2 0.3 0.3 0.3 0.3
y 1 1 1 1 1 1 1 1 1 1
Boosting Round 3:
x 0.2 0.2 0.4 0.4 0.4 0.4 0.5 0.6 0.6 0.7
y 1 1 -1 -1 -1 -1 -1 -1 -1 -1
Summary:
Round Split Point Left Class Right Class alpha
1 0.75 -1 1 1.738
2 0.05 1 1 2.7784
3 0.3 1 -1 4.1195
AdaBoost Example
Weights
Round x=0.1 x=0.2 x=0.3 x=0.4 x=0.5 x=0.6 x=0.7 x=0.8 x=0.9 x=1.0
1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1
2 0.311 0.311 0.311 0.01 0.01 0.01 0.01 0.01 0.01 0.01
3 0.029 0.029 0.029 0.228 0.228 0.228 0.228 0.009 0.009 0.009
Classification
Round x=0.1 x=0.2 x=0.3 x=0.4 x=0.5 x=0.6 x=0.7 x=0.8 x=0.9 x=1.0
1 -1 -1 -1 -1 -1 -1 -1 1 1 1
2 1 1 1 1 1 1 1 1 1 1
3 1 1 1 -1 -1 -1 -1 -1 -1 -1
Sum 5.16 5.16 5.16 -3.08 -3.08 -3.08 -3.08 0.397 0.397 0.397
Predicted class (sign) 1 1 1 -1 -1 -1 -1 1 1 1
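The Sum row is the α-weighted vote of the three stumps; a sketch that reproduces it from the boosting summary above (illustrative):

```python
import numpy as np

x = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
# (split point, left class, right class, alpha) from the boosting summary
stumps = [(0.75, -1, 1, 1.738), (0.05, 1, 1, 2.7784), (0.3, 1, -1, 4.1195)]

weighted_sum = sum(alpha * np.where(x <= split, left, right)
                   for split, left, right, alpha in stumps)
print(np.round(weighted_sum, 2))   # 5.16 ... -3.08 ... 0.4
print(np.sign(weighted_sum))       # final AdaBoost prediction: 1 1 1 -1 -1 -1 -1 1 1 1
```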