Ensemble Method

Ensemble methods combine multiple models to enhance accuracy, with popular techniques including bagging, boosting, and random forests. Bagging involves averaging predictions from various classifiers, while boosting focuses on weighted votes based on classifier accuracy. Additionally, strategies for handling class-imbalanced datasets include oversampling, under-sampling, and threshold-moving, which are essential for improving classification in scenarios with rare positive examples.


ENSEMBLE METHODS: INCREASING THE ACCURACY

Ensemble methods
 • Use a combination of models to increase accuracy
 • Combine a series of k learned models, M1, M2, …, Mk, with the aim of creating an improved model M*
Popular ensemble methods
 • Bagging: averaging the predictions over a collection of classifiers
 • Boosting: weighted vote with a collection of classifiers
 • Ensemble: combining a set of heterogeneous classifiers (see the sketch below)
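As a minimal sketch of the last point, a set of heterogeneous classifiers can be combined by majority vote. scikit-learn is assumed to be available; the synthetic dataset and the choice of base models are illustrative, not prescribed by the slides.

```python
# Minimal sketch: combine heterogeneous classifiers M1, M2, M3 into an improved M*
# by majority vote. scikit-learn assumed; dataset and base models are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("nb", GaussianNB()),
        ("dt", DecisionTreeClassifier(max_depth=5)),
    ],
    voting="hard",  # each classifier casts one vote; the most popular class wins
)
ensemble.fit(X, y)
print(ensemble.predict(X[:5]))
```

Setting voting="soft" would instead average the predicted class probabilities, which connects this combination scheme to the averaging view used by bagging.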
BAGGING: BOOTSTRAP AGGREGATION
Analogy: Diagnosis based on multiple doctors' majority vote
Training
 • Given a set D of d tuples, at each iteration i, a training set Di of d tuples is sampled with replacement from D (i.e., a bootstrap sample)
 • A classifier model Mi is learned for each training set Di
Classification: classify an unknown sample X
 • Each classifier Mi returns its class prediction
 • The bagged classifier M* counts the votes and assigns the class with the most votes to X (see the sketch below)
Prediction: bagging can also be applied to the prediction of continuous values by taking the average of the individual predictions for a given test tuple
Accuracy
 • Often significantly better than a single classifier derived from D
 • For noisy data: not considerably worse, and more robust
 • Proven improvement in prediction accuracy
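A from-scratch sketch of this procedure, assuming NumPy and scikit-learn; decision trees as the base learner and integer-coded class labels are illustrative assumptions.

```python
# Bagging sketch: sample d tuples with replacement, learn one classifier per
# bootstrap sample, then classify by majority vote. NumPy/scikit-learn assumed.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, k=25, random_state=0):
    rng = np.random.default_rng(random_state)
    d = len(X)
    models = []
    for _ in range(k):
        idx = rng.integers(0, d, size=d)  # bootstrap: d tuples drawn with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    votes = np.array([Mi.predict(X) for Mi in models])  # shape (k, n_samples)
    # majority vote per test tuple; assumes integer-coded class labels
    return np.array([np.bincount(col).argmax() for col in votes.T])
```

For numeric prediction, bagging_predict would instead return votes.mean(axis=0), i.e., the average of the individual predictions for each test tuple.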
BOOSTING
Analogy: Consult several doctors and combine their weighted diagnoses, where each weight is assigned based on that doctor's previous diagnostic accuracy
How boosting works:
 • Weights are assigned to each training tuple
 • A series of k classifiers is iteratively learned
 • After a classifier Mi is learned, the weights are updated to allow the subsequent classifier, Mi+1, to pay more attention to the training tuples that were misclassified by Mi
 • The final M* combines the votes of each individual classifier, where the weight of each classifier's vote is a function of its accuracy
The boosting algorithm can be extended for numeric prediction (see the usage sketch below)
Compared with bagging: boosting tends to achieve greater accuracy, but it also risks overfitting the model to misclassified data
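As a quick usage sketch (scikit-learn assumed; the synthetic datasets and parameters are illustrative), boosted ensembles are available both for classification and, per the note above, for numeric prediction:

```python
# Hedged usage sketch: boosting for classification and for numeric prediction.
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import AdaBoostClassifier, AdaBoostRegressor

Xc, yc = make_classification(n_samples=300, random_state=0)
clf = AdaBoostClassifier(n_estimators=50, random_state=0).fit(Xc, yc)  # weighted-vote ensemble

Xr, yr = make_regression(n_samples=300, noise=10.0, random_state=0)
reg = AdaBoostRegressor(n_estimators=50, random_state=0).fit(Xr, yr)   # boosting for numeric prediction
```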
ADABOOST (FREUND AND SCHAPIRE, 1997)

Given a set of d class-labeled tuples, (X1, y1), …, (Xd, yd)
Initially, all tuple weights are set to the same value, 1/d
Generate k classifiers in k rounds. At round i:
 • Tuples from D are sampled (with replacement) to form a training set Di of the same size
 • Each tuple's chance of being selected is based on its weight
 • A classification model Mi is derived from Di
 • Its error rate is calculated using Di as a test set
 • If a tuple is misclassified, its weight is increased; otherwise it is decreased

Error rate: err(Xj) is the misclassification error of tuple Xj (1 if Xj is misclassified, 0 otherwise). Classifier Mi's error rate is the sum of the weights of the misclassified tuples:

    error(Mi) = Σj wj · err(Xj)   (summed over the d tuples)

The weight of classifier Mi's vote is

    log((1 - error(Mi)) / error(Mi))

(A from-scratch sketch of this weight bookkeeping follows.)
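A minimal from-scratch sketch of the weight bookkeeping described above, assuming NumPy and scikit-learn; decision stumps as base learners and binary 0/1 class labels are illustrative assumptions, not requirements of the algorithm.

```python
# AdaBoost sketch: tuple weights start at 1/d, rounds sample according to the
# weights, and each classifier's vote is weighted by log((1-error)/error).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, k=10, random_state=0):
    rng = np.random.default_rng(random_state)
    d = len(X)
    w = np.full(d, 1.0 / d)                       # initially all tuple weights are 1/d
    models, alphas = [], []
    for _ in range(k):
        idx = rng.choice(d, size=d, replace=True, p=w)     # sample Di according to weights
        Mi = DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx])
        miss = Mi.predict(X) != y
        error = np.sum(w[miss])                   # sum of weights of misclassified tuples
        if error == 0 or error >= 0.5:            # skip degenerate rounds in this sketch
            continue
        alpha = np.log((1 - error) / error)       # weight of classifier Mi's vote
        w[~miss] *= error / (1 - error)           # decrease weights of correctly classified tuples
        w /= w.sum()                              # renormalize; misclassified weights rise in relative terms
        models.append(Mi)
        alphas.append(alpha)
    return models, alphas

def adaboost_predict(models, alphas, X, classes=(0, 1)):
    # each class accumulates the vote weights of the classifiers that predict it
    # (binary 0/1 labels assumed here for simplicity)
    scores = np.zeros((len(X), len(classes)))
    for Mi, alpha in zip(models, alphas):
        pred = Mi.predict(X)
        for c_idx, c in enumerate(classes):
            scores[pred == c, c_idx] += alpha
    return np.array(classes)[scores.argmax(axis=1)]
```

The update multiplies the weights of correctly classified tuples by error(Mi)/(1 - error(Mi)) and then renormalizes, which is one common way to realize "increase misclassified weights, decrease the rest."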
RANDOM FOREST (BREIMAN, 2001)
Random Forest:
 • Each classifier in the ensemble is a decision tree classifier, generated using a random selection of attributes at each node to determine the split
 • During classification, each tree votes and the most popular class is returned
Two methods to construct a Random Forest:
 • Forest-RI (random input selection): at each node, randomly select F attributes as candidates for the split at that node; the CART methodology is used to grow the trees to maximum size
 • Forest-RC (random linear combinations): creates new attributes (or features) that are linear combinations of the existing attributes, which reduces the correlation between individual classifiers
Comparable in accuracy to AdaBoost, but more robust to errors and outliers
Insensitive to the number of attributes selected for consideration at each split, and faster than bagging or boosting (see the usage sketch below)
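A brief usage sketch in the Forest-RI spirit, assuming scikit-learn; here max_features plays the role of F, the number of attributes considered as split candidates at each node, and the dataset and parameters are illustrative.

```python
# Random forest sketch: many randomized trees vote, the most popular class wins.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
forest = RandomForestClassifier(
    n_estimators=100,     # number of trees in the ensemble
    max_features="sqrt",  # F: random subset of attributes tried at each split
    random_state=0,
).fit(X, y)
print(forest.predict(X[:5]))  # each tree votes; the most popular class is returned
```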
CLASSIFICATION OF CLASS-IMBALANCED DATA SETS
Class-imbalance problem: rare positive examples but numerous negative ones, e.g., medical diagnosis, fraud detection, oil spills, equipment faults
Traditional methods assume a balanced class distribution and equal error costs: not suitable for class-imbalanced data
Typical methods for imbalanced data in two-class classification:
 • Oversampling: re-sample data from the positive class
 • Under-sampling: randomly eliminate tuples from the negative class
 • Threshold-moving: move the decision threshold, t, so that the rare-class tuples are easier to classify, leaving less chance of costly false-negative errors
 • Ensemble techniques: combine multiple classifiers, as introduced above (a sketch of oversampling and threshold-moving follows)
The class-imbalance problem remains difficult on multiclass tasks
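A minimal sketch of oversampling plus threshold-moving, assuming NumPy and scikit-learn; the 95%/5% synthetic data, the logistic-regression model, and the threshold t = 0.2 are all illustrative choices rather than recommendations.

```python
# Class-imbalance sketch: randomly oversample the rare positive class, then
# move the decision threshold below 0.5 to reduce false negatives.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Imbalanced toy data: roughly 5% positive (rare) class.
X, y = make_classification(n_samples=1000, weights=[0.95], random_state=0)

def random_oversample(X, y, positive=1, random_state=0):
    rng = np.random.default_rng(random_state)
    pos = np.where(y == positive)[0]
    neg = np.where(y != positive)[0]
    # duplicate positive tuples until both classes have equal size
    # (assumes the positive class is the rare one)
    extra = rng.choice(pos, size=len(neg) - len(pos), replace=True)
    idx = rng.permutation(np.concatenate([neg, pos, extra]))
    return X[idx], y[idx]

Xb, yb = random_oversample(X, y)
model = LogisticRegression(max_iter=1000).fit(Xb, yb)

# Threshold-moving: lower the decision threshold t (default 0.5) so rare-class
# tuples are easier to classify as positive; t = 0.2 is illustrative.
y_hat = (model.predict_proba(X)[:, 1] >= 0.2).astype(int)
print(y_hat.sum(), "tuples predicted positive")
```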
