
Combining Classifiers

Outline
- Types of Classifier Outputs
- Fusion of Label Outputs
- Fusion of Continuous-Valued Outputs
- Classifier Selection
- Bagging
- Boosting

Types of Classifier Outputs


- Type 0: The Oracle level (only whether the classification is correct or not)
- Type 1: The Abstract level (a single class label)
- Type 2: The Rank level (a ranking of the class labels)
- Type 3: The Measurement level (a degree of support for each class)

Fusion of Label Outputs


- Majority voting
- Weighted voting
- Naïve Bayes combination
- Behavior Knowledge Space (BKS) and Wernecke's method
- Dempster-Shafer theory, probabilistic approximation, classifier combination using singular value decomposition, etc.

Majority Voting
Unanimity, simple majority, and plurality. Assume that the label outputs of the classifiers are given as c-dimensional binary vectors $[d_{i,1}, \ldots, d_{i,c}]^T \in \{0, 1\}^c$, $i = 1, \ldots, L$, where $d_{i,j} = 1$ if $D_i$ labels $x$ in class $\omega_j$, and 0 otherwise.

The plurality vote (also known as majority vote) will result in an ensemble decision for class $\omega_k$ if

$$\sum_{i=1}^{L} d_{i,k} = \max_{j=1}^{c} \sum_{i=1}^{L} d_{i,j}.$$
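As a minimal illustration (not from the slides), the plurality rule can be written directly from the binary vote matrix; the array values below are hypothetical:

```python
# Plurality vote over L one-hot label outputs: votes[i, j] = d_{i,j}.
import numpy as np

def plurality_vote(votes: np.ndarray) -> int:
    """Return the class index k that collects the most votes."""
    per_class = votes.sum(axis=0)      # sum_i d_{i,j} for each class j
    return int(per_class.argmax())     # ties resolved by the lowest index

# Three classifiers, three classes: two vote for class 1, one for class 2.
votes = np.array([[0, 1, 0],
                  [0, 1, 0],
                  [0, 0, 1]])
print(plurality_vote(votes))           # -> 1
```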

Results about Majority Voting


Assuming that:
- the number of classifiers, L, is odd;
- the probability of each classifier giving the correct class label is p for any $x \in \mathbb{R}^n$;
- the classifier outputs are independent;

the accuracy of the ensemble is

$$P_{maj} = \sum_{m=\lfloor L/2 \rfloor + 1}^{L} \binom{L}{m} p^m (1-p)^{L-m}.$$
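The formula is easy to evaluate numerically; a sketch using only the standard library, with illustrative values of L and p:

```python
# P_maj for L independent classifiers of equal accuracy p (L odd).
from math import comb

def p_majority(L: int, p: float) -> float:
    return sum(comb(L, m) * p**m * (1 - p)**(L - m)
               for m in range(L // 2 + 1, L + 1))

print(p_majority(3, 0.7))    # 0.784: three voters already beat a single 0.7
print(p_majority(21, 0.7))   # ~0.97: P_maj keeps growing with L when p > 0.5
```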

Results about Majority Voting


- If p > 0.5, then $P_{maj}$ is monotonically increasing in L and $P_{maj} \to 1$ as $L \to \infty$.
- If p < 0.5, then $P_{maj}$ is monotonically decreasing in L and $P_{maj} \to 0$ as $L \to \infty$.
- If p = 0.5, then $P_{maj} = 0.5$ for any L.

Warning: Patterns of Success and Failure

Weighted Voting
If the classifiers in the ensemble are not of identical accuracy, it is reasonable to give the more competent classifiers more power in making the final decision. The label outputs can be represented as degrees of support for the classes: $d_{i,j} = 1$ if $D_i$ labels $x$ in class $\omega_j$, and 0 otherwise.

The discriminant function for class $\omega_j$ obtained through weighted voting is

$$g_j(x) = \sum_{i=1}^{L} b_i \, d_{i,j},$$

where $b_i$ is a coefficient (weight) for classifier $D_i$.

Selecting weights
One way to select the weights for the classifiers:

Consider an ensemble of L independent classifiers $D_1, \ldots, D_L$ with individual accuracies $p_1, \ldots, p_L$, whose outputs are combined by the weighted majority vote. The accuracy of the ensemble is maximized by assigning weights

$$b_i \propto \log \frac{p_i}{1 - p_i}.$$
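A short sketch of this weighting rule with made-up accuracies; the log-odds weights let one strong classifier outvote two weaker ones:

```python
# Weighted majority vote with b_i = log(p_i / (1 - p_i)).
import numpy as np

accuracies = np.array([0.9, 0.7, 0.6])           # estimated p_i (illustrative)
weights = np.log(accuracies / (1 - accuracies))  # b_i, the log-odds

votes = np.array([[1, 0],    # the 0.9 classifier says class 0
                  [0, 1],    # the two weaker ones say class 1
                  [0, 1]])

support = weights @ votes    # g_j(x) = sum_i b_i d_{i,j}
print(support.argmax())      # -> 0: the strong classifier wins the vote
```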

Naïve Bayes Combination


It assumes that the classifiers are mutually independent given a class label (conditional independence).

So the support for class $\omega_k$ can be calculated as

$$\mu_k(x) \propto P(\omega_k) \prod_{i=1}^{L} P(s_i \mid \omega_k),$$

where $s_i$ is the label assigned to $x$ by classifier $D_i$.

Practical implementation
For each classifier $D_i$, a $c \times c$ confusion matrix $CM^i$ is calculated by applying $D_i$ to the training data set Z. The (k, s)th entry of this matrix, $cm^i_{k,s}$, is the number of elements of the data set whose true class label was $\omega_k$ and which were assigned by $D_i$ to class $\omega_s$. By $N_k$ we denote the total number of elements of Z from class $\omega_k$. Taking $cm^i_{k,s_i}/N_k$ as an estimate of the probability $P(s_i \mid \omega_k)$, and $N_k/N$ as an estimate of the prior probability for class $\omega_k$, the support for class $\omega_k$ is equivalent to

$$\mu_k(x) \propto \frac{N_k}{N} \prod_{i=1}^{L} \frac{cm^i_{k,s_i}}{N_k}.$$
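A sketch of this estimate with a hypothetical pair of confusion matrices; `nb_support` returns the normalized supports $\mu_k(x)$ for a given label vector:

```python
# Naive Bayes combination from confusion matrices CM[i] (rows: true class k,
# columns: assigned class s), for L = 2 classifiers and c = 2 classes.
import numpy as np

CM = np.array([[[40, 10],
                [ 5, 45]],
               [[35, 15],
                [10, 40]]], dtype=float)
N_k = CM[0].sum(axis=1)            # N_k: training objects per true class
N = N_k.sum()

def nb_support(s):
    """mu_k(x) proportional to (N_k / N) * prod_i cm^i[k, s_i] / N_k."""
    mu = N_k / N                   # prior estimates
    for i, s_i in enumerate(s):
        mu = mu * CM[i][:, s_i] / N_k
    return mu / mu.sum()           # normalize to sum to 1

print(nb_support([0, 1]))          # supports for classes 0 and 1
```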

BKS & Wernecke's Method


Behavior Knowledge Space (BKS) is a fancy name for the multinomial combination (no independence assumption).

The vector of labels $s = [s_1, \ldots, s_L]$ gives an index to a cell in a look-up table (the BKS table). The table is designed using a labeled data set Z: each $z_j \in Z$ is placed in the cell indexed by the s for that object, the number of elements of each class in the cell is tallied, and the most representative class label is selected for the cell. The highest score corresponds to the highest estimated posterior probability. Ties are resolved arbitrarily, and empty cells are labeled in some appropriate way, e.g., by choosing a label at random or by taking a majority vote between the elements of s. A minimal sketch of such a table is given below.
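The sketch assumes hypothetical arrays `S` (the label each of the L classifiers gives each training object) and `y` (the true labels):

```python
# BKS table: each distinct label vector s = (s_1, ..., s_L) indexes a cell;
# the cell is labeled with the most frequent true class among its objects.
from collections import Counter, defaultdict
import numpy as np

def build_bks_table(S: np.ndarray, y: np.ndarray) -> dict:
    cells = defaultdict(Counter)
    for s, true_label in zip(map(tuple, S), y):
        cells[s][true_label] += 1
    return {s: c.most_common(1)[0][0] for s, c in cells.items()}

S = np.array([[0, 0], [0, 0], [0, 1], [1, 1]])   # labels from L = 2 classifiers
y = np.array([0, 0, 1, 1])                       # true labels
table = build_bks_table(S, y)
print(table[(0, 1)])   # -> 1; unseen (empty) cells still need a fallback rule
```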

Wernecke's combination method is similar to BKS and aims at reducing overtraining.

It also uses a look-up table with labels. The difference is that, in constructing the table, Wernecke considers the 95 percent confidence intervals of the class frequencies in each cell. If the intervals overlap, the prevailing class is not considered dominant enough to label the cell; instead, the least wrong classifier among the L members of the team is identified and authorized to label the cell.

An Example
Results from a 10-fold cross-validation with the Pima Indian Diabetes database from UCI. Each individual classifier is an MLP with one hidden layer of 25 nodes. [Figure omitted: 'o' marks the training results, the other marker the testing results.]

Fusion of Continuous-Valued Outputs


- Class-Conscious Combiners
  - Nontrainable combiners
  - Trainable combiners
- Class-Indifferent Combiners
  - Decision templates
  - Dempster-Shafer combination

Decision Profile

The decision profile DP(x) is the $L \times c$ matrix whose (i, j)th entry $d_{i,j}(x)$ is the degree of support that classifier $D_i$ gives to the hypothesis that $x$ comes from class $\omega_j$.

Nontrainable (Class-Conscious) Combiners


The average and the product are the two most intensively studied combiners, yet there is no guideline as to which one is better for a specific problem.

The current understanding is that the average, in general, might be less accurate than the product for some problems, but is the more stable of the two. The sketch below illustrates the difference.
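A two-line comparison on a made-up decision profile shows the difference in temperament: one near-zero support vetoes a class under the product rule but is absorbed by the average:

```python
# Average vs. product rule on a decision profile DP(x) (L = 3, c = 2).
import numpy as np

DP = np.array([[0.9, 0.1],
               [0.9, 0.1],
               [0.0, 1.0]])       # the third classifier vetoes class 0

print(DP.mean(axis=0).argmax())   # average: [0.6, 0.4] -> class 0
print(DP.prod(axis=0).argmax())   # product: [0.0, 0.01] -> class 1
```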

Trainable Combiners
The Weighted Average
Three groups can be distinguished based on the number of weights:

- L weights: one weight per classifier. The weight for classifier $D_i$ is usually based on its estimated error rate.
- c × L weights: weights are specific for each class.
- c × c × L weights: the support for each class is obtained by a linear combination of all elements of the decision profile DP(x).

Fuzzy Integrals

The fuzzy integral combines the sorted supports for a class with respect to a fuzzy measure expressing the competence of every subset of classifiers.

Decision Templates

The decision template $DT_j$ for class $\omega_j$ is the average of the decision profiles of the training objects whose true label is $\omega_j$; a new object x is assigned to the class whose template is closest (most similar) to DP(x).

Classifier Selection

- Decision-Independent Estimates
- Decision-Dependent Estimates
- Selection or Fusion? Kuncheva's proposal

Some Selection Criteria


- Decision-Independent Estimates (e.g., the direct k-nn estimate). One way to estimate the competence is to identify the K nearest neighbors of x from either the training set or a validation set and find out how accurate the classifiers are on these K objects. K is a parameter of the algorithm which needs to be tuned prior to the operational phase.
- Decision-Dependent Estimates (e.g., the direct k-nn estimate). Let $s_i$ be the class label assigned to x by classifier $D_i$. Denote by $N_x(s_i)$ the set of K nearest neighbors of x from Z which classifier $D_i$ labeled as $s_i$. The competence of classifier $D_i$ for the given x is calculated as the proportion of elements of $N_x(s_i)$ whose true class label was $s_i$. This estimate is called the local class accuracy; see the sketch after this list.
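A sketch of the local class accuracy under the assumptions above; `X_ref`, `y_ref`, and `pred_ref` (classifier $D_i$'s labels on Z) are hypothetical names:

```python
# Decision-dependent competence: proportion of the K nearest neighbors of x
# among the objects D_i labeled s_i whose true label really was s_i.
import numpy as np

def local_class_accuracy(x, X_ref, y_ref, pred_ref, s_i, K=5):
    mask = pred_ref == s_i                     # objects D_i labeled as s_i
    X_s, y_s = X_ref[mask], y_ref[mask]
    nearest = np.argsort(np.linalg.norm(X_s - x, axis=1))[:K]
    return float(np.mean(y_s[nearest] == s_i))
```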

Kuncheva's approach

Selection is guaranteed by design to give at least the same training accuracy as the best individual classifier D*. However, the model might overtrain. To guard against overtraining we may use a statistical significance test (confidence intervals) and nominate a classifier only when it is significantly better than the others: if $D_{i(j)}$ is significantly better than the second best in the region $R_j$, then $D_{i(j)}$ can be nominated as the classifier responsible for $R_j$; otherwise, a scheme involving more than one classifier might pay off.

Bootstrap AGGregatING
Training phase
1. Initialize the parameters: $\mathcal{D} = \emptyset$, the ensemble; L, the number of classifiers to train.
2. For k = 1, . . . , L:
   - Take a bootstrap sample $S_k$ from Z.
   - Build a classifier $D_k$ using $S_k$ as the training set.
   - Add the classifier to the current ensemble, $\mathcal{D} = \mathcal{D} \cup \{D_k\}$.
3. Return $\mathcal{D}$.

Classification phase
4. Run $D_1, \ldots, D_L$ on the input x.
5. The class with the maximum number of votes is chosen as the label for x. A compact sketch follows.
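The sketch assumes scikit-learn decision trees as the base classifier (any learner would do) and integer class labels:

```python
# Bagging: L classifiers trained on bootstrap samples, combined by plurality.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, L=11, seed=0):
    rng = np.random.default_rng(seed)
    N = len(y)
    ensemble = []
    for _ in range(L):
        idx = rng.integers(0, N, size=N)   # bootstrap sample S_k (with replacement)
        ensemble.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return ensemble

def bagging_predict(ensemble, X):
    votes = np.stack([D.predict(X) for D in ensemble])          # L x n labels
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```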

Why Does Bagging Work?


If classifier outputs were independent and all classifiers had the same individual accuracy p > 0.5, then the majority vote would be guaranteed to improve on the individual performance. Bagging aims at developing independent classifiers by taking bootstrap replicates as the training sets. The samples are only pseudo-independent because they are taken from the same Z. Warning: even if they were drawn independently from the distribution of the problem, the classifiers built on these training sets might not give independent outputs.

Example: Independent Samples, Bootstrap Samples, and Bagging

[Figure] Bagging for the rotated check-board data using bootstrap samples and independent samples: (a) averaged pairwise correlation versus the ensemble size; (b) testing error rate versus the ensemble size.

Boosting (AdaBoost.M1)
Training phase (1/2)
1. Initialize the parameters: set the weights $w^1_j = 1/N$, $j = 1, \ldots, N$; initialize the ensemble $\mathcal{D} = \emptyset$; pick L, the number of classifiers to train.
2. For k = 1, . . . , L:
   - Take a sample $S_k$ from Z using distribution $w^k$.
   - Build a classifier $D_k$ using $S_k$ as the training set.
   - Calculate the weighted ensemble error at step k by
   $$\epsilon_k = \sum_{j=1}^{N} w^k_j \, l^k_j,$$
   where $l^k_j = 1$ if $D_k$ misclassifies $z_j$ and $l^k_j = 0$ otherwise.

Boosting (AdaBoost.M1)
Training phase (2/2)
   - If $\epsilon_k = 0$ or $\epsilon_k \geq 0.5$, ignore $D_k$, reinitialize the weights $w^k_j$ to 1/N, and continue.
   - Else, calculate
   $$\beta_k = \frac{\epsilon_k}{1 - \epsilon_k},$$
   and update the individual weights
   $$w^{k+1}_j = \frac{w^k_j \, \beta_k^{1 - l^k_j}}{\sum_{i=1}^{N} w^k_i \, \beta_k^{1 - l^k_i}}, \quad j = 1, \ldots, N.$$
3. Return $\mathcal{D}$ and $\beta_1, \ldots, \beta_L$.

Boosting (AdaBoost.M1)
Classification phase
4. Calculate the support for class $\omega_t$ by
$$\mu_t(x) = \sum_{k:\, D_k(x) = \omega_t} \ln\!\left(\frac{1}{\beta_k}\right).$$
5. The class with the maximum support is chosen as the label for x. A sketch of the full procedure follows.
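The sketch follows the steps above, assuming scikit-learn decision stumps as the base classifier and integer labels 0..c-1:

```python
# AdaBoost.M1 as outlined above: resample by w, check eps_k, update by beta_k.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_m1_fit(X, y, L=20, seed=0):
    rng = np.random.default_rng(seed)
    N = len(y)
    w = np.full(N, 1.0 / N)                    # w_j^1 = 1/N
    ensemble, betas = [], []
    for _ in range(L):
        idx = rng.choice(N, size=N, p=w)       # sample S_k using distribution w^k
        D_k = DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx])
        miss = D_k.predict(X) != y             # l_j^k
        eps = float(np.sum(w * miss))          # weighted ensemble error
        if eps == 0 or eps >= 0.5:             # ignore D_k and restart the weights
            w = np.full(N, 1.0 / N)
            continue
        beta = eps / (1 - eps)
        w = w * beta ** (1 - miss)             # shrink weights of correct objects
        w /= w.sum()                           # renormalize
        ensemble.append(D_k)
        betas.append(beta)
    return ensemble, betas

def adaboost_m1_predict(ensemble, betas, X, n_classes):
    support = np.zeros((len(X), n_classes))
    for D_k, beta in zip(ensemble, betas):
        support[np.arange(len(X)), D_k.predict(X)] += np.log(1 / beta)
    return support.argmax(axis=1)              # class with maximum support
```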

Why Does AdaBoost Work?


The Margin Theory (remember SVMs)

The margin of an object can be taken as the (normalized) support for its true class minus the largest support for any other class, so all objects that are misclassified have negative margins, and those correctly classified have positive margins. Boosting is claimed to push the margin distribution to the right.

[Figure omitted: margin distribution graphs for bagging and AdaBoost for the rotated check-board data.]
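As a small illustration of this definition (made-up supports):

```python
# Margin of one object: normalized support for the true class minus the
# largest support for any other class; negative iff the object is misclassified.
import numpy as np

def margin(support: np.ndarray, true_class: int) -> float:
    s = support / support.sum()
    rival = np.delete(s, true_class).max()
    return float(s[true_class] - rival)

print(margin(np.array([3.0, 1.0, 1.0]), 0))   # -> 0.4 (correctly classified)
print(margin(np.array([1.0, 3.0, 1.0]), 0))   # -> -0.4 (misclassified)
```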

Conclusions
In his invited lecture at the 3rd International Workshop on Multiple Classifier Systems, 2002, Ghosh proposes that "... our current understanding of ensemble-type multiclassifier systems is now quite mature ...". And yet, in an invited book chapter the same year, Ho states that "... many of the above questions are there because we do not yet have a scientific understanding of the classifier combination mechanisms." The area of combining classifiers is very dynamic and active at present, and is likely to grow and expand in the near future.
