
Improving Classification with the AdaBoost meta-algorithm

Hawking Bear
February 5, 2022
National Institute of Science Education and Research

Problem Statement

• For a classification problem (assume binary), we are given a "weak classifier".
• Weak classifier: a classifier that performs just slightly better than random guessing (> 50% accuracy).
• Can we combine multiple instances of the weak classifier to obtain a strong classifier?

Meta-algorithms

• Methods that combine multiple classifiers are called ensemble methods or meta-algorithms.
• Bagging and boosting are two common types.

Bagging and Boosting
Bagging

• Given a dataset X, we randomly sample X (with replacement) S times to make S new datasets of the same size as X.
• The weak classifier is applied to each dataset individually.
• To classify a new data point, we apply our S classifiers to it and take a majority vote (see the sketch below).
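A minimal sketch of this procedure in Python, assuming ±1 labels, scikit-learn decision stumps as the weak classifier, and an illustrative default of S = 10; none of these choices are prescribed by the slides.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, S=10, seed=0):
    """Train S weak classifiers on S bootstrap samples of (X, y)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    classifiers = []
    for _ in range(S):
        idx = rng.integers(0, n, size=n)              # sample with replacement, same size as X
        stump = DecisionTreeClassifier(max_depth=1)   # decision stump = depth-one tree
        stump.fit(X[idx], y[idx])
        classifiers.append(stump)
    return classifiers

def bagging_predict(classifiers, X):
    """Majority vote over the S classifiers (labels assumed to be -1/+1, S odd)."""
    votes = np.sum([clf.predict(X) for clf in classifiers], axis=0)
    return np.sign(votes)
```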

Boosting

• Sequential use of classifiers over T rounds.
• In each subsequent round, the data points that were misclassified in the previous round are given higher priority.
• AdaBoost is the most popular boosting algorithm.

AdaBoost

• To demonstrate the algorithms, we'll use decision stumps as the weak classifier.
• Decision stumps are decision trees of depth one, which classify data points based on just one feature and one threshold (see the sketch below).
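As a concrete illustration, a decision stump can be written directly in NumPy. The brute-force search below is only a sketch, assuming labels in {-1, +1} and per-example weights w (weights are included so the same stump could later be reused inside boosting); it is not the exact stump behind the slides' figures.

```python
import numpy as np

def fit_stump(X, y, w):
    """Exhaustively pick the (feature, threshold, sign) stump with the lowest weighted error.
    X: (n, d) array, y: labels in {-1, +1}, w: non-negative example weights."""
    best, best_err = None, np.inf
    for j in range(X.shape[1]):                        # one feature ...
        for thresh in np.unique(X[:, j]):              # ... and one threshold
            for sign in (+1, -1):
                pred = sign * np.where(X[:, j] <= thresh, 1, -1)
                err = np.sum(w[pred != y])             # weighted misclassification error
                if err < best_err:
                    best_err, best = err, (j, thresh, sign)
    return best, best_err

def stump_predict(stump, X):
    j, thresh, sign = stump
    return sign * np.where(X[:, j] <= thresh, 1, -1)
```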

AdaBoost

Figure 1: Sample data for decision stumps.

AdaBoost Pseudocode

Figure 2: AdaBoost pseudocode [1].
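Since the pseudocode figure itself is not reproduced in this text, here is a hedged Python sketch of the standard AdaBoost loop described in [1]; the choice of T = 20, the -1/+1 label convention, and the use of scikit-learn stumps as the weak learner are assumptions for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, T=20):
    """AdaBoost with decision stumps; y must contain labels in {-1, +1}."""
    n = len(X)
    w = np.full(n, 1.0 / n)                  # D_1: start with uniform weights
    stumps, alphas = [], []
    for _ in range(T):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)     # weak learner trained on the weighted data
        pred = stump.predict(X)
        eps = np.sum(w[pred != y])           # weighted error epsilon_t
        if eps >= 0.5:                       # weak learner no better than random: stop
            break
        alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-12))
        w *= np.exp(-alpha * y * pred)       # up-weight mistakes, down-weight correct points
        w /= w.sum()                         # renormalise to a distribution
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    """Sign of the alpha-weighted vote of the weak classifiers."""
    agg = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
    return np.sign(agg)
```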


AdaBoost Schematic

Figure 3: Schematic representation of AdaBoost.


Why this formula for α?

• If α takes the given form and α_t > 0, it can be shown that the classification error decreases exponentially over multiple rounds [2].
• α_t ≥ 0 if ε_t ≤ 1/2, which is why we require the weak classifier to have greater than 50% classification accuracy.
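For reference, a compact restatement of the standard AdaBoost formulas from [1, 2] that the slide refers to, with γ_t the weak learner's edge over random guessing (ε_t = 1/2 − γ_t, anticipating the last slide):

```latex
% Weight of weak hypothesis h_t with weighted error epsilon_t, and the distribution update
\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t},
\qquad
D_{t+1}(i) = \frac{D_t(i)\,\exp\bigl(-\alpha_t\, y_i\, h_t(x_i)\bigr)}{Z_t}

% Writing epsilon_t = 1/2 - gamma_t, the training error of the final classifier H satisfies
\frac{1}{n}\bigl|\{\, i : H(x_i) \neq y_i \,\}\bigr|
\;\le\; \prod_{t=1}^{T} 2\sqrt{\epsilon_t(1-\epsilon_t)}
\;=\; \prod_{t=1}^{T} \sqrt{1-4\gamma_t^{2}}
\;\le\; \exp\Bigl(-2\sum_{t=1}^{T}\gamma_t^{2}\Bigr)
```

So if every weak classifier has edge γ_t ≥ γ > 0, the training error falls exponentially in the number of rounds T.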

Class Imbalance
What is it?

• Let's say we're building a classifier to detect a rare brain tumor from MRI scans.
• In the dataset, for every positive sample there are 100,000 negative samples.
• A model that seeks to minimize classification error will perform poorly at detecting cancer patients.
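To make the imbalance concrete: a degenerate model that always predicts "negative" already achieves near-perfect accuracy on such data while detecting nothing. A toy calculation (only the 100,000:1 ratio comes from the slide):

```python
positives, negatives = 1, 100_000

# An "always predict negative" classifier gets every negative right
# and every positive wrong.
accuracy = negatives / (positives + negatives)
print(f"accuracy = {accuracy:.5%}")   # ~99.999%, yet recall on the positive class is 0
```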

How do we detect it?

• Classification error doesn't cut it; we need alternative performance metrics.
• The confusion matrix is useful here.

Figure 4: Confusion matrix for a binary classification problem.

How do we detect it?

• Precision = TP / (TP + FP): the fraction of records that are actually positive among those the classifier predicted to be positive.
• Recall = TP / (TP + FN): the fraction of positive examples the classifier got right.
• Very useful when used together (see the sketch below).
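A small sketch of computing both metrics from raw predictions, with scikit-learn used only as a cross-check; the toy arrays and the 0/1 label convention are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

y_true = np.array([1, 0, 0, 1, 0, 1, 0, 0])   # toy labels (1 = positive)
y_pred = np.array([1, 0, 1, 0, 0, 1, 0, 0])   # toy predictions

tp = np.sum((y_pred == 1) & (y_true == 1))    # true positives
fp = np.sum((y_pred == 1) & (y_true == 0))    # false positives
fn = np.sum((y_pred == 0) & (y_true == 1))    # false negatives

precision = tp / (tp + fp)                    # TP / (TP + FP)
recall = tp / (tp + fn)                       # TP / (TP + FN)

assert np.isclose(precision, precision_score(y_true, y_pred))
assert np.isclose(recall, recall_score(y_true, y_pred))
```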

How do we address it?

1. Manipulate the cost matrix.
2. Resample during training (a sketch of option 2 follows below).

Figure 5: Typical (top) and modified (bottom) cost matrices.
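A minimal sketch of option 2, oversampling the minority class until the classes are balanced; the 0/1 label convention and the use of sklearn.utils.resample are assumptions, and option 1 could instead be approximated with per-class weights (e.g. a class_weight argument) in the learner.

```python
import numpy as np
from sklearn.utils import resample

def oversample_minority(X, y, seed=0):
    """Duplicate minority-class rows (with replacement) until the classes are balanced.
    Assumes y holds integer labels 0/1."""
    minority = np.bincount(y).argmin()
    X_min, y_min = X[y == minority], y[y == minority]
    X_maj, y_maj = X[y != minority], y[y != minority]
    X_up, y_up = resample(X_min, y_min, replace=True,
                          n_samples=len(y_maj), random_state=seed)
    return np.vstack([X_maj, X_up]), np.concatenate([y_maj, y_up])
```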

References

1. Freund, Y., Schapire, R. & Abe, N. A short introduction to boosting. Journal-Japanese Society For Artificial Intelligence 14, 1612 (1999).
2. Freund, Y. & Schapire, R. E. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. Journal of Computer and System Sciences 55, 119–139 (1997). https://www.sciencedirect.com/science/article/pii/S002200009791504X

Why the name?

• Let the training error ε_t of h_t be given by ε_t = 1/2 − γ_t.
• Previous learning algorithms required that γ_t be known a priori, before boosting begins.
• AdaBoost adapts to the error rates of the individual weak hypotheses, thus the name 'adaptive'.
