Artificial Intelligence Fundamentals: Learning: Boosting

This document discusses boosting algorithms for machine learning. It introduces the concept of combining multiple weak classifiers to create a strong classifier by having them vote. The key ideas are: (1) training classifiers on exaggerated versions of previous errors to reduce overlap, (2) weighting votes of classifiers, (3) minimizing error at each step to determine weight updates, and (4) stopping when all samples are correctly classified or no weak classifier remains. AdaBoost is presented as optimizing this process to exponentially decrease error over time. Face detection using Haar-like features and integral images is also briefly covered.


Artificial Intelligence Fundamentals

Learning: Boosting
• Binary classification – classify the elements of a given set into two
  groups according to a given rule

• Finding the classification rule can be a difficult task

• Can a crowd be smarter than any individual participant in the crowd?
Classifiers strong/weak
• Suppose we have a set of classifiers h which give {-1, +1} as their output

• Error rate: on a scale from 0 to 1, a strong classifier has an error rate
  close to 0, while a weak classifier has an error rate only slightly below 0.5

• Can we make a strong classifier by combining several of these weak
  classifiers and letting them vote?

  H(x) = \mathrm{sign}\big( h_1(x) + h_2(x) + \dots + h_n(x) \big), where x is a sample
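A minimal sketch of such an unweighted vote in Python (my own illustration, not from the slides), assuming each weak classifier is a function that returns -1 or +1 for a sample:

```python
def vote(classifiers, x):
    """Combine weak classifiers by an unweighted majority vote:
    the output is the sign of the sum of the individual votes."""
    total = sum(h(x) for h in classifiers)
    return 1 if total >= 0 else -1
```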
The perfect classifiers
H(x) = \mathrm{sign}\big( h_1(x) + h_2(x) + h_3(x) \big)

[Diagram: three non-overlapping regions, one where h_1 is wrong, one where
 h_2 is wrong, and one where h_3 is wrong.]

• If the error regions look like this – no sample is misclassified by more
  than one of the three classifiers – the majority vote will always have 0 error
A real situation

[Diagram: the same three error regions, but now partially overlapping.]

• Is the area where at least 2 classifiers are wrong at the same time
  sufficiently smaller than the area covered by the errors of each individual
  classifier?
Idea #1
• We use the undisturbed DATA to produce h_1
• We use DATA with an exaggeration of the h_1 errors
  (a disturbed set of data) to produce h_2
• We use DATA with an exaggeration of the samples on which h_1 gives
  a different answer than h_2 to produce h_3
Idea #2
                     H(x)
      h_1             h_2             h_3
h_11 h_12 h_13   h_21 h_22 h_23   h_31 h_32 h_33

• Get out the vote – the scheme can be applied recursively: each voter may
  itself be the vote of three other classifiers
Idea #3 – Example of classifiers

• Decision tree stumps – a single test (a threshold on a single dimension)

• For each horizontal stump we could have 2 cases:
  Up +, Down –   or   Up –, Down +

• Similarly for vertical stumps, with Left/Right in place of Up/Down

• An extra test: classify everything as + or everything as –

• There could be 12 decision tree stumps: for each dimension we have
  (# of lines) * 2; with 2 dimensions and 3 lines each, 2 * 3 * 2 = 12
  (see the sketch below)
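A minimal decision-stump sketch in Python (my own illustration, not from the slides), assuming 2-dimensional samples; each stump is a single threshold test on one dimension, with the two mirrored labellings from above:

```python
def make_stump(dim, threshold, sign):
    """Decision tree stump: one threshold test on one dimension.
    sign = +1 means 'above the threshold is +1, below is -1';
    sign = -1 is the mirrored case (Up -, Down +)."""
    def stump(x):
        return sign if x[dim] > threshold else -sign
    return stump

# Example: a 'horizontal' stump at height 2.0 (Up +, Down -)
h = make_stump(dim=1, threshold=2.0, sign=+1)
print(h((0.5, 3.0)))   # +1: the sample lies above the line
print(h((0.5, 1.0)))   # -1: the sample lies below the line
```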
• Error rate (unweighted) – every one of the N samples counts equally:

  \text{Error} = \sum_{\text{WRONG CASES}} \frac{1}{N}, \quad N - \text{\# of cases}

[Diagram: samples carrying individual weights \omega_1, \omega_2, \omega_3, ...]

• Weight the samples; in the beginning every sample has the same weight:

  \omega_i^1 = \frac{1}{N}

• The weighted error at step t is the sum of the weights of the misclassified samples:

  \text{Error}^t = \sum_{i \in \text{WRONG CASES}} \omega_i^t

• Enforce a distribution:

  \sum_{\text{ALL CASES}} \omega_i^t = 1
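A one-function sketch of the weighted error in Python (mine, not from the slides), assuming the weights form a distribution that sums to 1:

```python
def weighted_error(classifier, samples, labels, weights):
    """Weighted error: the sum of the weights of the misclassified samples."""
    return sum(w for x, y, w in zip(samples, labels, weights)
               if classifier(x) != y)
```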
Idea #4
H(x) = \mathrm{sign}\big( \alpha_1 h_1(x) + \alpha_2 h_2(x) + \alpha_3 h_3(x) + \dots \big)

• Build the classifier in multiple steps

• We no longer treat each member of the crowd equally
  -> the wisdom of a weighted crowd of experts
Idea #5

LET \omega_i^1 = \frac{1}{N}, where N - # of samples

Then repeat:
  • Pick the h^t that minimizes \text{Error}^t
  • Pick \alpha^t
  • Calculate \omega^{t+1}
Idea #6
• Suppose that:

  \omega_i^{t+1} = \frac{\omega_i^t}{Z}\, e^{-\alpha^t h^t(x)\, y(x)}

  h(x) = \begin{cases} +1 & \text{for samples the classifier thinks belong to the class} \\ -1 & \text{for samples the classifier thinks do not belong to the class} \end{cases}

  y(x) \in \{+1, -1\} - the desired output

  Z - the normalizer, so that the weights remain a distribution
Minimize the error
• The error BOUND for the combined classifier of Idea #4 is minimized if:

  \alpha^t = \frac{1}{2} \ln \frac{1 - E^t}{E^t}, \quad \text{where } E^t \text{ is the ERROR at step } t

• The error will be bounded by an exponential decay function
• It is guaranteed to converge to 0

[Plot: the error over time stays below an exponentially decaying boundary.]
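A worked example with illustrative numbers of my own (not from the slides), showing how the vote weight depends on the weighted error:

```latex
% \alpha^t = \tfrac{1}{2} \ln \tfrac{1 - E^t}{E^t}
E^t = 0.30:\quad \alpha^t = \tfrac{1}{2}\ln\tfrac{0.70}{0.30} \approx 0.424
E^t = 0.49:\quad \alpha^t = \tfrac{1}{2}\ln\tfrac{0.51}{0.49} \approx 0.020
E^t = 0.50:\quad \alpha^t = 0 \quad \text{(a coin-flip classifier gets no vote)}
```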
AdaBoost
• You use uniform weights to start.
• For each step, you find the classifier that yields the lowest error rate
  for the current weights, \omega_i^t.
• You use that best classifier, h^t(x_i), to compute the error rate
  associated with the step, E^t.
• You determine the alpha for the step, \alpha^t, from the error for the
  step, E^t.
• With the alpha in hand, you compute the weights for the next step,
  \omega_i^{t+1}, from the weights for the current step, \omega_i^t, taking
  care to include a normalizing factor, Z^t, so that the new weights add up to 1.
• You stop successfully when H(x_i) correctly classifies all the samples x_i;
  you stop unsuccessfully if you reach a point where there is no weak
  classifier left, i.e. none with an error rate < 1/2.
  (A compact sketch of this loop follows.)
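A compact Python sketch of the loop above (my own illustration; the function and variable names are not from the slides, and the stumps are the single-test classifiers introduced earlier):

```python
import math

def adaboost(samples, labels, stumps, max_steps=50):
    """AdaBoost as described above: pick the best stump for the current
    weights, give it a vote weight alpha, reweight the samples, repeat."""
    N = len(samples)
    weights = [1.0 / N] * N             # uniform weights to start
    ensemble = []                       # list of (alpha, stump) pairs

    def H(x):
        s = sum(a * h(x) for a, h in ensemble)
        return 1 if s >= 0 else -1

    def weighted_error(h):
        return sum(w for x, y, w in zip(samples, labels, weights) if h(x) != y)

    for t in range(max_steps):
        best = min(stumps, key=weighted_error)
        E = max(weighted_error(best), 1e-12)   # guard against log(1/0)

        if E >= 0.5:                    # no weak classifier left: stop unsuccessfully
            break

        alpha = 0.5 * math.log((1 - E) / E)
        ensemble.append((alpha, best))

        # omega_i^{t+1} = omega_i^t * exp(-alpha * h(x_i) * y_i) / Z
        weights = [w * math.exp(-alpha * best(x) * y)
                   for x, y, w in zip(samples, labels, weights)]
        Z = sum(weights)
        weights = [w / Z for w in weights]

        if all(H(x) == y for x, y in zip(samples, labels)):
            break                       # every sample classified correctly: stop successfully

    return H
```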
Change the weights
\omega_i^{t+1} = \frac{\omega_i^t}{Z} \cdot
  \begin{cases}
    \sqrt{\dfrac{E^t}{1 - E^t}} & \text{if it's correct} \\[2ex]
    \sqrt{\dfrac{1 - E^t}{E^t}} & \text{if it's wrong}
  \end{cases}

Z = \sqrt{\frac{E^t}{1 - E^t}} \sum_{\text{CORRECT}} \omega_i^t
  + \sqrt{\frac{1 - E^t}{E^t}} \sum_{\text{WRONG}} \omega_i^t
  = \sqrt{\frac{E^t}{1 - E^t}}\,(1 - E^t) + \sqrt{\frac{1 - E^t}{E^t}}\,E^t
  = 2\sqrt{E^t (1 - E^t)}

Substituting Z back:

\omega_i^{t+1} = \frac{\omega_i^t}{2} \cdot
  \begin{cases}
    \dfrac{1}{1 - E^t} & \text{if it's correct} \\[2ex]
    \dfrac{1}{E^t} & \text{if it's wrong}
  \end{cases}

\sum_{\text{CORRECT}} \omega_i^{t+1} = \frac{1}{2}\,\frac{1}{1 - E^t} \sum_{\text{CORRECT}} \omega_i^t = \frac{1}{2}
\qquad
\sum_{\text{WRONG}} \omega_i^{t+1} = \frac{1}{2}
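A quick numerical check (illustrative numbers of my own, not from the slides) that the update rebalances the weight mass so that correct and wrong samples each carry half of it, which forces the next classifier to focus on the previous mistakes:

```python
E = 0.2                                      # weighted error of the chosen classifier
correct_mass, wrong_mass = 1 - E, E          # total weight before the update
new_correct = correct_mass / (2 * (1 - E))   # 'correct' update factor applied
new_wrong = wrong_mass / (2 * E)             # 'wrong' update factor applied
print(new_correct, new_wrong)                # 0.5 0.5
```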
Improvements

• Tests that really matter


• Immune to overfitting
Face detection
• Haar-like features
– Edge features

– Line features

– Other features
Face detection
• Integral image – each pixel (x, y) of the integral image holds the sum of
  all pixels above and to the left of (x, y) in the original image:

  ii(x, y) = \sum_{x' \le x,\; y' \le y} i(x', y')

  where ii(x, y) is the integral image and i(x, y) is the original image.

[Diagram: four adjacent rectangles A, B, C, D, with corner points 1, 2, 3, 4;
 point 4 is the bottom-right corner of D.]

  v_1 = ii(\text{location 1}) = \sum_A i
  v_2 = ii(\text{location 2}) = \sum_A i + \sum_B i
  v_3 = ii(\text{location 3}) = \sum_A i + \sum_C i
  v_4 = ii(\text{location 4}) = \sum_A i + \sum_B i + \sum_C i + \sum_D i

  rect(D) = v_1 + v_4 - v_2 - v_3
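A small NumPy sketch (mine, not from the slides) of the integral image and the four-corner rectangle sum:

```python
import numpy as np

def integral_image(img):
    """ii(x, y) = sum of all original pixels above and to the left of (x, y),
    inclusive; computed with cumulative sums along both axes."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, bottom, right):
    """Sum of the original pixels in [top..bottom] x [left..right],
    using rect(D) = v1 + v4 - v2 - v3."""
    v4 = ii[bottom, right]
    v2 = ii[top - 1, right] if top > 0 else 0
    v3 = ii[bottom, left - 1] if left > 0 else 0
    v1 = ii[top - 1, left - 1] if top > 0 and left > 0 else 0
    return v4 + v1 - v2 - v3

img = np.arange(16, dtype=float).reshape(4, 4)
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 2, 2))   # central 2x2 block: 5 + 6 + 9 + 10 = 30
```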
Face detection
• For a 24x24-pixel image – 162,336 features
Face detection
• Choosing the threshold for each classifier
Face detection
• The first and second features selected by
AdaBoost
Face detection
• Sub-window – 24x24 pixels (width x height)
• The sub-window starts from the top-left corner of the image and slides
  1 pixel at a time along the row; when it reaches the end of the row, it
  moves down 1 pixel and starts again from the left (see the sketch below).
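A minimal sketch of that scan in Python (my own; classify_window stands in for the boosted face/non-face classifier and is not a name from the slides):

```python
def scan(image_height, image_width, classify_window, size=24):
    """Slide a size x size sub-window over the image one pixel at a time,
    row by row, and collect the positions classified as faces."""
    detections = []
    for top in range(image_height - size + 1):
        for left in range(image_width - size + 1):
            if classify_window(top, left, size):
                detections.append((top, left))
    return detections
```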
Face detection
• Cascade of classifiers
  – # of features in the first 5 layers: 1, 10, 25, 25 and 50
  – total # of features in all layers: 6061

[Diagram: all sub-windows enter layer 1; each of the 38 layers either passes
 a sub-window on (T) to the next layer or rejects it (F); only sub-windows
 that pass every layer are reported as faces (see the sketch below).]
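A rough sketch of the cascade logic in Python (mine, not from the slides); layers is assumed to be a list of boosted classifiers, each returning True (pass) or False (reject) for a sub-window:

```python
def cascade(layers, window):
    """Run a sub-window through the cascade: any layer may reject it early,
    so most non-face windows are discarded by the cheap early layers."""
    for layer in layers:
        if not layer(window):
            return False          # F: reject the sub-window immediately
    return True                   # passed every layer: report a face
```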
Related resources
• P. Viola, M. Jones, "Robust Real-Time Face Detection",
  http://www.vision.caltech.edu/html-files/EE148-2005-Spring/pprs/viola04ijcv.pdf
