Pattern Recognition Lecture Bayes Decision Theory: Prof. Dr. Marcin Grzegorzek
[Figure: the pattern recognition pipeline: patterns → sensor → feature generation → feature selection → classifier design → system evaluation]
Overview

1 Introduction
2 Bayes Decision Theory
3 Discriminant Functions and Decision Surfaces
4 Bayesian Classification for Normal Distributions
5 Estimation of Unknown Probability Density Functions
Statistical Classification - Problem Statement

Probability P
is a real number in the range [0, 1] describing the probability of an event.

Density p
is a value of a function¹ p(x) describing the distribution of the random variable x.

If the random variable takes only discrete values, the densities become probabilities!

¹ This function is often referred to as pdf - probability density function.
A Priori Probability vs. A Posteriori Probability

A priori probability - probability before classification
• How probable is a particular class ωi before the pattern x has been observed?
• Answer: P(ωi)

A posteriori probability - probability after classification
• How probable is a particular class ωi for a pattern x after applying a statistical classification algorithm?
• Answer: P(ωi|x)
Likelihood Density Function

• How are feature vectors x distributed in a class ωi?
• Answer: p(x|ωi)
• p(x|ωi) is the likelihood function of ωi with respect to x
• p(x|ωi) can be trained from examples
Bayes Decision Theory for a Two-Class Problem

Known
Classes: {ω1, ω2}
A priori probabilities: P(ω1) and P(ω2)
Likelihood density functions: p(x|ω1) and p(x|ω2)
Pattern to be classified: x = [x1, x2, ..., xl]^T

Assumption
The feature vectors can take any value in the l-dimensional feature space: x = [x1, x2, ..., xl]^T ∈ ℝ^l

Unknown
A posteriori probabilities: P(ω1|x) and P(ω2|x)
Computation of the A Posteriori Probability

Using the Bayes Rule

P(ωi|x) = p(x|ωi) P(ωi) / p(x),   i = 1, 2   (1)
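A minimal numeric sketch of Eq. (1), assuming two classes with 1-D Gaussian likelihoods; all parameter values below are illustrative assumptions, not from the lecture:

```python
import numpy as np

priors = np.array([0.5, 0.5])   # P(w1), P(w2)
means = np.array([0.0, 2.0])    # illustrative likelihood parameters
sigmas = np.array([1.0, 1.0])

def gauss(x, mu, sigma):
    """1-D Gaussian density p(x|wi)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def posteriors(x):
    """Eq. (1): P(wi|x) = p(x|wi) P(wi) / p(x), where p(x) = sum_i p(x|wi) P(wi)."""
    joint = gauss(x, means, sigmas) * priors
    return joint / joint.sum()   # normalising by the evidence p(x)

print(posteriors(0.5))           # the two posteriors sum to 1
```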
Considering the Bayes Rule (Eq. 1):

If p(x|ω1)P(ω1)/p(x) > p(x|ω2)P(ω2)/p(x), x is classified to ω1

If p(x|ω1)P(ω1)/p(x) < p(x|ω2)P(ω2)/p(x), x is classified to ω2
Bayes Classification Rule (3)

• p(x) can be disregarded, because it is the same for all classes
• We are done, since the likelihood density functions p(x|ω1) and p(x|ω2) are assumed to have been trained from examples!
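The resulting two-class rule can be sketched as follows; the Gaussian likelihoods and all parameter values are illustrative assumptions, not from the lecture:

```python
import numpy as np

P = [0.5, 0.5]                       # priors P(w1), P(w2)
mu, sigma = [0.0, 2.0], [1.0, 1.0]   # per-class likelihood parameters

def lik(x, i):
    """Likelihood p(x|wi) for a 1-D Gaussian class model."""
    return np.exp(-0.5 * ((x - mu[i]) / sigma[i]) ** 2) / (sigma[i] * np.sqrt(2 * np.pi))

def classify(x):
    # p(x) cancels, so it suffices to compare p(x|wi) P(wi) directly
    return 1 if lik(x, 0) * P[0] >= lik(x, 1) * P[1] else 2

print(classify(0.1), classify(1.9))   # -> 1 2
```

With equal priors and equal variances, this rule simply assigns x to the class whose mean is nearer.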
Classification Error Probability

[Figure: likelihood densities p(x|ω1) and p(x|ω2) over x; the threshold x0 separates the decision regions R1 and R2]

Error Probability (for equal a priori probabilities):

Pe = (1/2) ∫_{-∞}^{x0} p(x|ω2) dx + (1/2) ∫_{x0}^{∞} p(x|ω1) dx
Classification Error Probability in General

• A priori probabilities are not equal: P(ω1) ≠ P(ω2)
• Feature vectors have more than one dimension: l > 1, x = [x1, x2, ..., xl]^T
• General form:

Pe = P(ω1) ∫_{R2} p(x|ω1) dx + P(ω2) ∫_{R1} p(x|ω2) dx
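The general form can be checked numerically for two 1-D Gaussian classes; the priors and class parameters below are illustrative assumptions:

```python
import numpy as np

P1, P2 = 0.6, 0.4            # unequal a priori probabilities
mu1, mu2, s = -1.0, 1.0, 1.0 # illustrative class means, shared std

x = np.linspace(-10.0, 10.0, 200001)
dx = x[1] - x[0]

def pdf(x, mu):
    """Gaussian likelihood p(x|wi)."""
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))

# Bayes decision regions: R1 where P1 p(x|w1) >= P2 p(x|w2), R2 elsewhere
in_R1 = P1 * pdf(x, mu1) >= P2 * pdf(x, mu2)

# Pe = P1 * integral over R2 of p(x|w1) + P2 * integral over R1 of p(x|w2)
Pe = P1 * np.sum(pdf(x, mu1)[~in_R1]) * dx + P2 * np.sum(pdf(x, mu2)[in_R1]) * dx
print(round(Pe, 4))
```

For these parameters the analytic value is roughly 0.15; the grid integration reproduces it to a few decimals.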
If the a priori probabilities are equal: P(ω1) = P(ω2)

If p(x|ω2) > p(x|ω1) · λ12/λ21, x is classified to ω2

If p(x|ω2) < p(x|ω1) · λ12/λ21, x is classified to ω1
Discriminant Functions

gi(x) ≡ f(P(ωi|x))

• f(·) is a monotonically increasing function
• gi(x) is known as a discriminant function
• The decision test is now stated as: classify x to ωi if gi(x) > gj(x) for all j ≠ i
Assumption

The likelihood density functions are l-dimensional normal distributions:

p(x|ωi) = 1 / ((2π)^(l/2) |Σi|^(1/2)) · exp(-(1/2)(x - µi)^T Σi^(-1) (x - µi))

• This “monster” will be denoted by

p(x|ωi) = N(µi, Σi),   i = 1, 2, ..., M
Discriminant Function f(·) = ln(·)

gi(x) = ln(p(x|ωi)P(ωi)) = ln p(x|ωi) + ln P(ωi)

Considering the “monster”, this becomes

gi(x) = -(1/2)(x - µi)^T Σi^(-1) (x - µi) + ln P(ωi) + ci   (2)

• where ci = -(l/2) ln 2π - (1/2) ln |Σi|
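Eq. (2) translates directly into code; the parameter values in the check below are illustrative assumptions, not from the lecture:

```python
import numpy as np

def g(x, mu, Sigma, prior):
    """Discriminant of Eq. (2): quadratic term + ln P(wi) + ci."""
    l = len(mu)
    Sinv = np.linalg.inv(Sigma)
    d = x - mu
    ci = -0.5 * l * np.log(2 * np.pi) - 0.5 * np.log(np.linalg.det(Sigma))
    return -0.5 * d @ Sinv @ d + np.log(prior) + ci

# Illustrative check at x = mu: the quadratic term vanishes, leaving ln P(wi) + ci
val = g(np.zeros(2), np.zeros(2), np.eye(2), 0.5)
print(val)   # ln(0.5) - ln(2*pi) for l = 2 and Sigma = I
```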
Quadrics as Decision Curves

The decision curves gi(x) - gj(x) = 0 are quadrics (i.e., ellipsoids, parabolas, hyperbolas, pairs of lines).

[Figure: (a), (b) two examples of quadric decision curves in the (x1, x2) plane]
Decision Hyperplanes

If the covariance matrices are equal for all classes, Σi = Σ, the quadratic term x^T Σ^(-1) x is the same in all discriminant functions.

• Thus, the quadric term can be disregarded in the decision surface equations. The same is true for the constant ci
• The simplified version of the discriminant function is just a linear function:

gi(x) = wi^T x + wi0

where

wi = Σ^(-1) µi   and   wi0 = ln P(ωi) - (1/2) µi^T Σ^(-1) µi
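A sketch of the linear discriminant for a shared covariance matrix; Σ, the class means, and the priors below are illustrative assumptions:

```python
import numpy as np

Sigma = np.array([[1.0, 0.2], [0.2, 1.0]])   # shared covariance
Sinv = np.linalg.inv(Sigma)
mus = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
priors = [0.5, 0.5]

def g(x, i):
    w = Sinv @ mus[i]                                      # wi = Sigma^-1 mu_i
    w0 = np.log(priors[i]) - 0.5 * mus[i] @ Sinv @ mus[i]  # wi0
    return w @ x + w0

x = np.array([0.4, 0.3])
print(1 + int(g(x, 1) > g(x, 0)))   # -> 1 (x lies closer to the first mean)
```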
Minimum Distance Classifiers

If, in addition, the classes are equiprobable and Σ = σ²I, the linear discriminant reduces to a distance comparison.

• Thus, a feature vector x is assigned to a class bi according to its Euclidean distance to the respective mean points µi:

bi = argmax_i gi(x) = argmin_i ||x - µi||
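The minimum-distance rule in a few lines; the class means are illustrative assumptions:

```python
import numpy as np

# Illustrative class means mu_1, mu_2, mu_3 (equal priors, Sigma = sigma^2 I)
means = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])

def classify(x):
    d = np.linalg.norm(means - x, axis=1)   # Euclidean distances ||x - mu_i||
    return int(np.argmin(d)) + 1            # 1-based class index b_i

print(classify(np.array([2.5, 0.4])))   # -> 2
```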
Problem Statement

• In practice, the likelihood density functions have to be estimated from the available training data.
• Here, two estimation methods will be considered, namely
  • Maximum Likelihood Parameter Estimation
  • Maximum a Posteriori Probability Estimation
Maximum Likelihood Parameter Estimation (1)

• The feature vectors are assumed to be distributed according to p(x|ωi), i = 1, 2, ..., M.
• The likelihood functions are assumed to be given in a parametric form. The statistical parameters for the classes ωi form vectors θi which are unknown:

p(x|ωi) = p(x|ωi; θi)
Maximum Likelihood Parameter Estimation (2)

• Given N statistically independent training feature vectors X = {x1, x2, ..., xN}, we can form the joint density function

p(X; θ) = p(x1, x2, ..., xN; θ) = ∏_{k=1}^{N} p(xk; θ)

• The ML method estimates θ so that the likelihood function takes its maximum value:

θ̂_ML = argmax_θ ∏_{k=1}^{N} p(xk; θ)
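For a 1-D Gaussian p(x; θ) with θ = (µ, σ²), the maximiser of ∏ p(xk; θ) has a closed form: the sample mean and the biased (1/N) sample variance. A sketch with synthetic, illustrative data:

```python
import numpy as np

# Draw illustrative training samples from a known Gaussian
rng = np.random.default_rng(0)
X = rng.normal(loc=2.0, scale=1.5, size=10000)

mu_ml = X.mean()                    # ML estimate of mu
var_ml = ((X - mu_ml) ** 2).mean()  # ML estimate of sigma^2 (1/N, not 1/(N-1))
print(mu_ml, var_ml)                # close to the true 2.0 and 1.5^2 = 2.25
```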
Maximum Likelihood Parameter Estimation (3)

• A necessary condition for a maximum is a vanishing gradient:

∂ ∏_{k=1}^{N} p(xk; θ) / ∂θ = 0

• Due to the monotonicity of the logarithmic function, we can also use the log-likelihood function:

L(θ) = ln ∏_{k=1}^{N} p(xk; θ)

• Looking for the maximum here, we have

∂L(θ)/∂θ = ∑_{k=1}^{N} ∂ ln p(xk; θ)/∂θ = ∑_{k=1}^{N} (1/p(xk; θ)) · ∂p(xk; θ)/∂θ = 0
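As a concrete instance of this condition, take a 1-D Gaussian with known variance: then ∂L/∂µ = ∑k (xk - µ)/σ², which vanishes exactly at the sample mean. The data values below are illustrative:

```python
import numpy as np

X = np.array([1.0, 2.0, 2.5, 4.0, 0.5])   # illustrative training samples
sigma2 = 1.0                               # known variance

def dL_dmu(mu):
    """Gradient of the Gaussian log-likelihood with respect to mu."""
    return np.sum((X - mu) / sigma2)

print(abs(dL_dmu(X.mean())) < 1e-12)   # -> True: gradient is zero at the ML estimate
```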
Maximum a Posteriori Probability Estimation