PATTERN RECOGNITION
Sergios Theodoridis, Konstantinos Koutroumbas
Version 2
Typical application areas
Machine vision
Character recognition (OCR)
Computer aided diagnosis
Speech recognition
Face recognition
Biometrics
Image Data Base retrieval
Data mining
Bioinformatics
Features: These are measurable quantities obtained from
the patterns, and the classification task is based on their
respective values.
An example:
The classifier consists of a set of functions whose values, computed at $x$, determine the class to which the corresponding pattern belongs.

The basic stages of a pattern recognition system:
patterns → sensor → feature generation → feature selection → classifier design → system evaluation
Supervised – unsupervised pattern recognition:
The two major directions
Supervised: Patterns whose class is known a-priori
are used for training.
Unsupervised: The number of classes is (in general)
unknown and no training patterns are available.
CLASSIFIERS BASED ON BAYES DECISION THEORY

Assign the pattern represented by the feature vector $x$ to the most probable of the available classes $\omega_1, \omega_2, \ldots, \omega_M$. That is, $x \to \omega_i$ for which $P(\omega_i \mid x)$ is maximum.
Computation of a-posteriori probabilities
Assume known:
• the a-priori probabilities $P(\omega_1), P(\omega_2), \ldots, P(\omega_M)$
• the class-conditional pdfs $p(x \mid \omega_i),\ i = 1, 2, \ldots, M$
The Bayes rule (M = 2)
$$p(x)\,P(\omega_i \mid x) = p(x \mid \omega_i)\,P(\omega_i) \;\Rightarrow\; P(\omega_i \mid x) = \frac{p(x \mid \omega_i)\,P(\omega_i)}{p(x)}$$
where
$$p(x) = \sum_{i=1}^{2} p(x \mid \omega_i)\,P(\omega_i)$$
The Bayes classification rule (for two classes, M = 2)
Given $x$, classify it according to the rule
If $P(\omega_1 \mid x) > P(\omega_2 \mid x)$, then $x \to \omega_1$
If $P(\omega_2 \mid x) > P(\omega_1 \mid x)$, then $x \to \omega_2$
Equivalently: $p(x \mid \omega_1)\,P(\omega_1) \gtrless p(x \mid \omega_2)\,P(\omega_2)$
and, for equiprobable classes, $p(x \mid \omega_1) \gtrless p(x \mid \omega_2)$
(Figure: the two regions $R_1$ (class $\omega_1$) and $R_2$ (class $\omega_2$) defined by the threshold $x_0$.)
Equivalently, in words: divide the space into two regions
If $x \in R_1$, decide $x \in \omega_1$
If $x \in R_2$, decide $x \in \omega_2$

Probability of error (the total shaded area), for equiprobable classes:
$$P_e = \frac{1}{2}\int_{-\infty}^{x_0} p(x \mid \omega_2)\,dx + \frac{1}{2}\int_{x_0}^{+\infty} p(x \mid \omega_1)\,dx$$

For the general M-class case, assign $x$ to $\omega_i$ if
$$P(\omega_i \mid x) > P(\omega_j \mid x) \quad \forall j \neq i$$
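As a quick illustration of the rule above, here is a minimal sketch in Python, assuming 1-D Gaussian class-conditional densities with priors chosen purely for illustration (they are not taken from the slides):

```python
import numpy as np
from scipy.stats import norm

# Illustrative class-conditional densities and priors (assumed values).
priors = {"w1": 0.5, "w2": 0.5}
pdfs = {"w1": norm(loc=0.0, scale=1.0), "w2": norm(loc=2.0, scale=1.0)}

def bayes_classify(x):
    # Assign x to the class with the larger a-posteriori probability,
    # i.e. the larger p(x|w_i) * P(w_i), since p(x) is common to both classes.
    scores = {w: pdfs[w].pdf(x) * priors[w] for w in priors}
    return max(scores, key=scores.get)

print(bayes_classify(0.3))   # closer to the w1 density -> 'w1'
print(bayes_classify(1.8))   # closer to the w2 density -> 'w2'
```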
Minimizing the average risk. For M = 2:
• Define the loss matrix
$$L = \begin{pmatrix}\lambda_{11} & \lambda_{12}\\ \lambda_{21} & \lambda_{22}\end{pmatrix}$$
• $\lambda_{12}$ is the loss (penalty) for deciding $\omega_2$ when the pattern belongs to $\omega_1$, and similarly for the other terms.
Risk with respect to $\omega_2$:
$$r_2 = \lambda_{21}\int_{R_1} p(x \mid \omega_2)\,dx + \lambda_{22}\int_{R_2} p(x \mid \omega_2)\,dx$$
(similarly, $r_1 = \lambda_{11}\int_{R_1} p(x \mid \omega_1)\,dx + \lambda_{12}\int_{R_2} p(x \mid \omega_1)\,dx$)
Average risk:
$$r = r_1 P(\omega_1) + r_2 P(\omega_2)$$
Choose $R_1$ and $R_2$ so that $r$ is minimized. Then assign $x$ to $\omega_1$ if
$$l_1 \equiv \lambda_{11}\,p(x \mid \omega_1)P(\omega_1) + \lambda_{21}\,p(x \mid \omega_2)P(\omega_2) \;<\; l_2 \equiv \lambda_{12}\,p(x \mid \omega_1)P(\omega_1) + \lambda_{22}\,p(x \mid \omega_2)P(\omega_2)$$
Equivalently: assign $x$ to $\omega_1$ ($\omega_2$) if
$$l_{12} \equiv \frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} \;>\;(<)\; \frac{P(\omega_2)}{P(\omega_1)}\cdot\frac{\lambda_{21}-\lambda_{22}}{\lambda_{12}-\lambda_{11}}$$
$l_{12}$: likelihood ratio
If $P(\omega_1) = P(\omega_2) = \frac{1}{2}$ and $\lambda_{11} = \lambda_{22} = 0$:
$$x \to \omega_1 \;\text{ if }\; p(x \mid \omega_1) > \frac{\lambda_{21}}{\lambda_{12}}\, p(x \mid \omega_2)$$
$$x \to \omega_2 \;\text{ if }\; p(x \mid \omega_2) > \frac{\lambda_{12}}{\lambda_{21}}\, p(x \mid \omega_1)$$
If $\lambda_{21} = \lambda_{12}$, this is the minimum classification error probability rule.
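A sketch of the minimum-average-risk rule with a loss matrix, using illustrative Gaussian likelihoods and an assumed loss matrix (not the slides' values):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical two-class setup: Gaussian likelihoods, equal priors, and a loss
# matrix L[i, j] = lambda_ij = loss of deciding w_(j+1) when w_(i+1) is true.
pdf1, pdf2 = norm(0.0, 1.0), norm(2.0, 1.0)
P1, P2 = 0.5, 0.5
L = np.array([[0.0, 1.0],
              [2.0, 0.0]])   # misclassifying an omega_2 pattern costs twice as much

def min_risk_classify(x):
    # Expected losses l_1, l_2 of deciding w1 / w2 at x (up to the common p(x)).
    post = np.array([pdf1.pdf(x) * P1, pdf2.pdf(x) * P2])
    l1 = L[0, 0] * post[0] + L[1, 0] * post[1]   # decide w1
    l2 = L[0, 1] * post[0] + L[1, 1] * post[1]   # decide w2
    return "w1" if l1 < l2 else "w2"

# The larger lambda_21 pulls the decision boundary toward the w1 mean.
print([min_risk_classify(x) for x in (0.5, 0.9, 1.5)])
```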
An example:
$$p(x \mid \omega_1) = \frac{1}{\sqrt{\pi}}\exp(-x^2), \qquad p(x \mid \omega_2) = \frac{1}{\sqrt{\pi}}\exp\!\big(-(x-1)^2\big)$$
$$P(\omega_1) = P(\omega_2) = \frac{1}{2}, \qquad L = \begin{pmatrix}0 & 0.5\\ 1.0 & 0\end{pmatrix}$$
Then the threshold value $x_0$ for minimum $P_e$:
$$x_0:\ \exp(-x^2) = \exp\!\big(-(x-1)^2\big) \;\Rightarrow\; x_0 = \frac{1}{2}$$
Threshold $\hat{x}_0$ for minimum $r$:
$$\hat{x}_0:\ \exp(-x^2) = 2\exp\!\big(-(x-1)^2\big) \;\Rightarrow\; \hat{x}_0 = \frac{1-\ln 2}{2}$$
Thus $\hat{x}_0$ moves to the left of $x_0 = \frac{1}{2}$. (Why? Because $\lambda_{21} > \lambda_{12}$: misclassifying an $\omega_2$ pattern is penalized more heavily, so the region assigned to $\omega_2$ is enlarged.)
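The two thresholds of this example can be checked numerically; the following sketch simply locates them on a fine grid:

```python
import numpy as np

# Numerical check of the thresholds in the example above:
# p(x|w1) ~ exp(-x^2), p(x|w2) ~ exp(-(x-1)^2), equal priors,
# loss matrix with lambda_12 = 0.5, lambda_21 = 1.0.
x = np.linspace(-2.0, 3.0, 500_001)
p1 = np.exp(-x**2)
p2 = np.exp(-(x - 1)**2)

# Minimum-error threshold: p(x|w1) = p(x|w2)
x0 = x[np.argmin(np.abs(p1 - p2))]
# Minimum-risk threshold: p(x|w1) = 2 p(x|w2)  (since lambda_21 / lambda_12 = 2)
x0_hat = x[np.argmin(np.abs(p1 - 2 * p2))]

print(x0, 0.5)                      # ~0.5
print(x0_hat, (1 - np.log(2)) / 2)  # ~0.1534
```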
DISCRIMINANT FUNCTIONS
DECISION SURFACES

If $R_i$, $R_j$ are contiguous: $g(x) \equiv P(\omega_i \mid x) - P(\omega_j \mid x) = 0$
$R_i:\ P(\omega_i \mid x) > P(\omega_j \mid x)$ (the "+" side of the decision surface $g(x) = 0$)
$R_j:\ P(\omega_j \mid x) > P(\omega_i \mid x)$ (the "−" side)

If $f(\cdot)$ is monotonically increasing, the rule remains the same if we use the discriminant functions $g_i(x) \equiv f\big(P(\omega_i \mid x)\big)$.
BAYESIAN CLASSIFIER FOR NORMAL DISTRIBUTIONS

Multivariate Gaussian class-conditional pdfs:
$$p(x \mid \omega_i) = \frac{1}{(2\pi)^{\ell/2}\,|\Sigma_i|^{1/2}}\exp\!\Big(-\frac{1}{2}(x-\mu_i)^T\Sigma_i^{-1}(x-\mu_i)\Big)$$
where
$\mu_i = E[x]$ is the mean vector of class $\omega_i$
$\Sigma_i = E\big[(x-\mu_i)(x-\mu_i)^T\big]$ is the $\ell \times \ell$ covariance matrix of class $\omega_i$
$\ln(\cdot)$ is monotonic. Define:
$$g_i(x) = \ln\big(p(x \mid \omega_i)P(\omega_i)\big) = \ln p(x \mid \omega_i) + \ln P(\omega_i)$$
$$g_i(x) = -\frac{1}{2}(x-\mu_i)^T\Sigma_i^{-1}(x-\mu_i) + \ln P(\omega_i) + C_i$$
$$C_i = -\frac{\ell}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i|$$
Example: $\Sigma_i = \begin{pmatrix}\sigma^2 & 0\\ 0 & \sigma^2\end{pmatrix}$
$$g_i(x) = -\frac{1}{2\sigma^2}(x_1^2 + x_2^2) + \frac{1}{\sigma^2}(\mu_{i1}x_1 + \mu_{i2}x_2) - \frac{1}{2\sigma^2}(\mu_{i1}^2 + \mu_{i2}^2) + \ln P(\omega_i) + C_i$$
Decision Hyperplanes

Quadratic terms $x^T\Sigma^{-1}x$: if $\Sigma_i = \Sigma$ for all classes, these terms are the same in every $g_i(x)$ and can be dropped. Then:
$$g_i(x) = w_i^T x + w_{i0}$$
$$w_i = \Sigma^{-1}\mu_i, \qquad w_{i0} = \ln P(\omega_i) - \frac{1}{2}\mu_i^T\Sigma^{-1}\mu_i$$
The discriminant functions are LINEAR.
Let, in addition, $\Sigma = \sigma^2 I$. Then
$$g_i(x) = \frac{1}{\sigma^2}\mu_i^T x + w_{i0}$$
and the decision surface between $\omega_i$ and $\omega_j$ is
$$g_{ij}(x) \equiv g_i(x) - g_j(x) = w^T(x - x_0) = 0$$
$$w = \mu_i - \mu_j, \qquad x_0 = \frac{1}{2}(\mu_i + \mu_j) - \sigma^2 \ln\!\Big(\frac{P(\omega_i)}{P(\omega_j)}\Big)\frac{\mu_i - \mu_j}{\|\mu_i - \mu_j\|^2}$$
Non-diagonal $\Sigma$:
$$g_{ij}(x) = w^T(x - x_0) = 0$$
$$w = \Sigma^{-1}(\mu_i - \mu_j)$$
$$x_0 = \frac{1}{2}(\mu_i + \mu_j) - \ln\!\Big(\frac{P(\omega_i)}{P(\omega_j)}\Big)\frac{\mu_i - \mu_j}{\|\mu_i - \mu_j\|_{\Sigma^{-1}}^{2}}$$
where $\|x\|_{\Sigma^{-1}} \equiv \big(x^T\Sigma^{-1}x\big)^{1/2}$.
The decision hyperplane is normal to $\Sigma^{-1}(\mu_i - \mu_j)$, not normal to $\mu_i - \mu_j$.
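A small sketch of how $w$ and $x_0$ could be computed for the shared-covariance case; the means, covariance and priors below are illustrative values, not taken from the slides:

```python
import numpy as np

# Sketch of the decision hyperplane g_ij(x) = w^T (x - x0) = 0 between two
# Gaussian classes sharing the covariance matrix Sigma (illustrative values).
mu_i = np.array([0.0, 0.0])
mu_j = np.array([2.0, 1.0])
Sigma = np.array([[1.0, 0.2],
                  [0.2, 0.5]])
P_i, P_j = 0.6, 0.4

Sigma_inv = np.linalg.inv(Sigma)
diff = mu_i - mu_j
w = Sigma_inv @ diff                    # normal to the hyperplane
norm_sq = diff @ Sigma_inv @ diff       # ||mu_i - mu_j||^2 in the Sigma^-1 norm
x0 = 0.5 * (mu_i + mu_j) - np.log(P_i / P_j) * diff / norm_sq

def g_ij(x):
    # g_ij(x) > 0  <=>  g_i(x) > g_j(x)  <=>  decide omega_i
    return w @ (np.asarray(x) - x0)

x_test = np.array([0.5, 0.3])
print("omega_i" if g_ij(x_test) > 0 else "omega_j")
```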
Minimum Distance Classifiers

For equiprobable classes, $P(\omega_i) = \frac{1}{M}$, and a common covariance matrix:
$$g_i(x) = -\frac{1}{2}(x - \mu_i)^T\Sigma^{-1}(x - \mu_i)$$
(ignoring class-independent constants)
• $\Sigma = \sigma^2 I$: assign $x \to \omega_i$ with the smaller Euclidean distance $d_E = \|x - \mu_i\|$
• $\Sigma \neq \sigma^2 I$: assign $x \to \omega_i$ with the smaller Mahalanobis distance $d_m = \big((x - \mu_i)^T\Sigma^{-1}(x - \mu_i)\big)^{1/2}$
Example:
Given $\omega_1$, $\omega_2$ with $P(\omega_1) = P(\omega_2)$, $p(x \mid \omega_1) = N(\mu_1, \Sigma)$, $p(x \mid \omega_2) = N(\mu_2, \Sigma)$,
$$\mu_1 = \begin{pmatrix}0\\0\end{pmatrix}, \quad \mu_2 = \begin{pmatrix}3\\3\end{pmatrix}, \quad \Sigma = \begin{pmatrix}1.1 & 0.3\\ 0.3 & 1.9\end{pmatrix}$$
classify the vector $x = \begin{pmatrix}1.0\\2.2\end{pmatrix}$ using Bayesian classification.
$$\Sigma^{-1} = \begin{pmatrix}0.95 & -0.15\\ -0.15 & 0.55\end{pmatrix}$$
Compute the Mahalanobis distances $d_m$ from $\mu_1$, $\mu_2$:
$$d_{m,1}^2 = (1.0,\ 2.2)\,\Sigma^{-1}\begin{pmatrix}1.0\\2.2\end{pmatrix} = 2.952, \qquad d_{m,2}^2 = (-2.0,\ -0.8)\,\Sigma^{-1}\begin{pmatrix}-2.0\\-0.8\end{pmatrix} = 3.672$$
Hence classify $x$ to $\omega_1$ (the smaller Mahalanobis distance).
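The example's numbers can be reproduced with a few lines of numpy:

```python
import numpy as np

# Reproducing the numbers of the example above.
mu1 = np.array([0.0, 0.0])
mu2 = np.array([3.0, 3.0])
Sigma = np.array([[1.1, 0.3],
                  [0.3, 1.9]])
x = np.array([1.0, 2.2])

Sigma_inv = np.linalg.inv(Sigma)
d2_1 = (x - mu1) @ Sigma_inv @ (x - mu1)
d2_2 = (x - mu2) @ Sigma_inv @ (x - mu2)

print(Sigma_inv)        # [[0.95, -0.15], [-0.15, 0.55]]
print(d2_1, d2_2)       # 2.952, 3.672  ->  classify x to omega_1
```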
Maximum Likelihood Parameter Estimation

Let $x_1, x_2, \ldots, x_N$ be independent samples and $\theta$ the unknown parameter vector. Then
$$p(X;\theta) \equiv p(x_1, x_2, \ldots, x_N;\theta) = \prod_{k=1}^{N} p(x_k;\theta)$$
$$L(\theta) \equiv \ln \prod_{k=1}^{N} p(x_k;\theta)$$
$$\hat{\theta}_{ML}:\ \frac{\partial L(\theta)}{\partial\theta} = \sum_{k=1}^{N}\frac{1}{p(x_k;\theta)}\frac{\partial p(x_k;\theta)}{\partial\theta} = 0$$
If, indeed, there is a $\theta_0$ such that $p(x) = p(x;\theta_0)$, then
$$\lim_{N\to\infty} E[\hat{\theta}_{ML}] = \theta_0 \qquad \text{(asymptotically unbiased)}$$
$$\lim_{N\to\infty} E\big[\|\hat{\theta}_{ML} - \theta_0\|^2\big] = 0 \qquad \text{(asymptotically consistent in the mean-square sense)}$$
Example: $p(x): N(\mu, \Sigma)$ with $\mu$ unknown; given samples $x_1, x_2, \ldots, x_N$, $p(x_k) \equiv p(x_k;\mu)$:
$$p(x_k;\mu) = \frac{1}{(2\pi)^{\ell/2}\,|\Sigma|^{1/2}}\exp\!\Big(-\frac{1}{2}(x_k-\mu)^T\Sigma^{-1}(x_k-\mu)\Big)$$
$$L(\mu) = \ln\prod_{k=1}^{N} p(x_k;\mu) = C - \frac{1}{2}\sum_{k=1}^{N}(x_k-\mu)^T\Sigma^{-1}(x_k-\mu)$$
$$\frac{\partial L(\mu)}{\partial\mu} = \sum_{k=1}^{N}\Sigma^{-1}(x_k - \mu) = 0 \;\Rightarrow\; \hat{\mu}_{ML} = \frac{1}{N}\sum_{k=1}^{N}x_k$$
Remember: if $A = A^T$, then $\dfrac{\partial(x^T A x)}{\partial x} = 2Ax$.
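A minimal numerical sketch of this result (the ML estimate of a Gaussian mean is the sample mean), on synthetic data with an assumed true mean:

```python
import numpy as np

# The ML estimate of the mean of a Gaussian is the sample mean, as derived above.
# Synthetic data with an assumed "true" mean, purely for illustration.
rng = np.random.default_rng(0)
true_mu = np.array([1.0, -2.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
X = rng.multivariate_normal(true_mu, Sigma, size=1000)   # rows are samples x_k

mu_ml = X.mean(axis=0)      # (1/N) * sum_k x_k
print(mu_ml)                # close to [1.0, -2.0]
```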
Maximum A Posteriori Probability Estimation

In the ML method, $\theta$ was considered a fixed, unknown parameter.
Here we look at $\theta$ as a random vector described by a pdf $p(\theta)$, assumed to be known.
Given $X = \{x_1, x_2, \ldots, x_N\}$, compute the maximum of $p(\theta \mid X)$.
From the Bayes theorem
$$p(\theta)\,p(X \mid \theta) = p(X)\,p(\theta \mid X), \quad\text{or}\quad p(\theta \mid X) = \frac{p(\theta)\,p(X \mid \theta)}{p(X)}$$
The method:
$$\hat{\theta}_{MAP}:\ \frac{\partial}{\partial\theta}\big(p(\theta)\,p(X \mid \theta)\big) = 0$$
Example: let $p(x_k \mid \mu) \sim N(\mu, \sigma^2 I)$ with $\mu$ unknown, and assume the Gaussian prior $p(\mu) \sim N(\mu_0, \sigma_\mu^2 I)$. Then
$$\hat{\mu}_{MAP}:\ \frac{\partial}{\partial\mu}\ln\Big(\prod_{k=1}^{N}p(x_k \mid \mu)\,p(\mu)\Big) = 0 \quad\text{or}\quad \sum_{k=1}^{N}\frac{1}{\sigma^2}(x_k - \hat{\mu}) - \frac{1}{\sigma_\mu^2}(\hat{\mu} - \mu_0) = 0$$
$$\hat{\mu}_{MAP} = \frac{\mu_0 + \frac{\sigma_\mu^2}{\sigma^2}\sum_{k=1}^{N}x_k}{1 + \frac{\sigma_\mu^2}{\sigma^2}N}$$
For $\frac{\sigma_\mu^2}{\sigma^2} \gg 1$, or for $N \to \infty$:
$$\hat{\mu}_{MAP} \simeq \hat{\mu}_{ML} = \frac{1}{N}\sum_{k=1}^{N}x_k$$
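A sketch of the MAP estimate just derived, under the stated Gaussian assumptions; all numerical values below are illustrative:

```python
import numpy as np

# MAP estimate of a Gaussian mean, assuming p(x_k|mu) = N(mu, sigma^2 I) and a
# Gaussian prior p(mu) = N(mu_0, sigma_mu^2 I). All values are illustrative.
rng = np.random.default_rng(1)
sigma2, sigma_mu2 = 1.0, 0.5
mu_0 = np.array([0.0, 0.0])
true_mu = np.array([2.0, -1.0])
X = rng.normal(loc=true_mu, scale=np.sqrt(sigma2), size=(20, 2))

ratio = sigma_mu2 / sigma2
mu_map = (mu_0 + ratio * X.sum(axis=0)) / (1 + ratio * len(X))
mu_ml = X.mean(axis=0)

# With few samples the MAP estimate is pulled toward the prior mean mu_0;
# as N grows (or for sigma_mu^2 / sigma^2 >> 1) it approaches the ML estimate.
print(mu_map, mu_ml)
```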
Bayesian Inference

$$p(x \mid X) = \int p(x \mid \theta)\,p(\theta \mid X)\,d\theta$$
$$p(\theta \mid X) = \frac{p(X \mid \theta)\,p(\theta)}{p(X)}, \qquad p(X) = \int p(X \mid \theta)\,p(\theta)\,d\theta$$
$$p(X \mid \theta) = \prod_{k=1}^{N} p(x_k \mid \theta)$$
In the Gaussian case (unknown mean with a Gaussian prior), the posterior $p(\mu \mid X)$ is a sequence of Gaussians whose variance shrinks to zero as $N \to \infty$.
Maximum Entropy

Entropy: $H = -\int p(x)\ln p(x)\,dx$
Example: $p(x)$ is nonzero only in the interval $x_1 \le x \le x_2$.
• The constraint: $\int_{x_1}^{x_2} p(x)\,dx = 1$
• Lagrange multipliers: $H_L = -\int_{x_1}^{x_2} p(x)\ln p(x)\,dx + \lambda\Big(\int_{x_1}^{x_2}p(x)\,dx - 1\Big)$
• Setting the derivative with respect to $p(x)$ to zero gives $\hat{p}(x) = \exp(\lambda - 1)$, a constant.
• Imposing the constraint:
$$\hat{p}(x) = \begin{cases}\dfrac{1}{x_2 - x_1}, & x_1 \le x \le x_2\\[4pt] 0, & \text{otherwise}\end{cases}$$
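A small numerical illustration of this result: among densities supported on $[x_1, x_2]$, the uniform one attains the largest entropy. The comparison density below is an arbitrary choice:

```python
import numpy as np

# Compare the entropy of the uniform density on [x1, x2] with an arbitrary
# alternative density on the same interval (a symmetric triangular one).
x1, x2 = 0.0, 2.0
x = np.linspace(x1, x2, 100_001)
dx = x[1] - x[0]

def entropy(p):
    # H = -integral p(x) ln p(x) dx, approximated by a Riemann sum
    return -np.sum(p * np.log(p)) * dx

uniform = np.full_like(x, 1.0 / (x2 - x1))
triangular = (1.0 - np.abs(x - 1.0)) + 1e-12   # peaked at the middle, integrates to 1

print(entropy(uniform))      # ln(2) ~ 0.693
print(entropy(triangular))   # smaller, ~0.5
```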
Mixture Models

$$p(x) = \sum_{j=1}^{J} p(x \mid j)\,P_j, \qquad \sum_{j=1}^{J} P_j = 1, \qquad \int_x p(x \mid j)\,dx = 1$$
The parameters are estimated via the EM algorithm.
• General formulation:
  – $y$: the complete data set, $y \in Y \subseteq R^m$, with pdf $p_y(y;\theta)$; the $y$'s are not observed directly.
  – We observe $x = g(y) \in X_{ob} \subseteq R^l$, $l < m$, with pdf $p_x(x;\theta)$.
• The ML estimate based on the complete data would be
$$\hat{\theta}_{ML}:\ \sum_k \frac{\partial\ln p_y(y_k;\theta)}{\partial\theta} = 0$$
but the $y_k$'s are not available.
The (EM) algorithm:
• E-step: $Q(\theta;\theta(t)) = E\Big[\sum_k \ln p_y(y_k;\theta)\,\Big|\,X;\theta(t)\Big]$
• M-step: $\theta(t+1):\ \dfrac{\partial Q(\theta;\theta(t))}{\partial\theta} = 0$

Application to the mixture modeling problem:
• Complete data: $(x_k, j_k)$, $k = 1, 2, \ldots, N$
• Observed data: $x_k$, $k = 1, 2, \ldots, N$
• $p(x_k, j_k;\theta) = p(x_k \mid j_k;\theta)\,P_{j_k}$
• Assuming mutual independence:
$$L(\theta) = \sum_{k=1}^{N}\ln\big(p(x_k \mid j_k;\theta)\,P_{j_k}\big)$$
• Unknown parameters: $\Theta^T = [\theta^T, P^T]$, $P = [P_1, P_2, \ldots, P_J]^T$
• E-step:
$$Q(\Theta;\Theta(t)) = E\Big[\sum_{k=1}^{N}\ln\big(p(x_k \mid j_k;\theta)P_{j_k}\big)\Big] = \sum_{k=1}^{N}\sum_{j_k=1}^{J}P\big(j_k \mid x_k;\Theta(t)\big)\ln\big(p(x_k \mid j_k;\theta)P_{j_k}\big)$$
• M-step:
$$\frac{\partial Q}{\partial\theta} = 0, \qquad \frac{\partial Q}{\partial P_{j_k}} = 0, \quad j_k = 1, 2, \ldots, J$$
where
$$P\big(j \mid x_k;\Theta(t)\big) = \frac{p(x_k \mid j;\Theta(t))\,P_j}{p(x_k;\Theta(t))}, \qquad p(x_k;\Theta(t)) = \sum_{j=1}^{J}p(x_k \mid j;\Theta(t))\,P_j$$
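A compact sketch of these E- and M-steps for a two-component 1-D Gaussian mixture; the data, initialization and number of iterations are illustrative choices:

```python
import numpy as np
from scipy.stats import norm

# EM sketch for a two-component 1-D Gaussian mixture, following the
# E-step / M-step above (illustrative data and initialization).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.5, 700)])

# Unknown parameters Theta = (means, standard deviations, mixing probabilities P_j)
mu = np.array([-1.0, 1.0])
sd = np.array([1.0, 1.0])
P = np.array([0.5, 0.5])

for _ in range(100):
    # E-step: posterior P(j | x_k; Theta(t)) for each sample and component
    lik = np.stack([P[j] * norm.pdf(x, mu[j], sd[j]) for j in range(2)])  # (2, N)
    gamma = lik / lik.sum(axis=0, keepdims=True)

    # M-step: re-estimate Theta by maximizing Q(Theta; Theta(t))
    Nj = gamma.sum(axis=1)
    mu = (gamma * x).sum(axis=1) / Nj
    sd = np.sqrt((gamma * (x - mu[:, None]) ** 2).sum(axis=1) / Nj)
    P = Nj / len(x)

print(mu, sd, P)   # close to the generating values (-2, 3), (1, 1.5), (0.3, 0.7)
```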
Nonparametric Estimation

$$P \approx \frac{k_N}{N}$$
where $k_N$ is the number of the $N$ total points that fall inside the bin of length $h$ centered at $\hat{x}$.
$$\hat{p}(x) \approx \hat{p}(\hat{x}) = \frac{1}{h}\,\frac{k_N}{N}, \qquad |x - \hat{x}| \le \frac{h}{2}$$
Parzen Windows
Define
$$\phi(x_i) = \begin{cases}1, & |x_{ij}| \le \frac{1}{2} \ \text{ for every component } j\\ 0, & \text{otherwise}\end{cases}$$
• That is, it is 1 inside a unit-side hypercube centered at 0.
$$\hat{p}(x) = \frac{1}{h^l}\,\frac{1}{N}\sum_{i=1}^{N}\phi\Big(\frac{x_i - x}{h}\Big)$$
• $= \dfrac{1}{\text{volume}}\cdot\dfrac{1}{N}\cdot$ (number of points inside an $h$-side hypercube centered at $x$)
Behaviour as $h \to 0$:
• the width of $\phi\big(\frac{x'-x}{h}\big)$ goes to 0 while $\frac{1}{h^l} \to \infty$
• yet $\frac{1}{h^l}\int\phi\big(\frac{x'-x}{h}\big)\,dx' = 1$
• hence $\frac{1}{h^l}\phi\big(\frac{x'-x}{h}\big) \to \delta(x'-x)$, and
$$E[\hat{p}(x)] \to \int_{x'}\delta(x'-x)\,p(x')\,dx' = p(x)$$
(Figure: Parzen estimates for h = 0.1, N = 10000.)
If
• $h \to 0$
• $N \to \infty$
• $hN \to \infty$
the estimate is asymptotically unbiased and consistent.
The method (Parzen windows for classification):
• Remember:
$$l_{12} \equiv \frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} \;\gtrless\; \frac{P(\omega_2)}{P(\omega_1)}\cdot\frac{\lambda_{21}-\lambda_{22}}{\lambda_{12}-\lambda_{11}}$$
• Estimate each class-conditional pdf with a Parzen window over that class's own training points and plug the estimates into the likelihood ratio:
$$\frac{\dfrac{1}{N_1 h^l}\displaystyle\sum_{i=1}^{N_1}\phi\Big(\dfrac{x_i - x}{h}\Big)}{\dfrac{1}{N_2 h^l}\displaystyle\sum_{i=1}^{N_2}\phi\Big(\dfrac{x_i - x}{h}\Big)} \;\gtrless\; \frac{P(\omega_2)}{P(\omega_1)}\cdot\frac{\lambda_{21}-\lambda_{22}}{\lambda_{12}-\lambda_{11}}$$
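A minimal 1-D Parzen window sketch with the hypercube kernel defined above; the data and bandwidth are illustrative:

```python
import numpy as np

# Parzen window sketch in one dimension, using the hypercube (here: interval)
# kernel phi defined above. Data and bandwidth h are illustrative.
rng = np.random.default_rng(0)
samples = rng.normal(0.0, 1.0, 5000)     # x_i, i = 1..N

def phi(u):
    # 1 inside the unit-length interval centered at 0, else 0
    return (np.abs(u) <= 0.5).astype(float)

def parzen_estimate(x, data, h):
    # p_hat(x) = (1 / (h * N)) * sum_i phi((x_i - x) / h)
    return phi((data - x) / h).sum() / (h * len(data))

for x in (0.0, 1.0, 2.0):
    true_pdf = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
    print(x, parzen_estimate(x, samples, h=0.2), true_pdf)
```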
CURSE OF DIMENSIONALITY
In all the methods so far, we saw that the higher the number of points, N, the better the resulting estimate. In practice, however, the number of points needed for a good estimate grows very fast with the dimensionality of the feature space.
NAIVE – BAYES CLASSIFIER
Assume the features are (treated as) mutually independent within each class:
$$p(x \mid \omega_i) = \prod_{j=1}^{\ell} p(x_j \mid \omega_i)$$
so only $\ell$ one-dimensional pdfs need to be estimated per class.
k Nearest Neighbor Density Estimation
In Parzen:
• The volume is constant
• The number of points in the volume is varying
Now:
• Keep the number of points $k_N = k$ constant and let the volume $V(x)$ vary, so that
$$\hat{p}(x) = \frac{k}{N\,V(x)}$$
• The likelihood-ratio test then becomes
$$\frac{k/(N_1 V_1)}{k/(N_2 V_2)} = \frac{N_2 V_2}{N_1 V_1} \;\gtrless\; (\cdot)$$
The Nearest Neighbor Rule
• Choose $k$ out of the $N$ training vectors: identify the $k$ nearest ones to $x$.
• Out of these $k$, identify the number $k_i$ that belong to class $\omega_i$.
• Assign $x \to \omega_i:\ k_i > k_j \ \ \forall j \neq i$.
Asymptotic error bounds ($P_B$: the Bayesian error):
$$k \to \infty:\ P_{kNN} \to P_B, \qquad P_{NN} \le 2P_B, \qquad P_{3NN} \le P_B + 3(P_B)^2$$
Voronoi tessellation:
$$R_i = \{x:\ d(x, x_i) < d(x, x_j),\ i \neq j\}$$
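A minimal k-NN classifier sketch following this rule, with Euclidean distances and illustrative training data:

```python
import numpy as np

# Minimal k-nearest-neighbor classifier following the rule above
# (Euclidean distances; the training data below are illustrative).
rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], 1.0, size=(50, 2))   # class omega_1
X2 = rng.normal([3.0, 3.0], 1.0, size=(50, 2))   # class omega_2
X_train = np.vstack([X1, X2])
y_train = np.array([1] * 50 + [2] * 50)

def knn_classify(x, k=3):
    # Identify the k training vectors nearest to x and take a majority vote.
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = y_train[np.argsort(dists)[:k]]
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]

print(knn_classify(np.array([0.5, 0.5])))   # -> 1
print(knn_classify(np.array([2.5, 3.0])))   # -> 2
```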
BAYESIAN NETWORKS
Bayes Probability Chain Rule:
$$p(x_1, x_2, \ldots, x_\ell) = p(x_\ell \mid x_{\ell-1},\ldots,x_1)\,p(x_{\ell-1} \mid x_{\ell-2},\ldots,x_1)\cdots p(x_2 \mid x_1)\,p(x_1)$$
Assume now that the conditional dependence of each $x_i$ is limited to a subset $A_i$ of the variables appearing in its conditioning. For example, if ℓ = 6, then we could assume:
$$p(x_6 \mid x_5,\ldots,x_1) = p(x_6 \mid x_5, x_4)$$
Then:
$$A_6 = \{x_5, x_4\} \subset \{x_5, \ldots, x_1\}$$
A graphical way to portray conditional dependencies
is given below
According to this figure we
have that:
• x6 is conditionally dependent on
x4, x5.
• x5 on x4
• x4 on x1, x2
• x3 on x2
• x1, x2 are conditionally independent of the other variables.
For this example, the joint pdf factorizes as
$$p(x_1,\ldots,x_6) = p(x_6 \mid x_5, x_4)\,p(x_5 \mid x_4)\,p(x_4 \mid x_1, x_2)\,p(x_3 \mid x_2)\,p(x_1)\,p(x_2)$$
The figure below is an example of a Bayesian
Network corresponding to a paradigm from the
medical applications field.
This Bayesian network
models conditional
dependencies for an
example concerning
smokers (S),
tendencies to develop
cancer (C) and heart
disease (H), together
with variables
corresponding to heart
(H1, H2) and cancer
(C1, C2) medical tests.
Once a DAG has been constructed, the joint
probability can be obtained by multiplying the
marginal (root nodes) and the conditional (non-root
nodes) probabilities.
Example: Consider the Bayesian network of the
figure:
For example, to evaluate P(w0|x1), a set of calculations is required that propagates from node x to node w. It turns out that P(w0|x1) = 0.63.
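Since the figure's probability tables are not reproduced here, the sketch below performs the same kind of computation on a hypothetical two-node network x → w with made-up tables; it is not the network of the figure:

```python
# Hypothetical two-node Bayesian network x -> w with made-up tables
# P(x) and P(w | x), used only to illustrate the computation.
P_x = {1: 0.4, 0: 0.6}                      # marginal of the root node x
P_w_given_x = {1: {1: 0.9, 0: 0.1},         # P(w | x): outer key is x, inner is w
               0: {1: 0.2, 0: 0.8}}

# Joint probability = product of marginal (root) and conditional (non-root) terms.
def joint(x, w):
    return P_x[x] * P_w_given_x[x][w]

# Conditional query P(w = 0 | x = 1), obtained from the joint by normalization.
num = joint(1, 0)
den = joint(1, 0) + joint(1, 1)
print(num / den)    # 0.1 for these made-up tables
```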
Complexity:
For singly connected graphs, message passing
algorithms amount to a complexity linear in the
number of nodes.