Introduction To Pattern Recognition
Sergios Theodoridis, Konstantinos Koutroumbas
PATTERN RECOGNITION
❖ Typical application areas
➢ Machine vision
➢ Character recognition (OCR)
➢ Computer aided diagnosis
➢ Speech recognition
➢ Face recognition
➢ Biometrics
➢ Image database retrieval
➢ Data mining
➢ Bioinformatics
❖ The task: Assign unknown objects (patterns) to the correct class. This is known as classification.
An example:
➢ [Figure: the basic stages in the design of a classification system: feature selection, classifier design, system evaluation.]
❖ Assign the pattern, represented by the feature vector $x$, to the most probable of the available classes $\omega_1, \omega_2, \ldots, \omega_M$. That is,
$$x \rightarrow \omega_i : \quad P(\omega_i|x) \text{ is maximum.}$$
➢ The quantities needed:
• the a-priori probabilities $P(\omega_1), P(\omega_2), \ldots, P(\omega_M)$
• $p(x|\omega_i),\ i = 1, 2, \ldots, M$; this is also known as the likelihood of $x$ with respect to $\omega_i$.
[Figure: the class-conditional pdfs $p(x|\omega_1)$ and $p(x|\omega_2)$ for a one-dimensional, two-class case.]
➢ If $x \in R_1$, $x$ is assigned to $\omega_1$; if $x \in R_2$, $x$ is assigned to $\omega_2$.
❖ Probability of error
➢ Total shaded area:
$$P_e = \int_{-\infty}^{x_0} p(x|\omega_2)\,dx + \int_{x_0}^{+\infty} p(x|\omega_1)\,dx$$
➢ Assign $x \rightarrow \omega_i$ if $P(\omega_i|x) > P(\omega_j|x)\ \ \forall j \neq i$.
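To illustrate this rule, the following minimal Python sketch computes the posteriors for a two-class, one-dimensional problem; the Gaussian class-conditional pdfs and the priors used here are assumed for the illustration and are not taken from the slides.

```python
import numpy as np
from scipy.stats import norm

# Assumed example setup: two 1-D Gaussian class-conditional pdfs and priors.
priors = np.array([0.5, 0.5])                      # P(w1), P(w2)
likelihoods = [norm(loc=0.0, scale=1.0),           # p(x|w1)
               norm(loc=2.0, scale=1.0)]           # p(x|w2)

def classify(x):
    """Assign x to the class with the maximum posterior P(wi|x)."""
    # Unnormalized posteriors p(x|wi) * P(wi); the evidence p(x) is a
    # common factor and does not affect the argmax.
    post = np.array([lik.pdf(x) * pr for lik, pr in zip(likelihoods, priors)])
    return np.argmax(post) + 1, post / post.sum()

label, posteriors = classify(0.8)
print(f"x = 0.8 -> class w{label}, posteriors = {posteriors}")
```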
➢ For M = 2:
• Define the loss matrix
$$L = \begin{pmatrix} \lambda_{11} & \lambda_{12} \\ \lambda_{21} & \lambda_{22} \end{pmatrix}$$
➢ Risk with respect to $\omega_1$:
$$r_1 = \lambda_{11}\int_{R_1} p(x|\omega_1)\,dx + \lambda_{12}\int_{R_2} p(x|\omega_1)\,dx$$
➢ Risk with respect to $\omega_2$:
$$r_2 = \lambda_{21}\int_{R_1} p(x|\omega_2)\,dx + \lambda_{22}\int_{R_2} p(x|\omega_2)\,dx$$
➢ Average risk:
$$r = r_1 P(\omega_1) + r_2 P(\omega_2)$$
➢ Minimizing the average risk leads to the likelihood ratio test: assign $x \rightarrow \omega_1$ ($\omega_2$) if
$$\ell_{12} \equiv \frac{p(x|\omega_1)}{p(x|\omega_2)} \;>\; (<)\; \frac{P(\omega_2)}{P(\omega_1)}\cdot\frac{\lambda_{21}-\lambda_{22}}{\lambda_{12}-\lambda_{11}}$$
$\ell_{12}$: likelihood ratio.
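A minimal sketch of this minimum-risk test, again with assumed one-dimensional Gaussian likelihoods and an illustrative loss matrix (not values from the slides):

```python
import numpy as np
from scipy.stats import norm

# Assumed example setup.
P1, P2 = 0.5, 0.5                       # priors P(w1), P(w2)
p1 = norm(loc=0.0, scale=1.0)           # p(x|w1)
p2 = norm(loc=2.0, scale=1.0)           # p(x|w2)
lam = np.array([[0.0, 1.0],             # loss matrix L = [[l11, l12],
                [0.5, 0.0]])            #                  [l21, l22]]

def min_risk_decision(x):
    """Likelihood-ratio test minimizing the average risk."""
    l12 = p1.pdf(x) / p2.pdf(x)                                   # likelihood ratio
    threshold = (P2 / P1) * (lam[1, 0] - lam[1, 1]) / (lam[0, 1] - lam[0, 0])
    return 1 if l12 > threshold else 2

for x in (0.5, 1.0, 1.5):
    print(x, "-> w", min_risk_decision(x))
```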
➢ Thus $\hat{x}_0$ moves to the left of $x_0$ (WHY?)

DISCRIMINANT FUNCTIONS AND DECISION SURFACES
❖ If $R_i, R_j$ are contiguous: $g(x) \equiv P(\omega_i|x) - P(\omega_j|x) = 0$
$$R_i: P(\omega_i|x) > P(\omega_j|x) \qquad\qquad R_j: P(\omega_j|x) > P(\omega_i|x)$$
➢ The surface $g(x) = 0$ is the decision surface separating the two regions; $g(x) > 0$ (+) on the $R_i$ side and $g(x) < 0$ (−) on the $R_j$ side.
❖ $\ln(\cdot)$ is monotonic. Define:
$$g_i(x) = \ln\big(p(x|\omega_i)P(\omega_i)\big)$$
➢ Example: $\Sigma_i = \sigma^2 I = \begin{pmatrix}\sigma^2 & 0\\ 0 & \sigma^2\end{pmatrix}$
$$g_i(x) = -\frac{1}{2\sigma^2}(x_1^2 + x_2^2) + \frac{1}{\sigma^2}(\mu_{i1}x_1 + \mu_{i2}x_2) - \frac{1}{2\sigma^2}(\mu_{i1}^2 + \mu_{i2}^2) + \ln P(\omega_i) + C_i$$
➢ For normal classes with equal covariance matrices, the discriminant functions are LINEAR:
$$g_i(x) = w_i^T x + w_{i0}, \qquad w_i = \Sigma^{-1}\mu_i, \qquad w_{i0} = \ln P(\omega_i) - \frac{1}{2}\mu_i^T \Sigma^{-1}\mu_i$$
➢ The decision hyperplane between two contiguous regions:
$$g_{ij}(x) = w^T(x - x_0) = 0$$
where
$$w = \Sigma^{-1}(\mu_i - \mu_j), \qquad x_0 = \frac{1}{2}(\mu_i + \mu_j) - \ln\Big(\frac{P(\omega_i)}{P(\omega_j)}\Big)\frac{\mu_i - \mu_j}{\|\mu_i - \mu_j\|_{\Sigma^{-1}}^{2}}, \qquad \|x\|_{\Sigma^{-1}} \equiv (x^T\Sigma^{-1}x)^{\frac{1}{2}}$$
➢ The decision hyperplane is normal to $\Sigma^{-1}(\mu_i - \mu_j)$, i.e., not normal to $\mu_i - \mu_j$.
➢ If $P(\omega_i) = \frac{1}{M}$ (equiprobable classes), the prior terms can be ignored and
$$g_i(x) = -\frac{1}{2}(x - \mu_i)^T\Sigma^{-1}(x - \mu_i)$$
➢ $\Sigma = \sigma^2 I$: Assign $x \rightarrow \omega_i$ for which the Euclidean distance $d_E = \|x - \mu_i\|$ is smaller.
➢ $\Sigma \neq \sigma^2 I$: Assign $x \rightarrow \omega_i$ for which the Mahalanobis distance $d_m = \big((x - \mu_i)^T\Sigma^{-1}(x - \mu_i)\big)^{\frac{1}{2}}$ is smaller.
❖ Example:
Given $\omega_1, \omega_2$ with $P(\omega_1) = P(\omega_2)$ and $p(x|\omega_1) = N(\mu_1, \Sigma)$, $p(x|\omega_2) = N(\mu_2, \Sigma)$, where
$$\mu_1 = \begin{pmatrix}0\\0\end{pmatrix},\quad \mu_2 = \begin{pmatrix}3\\3\end{pmatrix},\quad \Sigma = \begin{pmatrix}1.1 & 0.3\\ 0.3 & 1.9\end{pmatrix},$$
classify the vector $x = \begin{pmatrix}1.0\\2.2\end{pmatrix}$ using Bayesian classification:
• $\Sigma^{-1} = \begin{pmatrix}0.95 & -0.15\\ -0.15 & 0.55\end{pmatrix}$
• Compute the Mahalanobis distances $d_m$ from $\mu_1, \mu_2$:
$$d_{m,1}^2 = (1.0,\ 2.2)\,\Sigma^{-1}\begin{pmatrix}1.0\\2.2\end{pmatrix} = 2.952, \qquad d_{m,2}^2 = (-2.0,\ -0.8)\,\Sigma^{-1}\begin{pmatrix}-2.0\\-0.8\end{pmatrix} = 3.672$$
• Since $d_{m,1}^2 < d_{m,2}^2$, classify $x \rightarrow \omega_1$.
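The numbers in this example can be verified with a few lines of NumPy, using the values given above:

```python
import numpy as np

mu1 = np.array([0.0, 0.0])
mu2 = np.array([3.0, 3.0])
Sigma = np.array([[1.1, 0.3],
                  [0.3, 1.9]])
x = np.array([1.0, 2.2])

Sigma_inv = np.linalg.inv(Sigma)        # -> [[0.95, -0.15], [-0.15, 0.55]]

def mahalanobis_sq(x, mu, S_inv):
    d = x - mu
    return d @ S_inv @ d

d1 = mahalanobis_sq(x, mu1, Sigma_inv)  # 2.952
d2 = mahalanobis_sq(x, mu2, Sigma_inv)  # 3.672
print(Sigma_inv)
print(d1, d2)                           # smaller distance -> assign x to w1
```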
❖ Example (ML estimation of the mean of a Gaussian):
$p(x): N(\mu, \Sigma)$ with $\mu$ unknown; given $x_1, x_2, \ldots, x_N$, with $p(x_k) \equiv p(x_k; \mu)$ and
$$p(x_k; \mu) = \frac{1}{(2\pi)^{\ell/2}|\Sigma|^{1/2}}\exp\Big(-\frac{1}{2}(x_k - \mu)^T\Sigma^{-1}(x_k - \mu)\Big)$$
$$L(\mu) = \ln\prod_{k=1}^{N} p(x_k; \mu) = C - \frac{1}{2}\sum_{k=1}^{N}(x_k - \mu)^T\Sigma^{-1}(x_k - \mu)$$
Compute the maximum of $L(\mu)$:
$$\frac{\partial L(\mu)}{\partial \mu} \equiv \Big[\frac{\partial L}{\partial \mu_1}, \ldots, \frac{\partial L}{\partial \mu_\ell}\Big]^T = \sum_{k=1}^{N}\Sigma^{-1}(x_k - \mu) = 0 \;\Rightarrow\; \hat{\mu}_{ML} = \frac{1}{N}\sum_{k=1}^{N} x_k$$
Remember: if $A = A^T$, then $\dfrac{\partial(x^T A x)}{\partial x} = 2Ax$.

❖ Maximum Aposteriori Probability (MAP) Estimation
➢ In the ML method, $\theta$ was considered as a (deterministic) parameter.
➢ Here we shall look at $\theta$ as a random vector described by a pdf $p(\theta)$, assumed to be known.
➢ Given $X = \{x_1, x_2, \ldots, x_N\}$, compute the maximum of $p(\theta|X)$.
➢ From the Bayes theorem:
$$p(\theta)\,p(X|\theta) = p(X)\,p(\theta|X) \quad\text{or}\quad p(\theta|X) = \frac{p(\theta)\,p(X|\theta)}{p(X)}$$
➢ The method:
$$\hat{\theta}_{MAP}: \quad \frac{\partial}{\partial\theta}\big(p(\theta)\,p(X|\theta)\big) = 0$$
➢ If $p(\theta)$ is uniform or broad enough, $\hat{\theta}_{MAP} \approx \hat{\theta}_{ML}$.
❖ Example:
$p(x|\mu) = N(\mu, \sigma^2 I)$ with $\mu$ unknown, $X = \{x_1, \ldots, x_N\}$, and a Gaussian prior
$$p(\mu) = \frac{1}{(2\pi)^{\ell/2}\sigma_\mu^{\ell}}\exp\Big(-\frac{\|\mu - \mu_0\|^2}{2\sigma_\mu^2}\Big)$$
$$\text{MAP:}\quad \frac{\partial}{\partial\mu}\ln\Big(\prod_{k=1}^{N} p(x_k|\mu)\,p(\mu)\Big) = 0 \quad\text{or}\quad \sum_{k=1}^{N}\frac{1}{\sigma^2}(x_k - \hat{\mu}) - \frac{1}{\sigma_\mu^2}(\hat{\mu} - \mu_0) = 0$$
$$\hat{\mu}_{MAP} = \frac{\mu_0 + \dfrac{\sigma_\mu^2}{\sigma^2}\displaystyle\sum_{k=1}^{N} x_k}{1 + \dfrac{\sigma_\mu^2}{\sigma^2}N}$$
For $\sigma_\mu^2 \gg \sigma^2$, or for $N \rightarrow \infty$:
$$\hat{\mu}_{MAP} \approx \hat{\mu}_{ML} = \frac{1}{N}\sum_{k=1}^{N} x_k$$

❖ Bayesian Inference
➢ ML and MAP compute a single estimate for $\theta$. Here a different route is followed.
➢ Given: $X = \{x_1, \ldots, x_N\}$, $p(x|\theta)$ and $p(\theta)$.
➢ The goal: estimate $p(x|X)$.
➢ How??
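Returning to the MAP example above, a small numerical sketch compares $\hat{\mu}_{MAP}$ and $\hat{\mu}_{ML}$; the data are synthetic and the values of $\sigma$, $\sigma_\mu$ and $\mu_0$ are assumed for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

mu_true, sigma = 2.0, 1.0        # data model N(mu, sigma^2), mu unknown
mu0, sigma_mu = 0.0, 3.0         # assumed Gaussian prior on mu: N(mu0, sigma_mu^2)

for N in (5, 50, 5000):
    x = rng.normal(mu_true, sigma, size=N)
    mu_ml = x.mean()                                     # ML estimate
    ratio = sigma_mu**2 / sigma**2
    mu_map = (mu0 + ratio * x.sum()) / (1 + ratio * N)   # MAP formula from above
    print(f"N={N:5d}  ML={mu_ml:.4f}  MAP={mu_map:.4f}")
# As N grows (or for a broad prior, sigma_mu >> sigma), MAP approaches ML.
```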
❖ Maximum Entropy example: $p(x)$ is nonzero only in the interval $x_1 \le x \le x_2$, subject to the constraint
$$\int_{x_1}^{x_2} p(x)\,dx = 1$$
• Lagrange multipliers:
$$H_L = H + \lambda\Big(\int_{x_1}^{x_2} p(x)\,dx - 1\Big)$$
• Maximizing $H_L$ gives $\hat{p}(x) = \exp(\lambda - 1)$, hence
$$\hat{p}(x) = \begin{cases}\dfrac{1}{x_2 - x_1}, & x_1 \le x \le x_2\\[4pt] 0, & \text{otherwise}\end{cases}$$

❖ Mixture Models
$$p(x) = \sum_{j=1}^{J} p(x|j)\,P_j, \qquad \sum_{j=1}^{J} P_j = 1, \qquad \int_x p(x|j)\,dx = 1$$
➢ Assume parametric modeling, i.e., $p(x|j; \theta)$.
➢ The goal is to estimate $\theta$ and $P_1, P_2, \ldots, P_J$, given a set of samples $X = \{x_1, x_2, \ldots, x_N\}$.
➢ Why not ML, as before?
$$\max_{\theta, P_1, \ldots, P_J}\ \prod_{k=1}^{N} p(x_k; \theta, P_1, \ldots, P_J)$$
➢ This is a nonlinear problem, due to the missing label information. This is a typical problem with an incomplete data set.
• General formulation: let $y$ be the complete data set, $y \in Y \subseteq R^m$, with pdf $p_y(y; \theta)$.
• Let $Y(x) \subseteq Y$ be the set of all $y$'s that map to a specific $x$:
$$p_x(x; \theta) = \int_{Y(x)} p_y(y; \theta)\,dy$$
• $\hat{\theta}_{ML}: \displaystyle\sum_k \frac{\partial \ln p_y(y_k; \theta)}{\partial \theta} = 0$ (this would be the ML estimate if the complete data $y_k$ were available).
• E-step:
$$Q(\Theta; \Theta(t)) = E\big[\ln p_y(y; \Theta) \mid X; \Theta(t)\big]$$
• M-step:
$$\Theta(t+1): \quad \frac{\partial Q(\Theta; \Theta(t))}{\partial \Theta} = 0$$
➢ Application to the mixture modeling problem: the complete data are $(x_k, j_k)$, $k = 1, \ldots, N$, where $j_k$ is the (missing) mixture label of $x_k$:
$$Q(\Theta; \Theta(t)) = E\Big[\sum_{k=1}^{N}\ln\big(p(x_k|j_k; \theta)\,P_{j_k}\big)\Big] = \sum_{k=1}^{N}\sum_{j=1}^{J} P(j|x_k; \Theta(t))\,\ln\big(p(x_k|j; \theta)\,P_j\big)$$
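A compact sketch of the E- and M-steps for a one-dimensional Gaussian mixture; the synthetic data, the number of components $J$ and the initialization are assumed for the illustration, and the closed-form M-step updates are the standard ones for Gaussian components (not code from the slides).

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic data from two Gaussians (labels are then discarded -> incomplete data).
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.5, 700)])

J = 2
P = np.full(J, 1.0 / J)                     # mixing probabilities P_j
mu = rng.choice(x, J)                       # component means (random init)
var = np.full(J, x.var())                   # component variances

def gauss(x, m, v):
    return np.exp(-0.5 * (x - m) ** 2 / v) / np.sqrt(2 * np.pi * v)

for t in range(100):
    # E-step: posterior P(j | x_k; Theta(t)) for every sample and component.
    resp = np.array([P[j] * gauss(x, mu[j], var[j]) for j in range(J)])  # (J, N)
    resp /= resp.sum(axis=0, keepdims=True)
    # M-step: maximize Q -> closed-form updates for P_j, mu_j, var_j.
    Nj = resp.sum(axis=1)
    P = Nj / len(x)
    mu = (resp @ x) / Nj
    var = np.array([(resp[j] * (x - mu[j]) ** 2).sum() / Nj[j] for j in range(J)])

print("P  :", P)
print("mu :", mu)
print("var:", var)
```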
❖ Nonparametric Estimation
➢ $k_N$: the number of points inside an interval of length $h$ centred at $\hat{x}$; $N$: the total number of points. Then
$$\hat{p}(x) \approx \hat{p}(\hat{x}) = \frac{1}{h}\,\frac{k_N}{N}, \qquad |x - \hat{x}| \le \frac{h}{2}$$
➢ If $p(x)$ is continuous, $\hat{p}(x) \rightarrow p(x)$ as $N \rightarrow \infty$, provided that
$$h_N \rightarrow 0, \qquad k_N \rightarrow \infty, \qquad \frac{k_N}{N} \rightarrow 0$$
❖ Parzen Windows
➢ Divide the multidimensional space into hypercubes of side $h$.
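A short sketch of the one-dimensional estimate above, using a box (hypercube with $\ell = 1$) window; the data and the value of $h$ are assumed for the illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
samples = rng.normal(0.0, 1.0, 1000)      # assumed data ~ N(0, 1)

def parzen_estimate(x, data, h):
    """p_hat(x) = (1/(N*h)) * sum_i phi((x_i - x)/h) with a unit box kernel,
    i.e. (1/h) * (k_N / N): the fraction of points falling in the length-h
    interval centred at x."""
    u = (data - x) / h
    phi = (np.abs(u) <= 0.5).astype(float)   # hypercube (box) window, l = 1
    return phi.sum() / (len(data) * h)

for x in (-1.0, 0.0, 1.0):
    print(x, parzen_estimate(x, samples, h=0.3))
# True N(0, 1) values for comparison: 0.242, 0.399, 0.242.
```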
➢ If $h \rightarrow 0$, $N \rightarrow \infty$, $hN \rightarrow \infty$, the estimate is asymptotically unbiased.
➢ The method (classification with Parzen estimates):
• Remember:
$$\ell_{12} \equiv \frac{p(x|\omega_1)}{p(x|\omega_2)} \;\gtrless\; \frac{P(\omega_2)}{P(\omega_1)}\cdot\frac{\lambda_{21}-\lambda_{22}}{\lambda_{12}-\lambda_{11}}$$
• Estimate each likelihood from its own training samples (a code sketch follows at the end of this block):
$$p(x|\omega_1) \approx \frac{1}{N_1 h^{\ell}}\sum_{i=1}^{N_1}\varphi\Big(\frac{x_i - x}{h}\Big), \qquad p(x|\omega_2) \approx \frac{1}{N_2 h^{\ell}}\sum_{i=1}^{N_2}\varphi\Big(\frac{x_i - x}{h}\Big)$$

❖ CURSE OF DIMENSIONALITY
➢ In all the methods, so far, we saw that the higher the number of points, $N$, the better the resulting estimate.
➢ If in the one-dimensional space an interval filled with $N$ points is adequately filled (for good estimation), then in the two-dimensional space the corresponding square will require $N^2$ points, and in the $\ell$-dimensional space the $\ell$-dimensional cube will require $N^{\ell}$ points.
➢ The exponential increase in the number of necessary points is known as the curse of dimensionality. This is a major problem one is confronted with in high-dimensional spaces.
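Here is the sketch announced above: a Parzen-based two-class classifier that estimates each likelihood from its own training set, forms the ratio $\ell_{12}$ and compares it with the threshold. The training data, priors, loss values and the Gaussian kernel are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
# Assumed 1-D training data: N1 points from class w1, N2 points from class w2.
X1 = rng.normal(0.0, 1.0, 400)
X2 = rng.normal(2.5, 1.0, 600)
P1, P2 = 0.4, 0.6                                   # assumed priors
lam11, lam12, lam21, lam22 = 0.0, 1.0, 1.0, 0.0     # 0/1 loss -> minimum-error case

def parzen_pdf(x, data, h):
    """1/(N*h) * sum_i phi((x_i - x)/h) with a Gaussian kernel phi."""
    u = (data - x) / h
    return np.exp(-0.5 * u**2).sum() / (len(data) * h * np.sqrt(2 * np.pi))

def classify(x, h=0.4):
    l12 = parzen_pdf(x, X1, h) / parzen_pdf(x, X2, h)      # likelihood ratio
    threshold = (P2 * (lam21 - lam22)) / (P1 * (lam12 - lam11))
    return 1 if l12 > threshold else 2

for x in (0.0, 1.2, 2.5):
    print(x, "-> w", classify(x))
```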
❖ The k-Nearest Neighbor (kNN) rule
➢ Assign $x \rightarrow \omega_i$ if $k_i > k_j\ \ \forall j \neq i$, where $k_i$ is the number of points of class $\omega_i$ among the $k$ nearest neighbors of $x$.
➢ Error bounds ($P_B$: the Bayesian error):
$$P_B \le P_{kNN} \le P_B + \sqrt{\frac{2\,P_{NN}}{k}}, \qquad P_{NN} \le 2P_B, \qquad P_{3NN} \approx P_B + 3(P_B)^2$$
➢ As $k \rightarrow \infty$, $P_{kNN} \rightarrow P_B$.
❖ Voronoi tessellation
$$R_i = \{x : d(x, x_i) < d(x, x_j),\ \forall j \neq i\}$$
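A minimal sketch of the kNN rule above, on assumed two-dimensional synthetic training data:

```python
import numpy as np

rng = np.random.default_rng(4)
# Assumed 2-D training set: two classes around different means.
X = np.vstack([rng.normal([0, 0], 1.0, (100, 2)),
               rng.normal([3, 3], 1.0, (100, 2))])
y = np.array([1] * 100 + [2] * 100)

def knn_classify(x, k=5):
    """Assign x to w_i if k_i > k_j for all j != i, where k_i is the number
    of class-w_i points among the k nearest training points to x."""
    dist = np.linalg.norm(X - x, axis=1)          # Euclidean distances
    nearest = y[np.argsort(dist)[:k]]             # labels of the k nearest points
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]

print(knn_classify(np.array([0.5, 0.8])))   # expected: 1
print(knn_classify(np.array([2.8, 2.5])))   # expected: 2
```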
BAYESIAN NETWORKS
❖ Bayes Probability Chain Rule
$$p(x_1, x_2, \ldots, x_\ell) = p(x_\ell | x_{\ell-1}, \ldots, x_1)\, p(x_{\ell-1} | x_{\ell-2}, \ldots, x_1) \cdots p(x_2|x_1)\, p(x_1)$$
➢ Assume now that the conditional dependence for each $x_i$ is limited to a subset of the features appearing in each of the product terms. That is:
$$p(x_1, x_2, \ldots, x_\ell) = p(x_1)\prod_{i=2}^{\ell} p(x_i | A_i), \qquad \text{where } A_i \subseteq \{x_{i-1}, x_{i-2}, \ldots, x_1\}$$
➢ For example, if $\ell = 6$, then we could assume:
$$p(x_6 | x_5, \ldots, x_1) = p(x_6 | x_5, x_4)$$
Then: $A_6 = \{x_5, x_4\} \subset \{x_5, \ldots, x_1\}$
➢ The above is a generalization of the Naïve Bayes. For the Naïve Bayes the assumption is: $A_i = \varnothing$, for $i = 1, 2, \ldots, \ell$.
❖ Bayesian Networks
➢ A graphical way to portray conditional dependencies is given below.
➢ [Figure: a directed acyclic graph on the variables $x_1, \ldots, x_6$.] According to this figure we have that:
• $x_6$ is conditionally dependent on $x_4, x_5$
• $x_5$ on $x_4$
• $x_4$ on $x_1, x_2$
• $x_3$ on $x_2$
• $x_1, x_2$ are conditionally independent of the other variables
➢ For this case:
$$p(x_1, x_2, \ldots, x_6) = p(x_6|x_5, x_4)\,p(x_5|x_4)\,p(x_4|x_2, x_1)\,p(x_3|x_2)\,p(x_2)\,p(x_1)$$
➢ Definition: A Bayesian Network is a directed acyclic graph (DAG) where the nodes correspond to random variables. Each node is associated with a set of conditional probabilities (densities), $p(x_i|A_i)$, where $x_i$ is the variable associated with the node and $A_i$ is the set of its parents in the graph.
➢ A Bayesian Network is specified by:
• The marginal probabilities of its root nodes.
• The conditional probabilities of the non-root nodes, given their parents, for ALL possible combinations.
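As an illustration of how such a specification determines the joint distribution, the following sketch evaluates the factorization given above for binary variables; the probability tables are hypothetical values invented for the example, not those of the figure.

```python
from itertools import product

# Hypothetical conditional probability tables for binary variables x1..x6
# (values invented for illustration; only P(xi = 1 | parents) is stored).
p_x1 = 0.6                                     # P(x1 = 1)
p_x2 = 0.3                                     # P(x2 = 1)
p_x3 = {0: 0.2, 1: 0.7}                        # P(x3 = 1 | x2)
p_x4 = {(0, 0): 0.1, (0, 1): 0.5,              # P(x4 = 1 | x1, x2)
        (1, 0): 0.4, (1, 1): 0.9}
p_x5 = {0: 0.25, 1: 0.8}                       # P(x5 = 1 | x4)
p_x6 = {(0, 0): 0.05, (0, 1): 0.3,             # P(x6 = 1 | x4, x5)
        (1, 0): 0.6, (1, 1): 0.95}

def bern(p, v):
    """P(variable = v) for a binary variable with P(= 1) = p."""
    return p if v == 1 else 1.0 - p

def joint(x1, x2, x3, x4, x5, x6):
    """p(x1,...,x6) = p(x6|x5,x4) p(x5|x4) p(x4|x2,x1) p(x3|x2) p(x2) p(x1)."""
    return (bern(p_x6[(x4, x5)], x6) * bern(p_x5[x4], x5) *
            bern(p_x4[(x1, x2)], x4) * bern(p_x3[x2], x3) *
            bern(p_x2, x2) * bern(p_x1, x1))

print(joint(1, 0, 0, 1, 1, 1))
# Sanity check: the joint sums to 1 over all 2^6 configurations.
print(sum(joint(*cfg) for cfg in product((0, 1), repeat=6)))
```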
❖ Example: Consider the Bayesian network of the figure.
➢ For a), a set of calculations is required that propagates from node $x$ to node $w$. It turns out that $P(w_0|x_1) = 0.63$.