DHSCH 3 Part 3

This document summarizes key aspects of hidden Markov models from Chapter 3 of the book Pattern Classification. A hidden Markov model extends Markov chains to model systems with visible and hidden states. Three main problems are associated with hidden Markov models: 1) evaluation, computing the probability of an observed sequence; 2) decoding, finding the most likely hidden state sequence; and 3) learning, estimating the model parameters through iterative procedures such as Baum-Welch that maximize the probability of the observations.

Pattern Classification

All materials in these slides were taken from Pattern Classification (2nd ed.) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000, with the permission of the authors and the publisher.
Chapter 3 (Part 3):
Maximum-Likelihood and Bayesian Parameter Estimation (Section 3.10)

• Hidden Markov Model: Extension of Markov Chains

• Hidden Markov Model (HMM)

• Interaction of the visible states with the hidden states:
$\sum_k b_{jk} = 1$ for all $j$, where $b_{jk} = P(v_k(t) \mid \omega_j(t))$.
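As a concrete illustration, here is a minimal sketch of such a model in Python. The names pi, A, B and all numeric values are invented for illustration (they are not from the book); the assertions check the normalization just stated:

```python
import numpy as np

# Toy discrete HMM with 3 hidden states and 3 visible symbols
# (illustrative values only).
pi = np.array([0.5, 0.3, 0.2])       # P(omega(1) = omega_i)
A = np.array([[0.6, 0.3, 0.1],       # a_ij = P(omega_j at t+1 | omega_i at t)
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4]])
B = np.array([[0.7, 0.2, 0.1],       # b_jk = P(v_k at t | omega_j at t)
              [0.1, 0.6, 0.3],
              [0.2, 0.3, 0.5]])

# Normalization: sum_k b_jk = 1 for every hidden state j
# (and likewise for the rows of A and for pi).
assert np.allclose(B.sum(axis=1), 1.0)
assert np.allclose(A.sum(axis=1), 1.0) and np.isclose(pi.sum(), 1.0)
```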

• Three problems are associated with this model:


• The evaluation problem
• The decoding problem
• The learning problem

• The evaluation problem

Compute the probability that the model produces a given sequence $V^T$ of visible states:

$$P(V^T) = \sum_{r=1}^{r_{\max}} P(V^T \mid \omega_r^T)\, P(\omega_r^T)$$

where each $r$ indexes a particular sequence $\omega_r^T = \{\omega(1), \omega(2), \ldots, \omega(T)\}$ of $T$ hidden states, and

$$P(V^T \mid \omega_r^T) = \prod_{t=1}^{T} P(v(t) \mid \omega(t)) \qquad (1)$$

$$P(\omega_r^T) = \prod_{t=1}^{T} P(\omega(t) \mid \omega(t-1)) \qquad (2)$$
Using equations (1) and (2), we can write:

$$P(V^T) = \sum_{r=1}^{r_{\max}} \prod_{t=1}^{T} P(v(t) \mid \omega(t))\, P(\omega(t) \mid \omega(t-1))$$

Interpretation: The probability that we observe the particular sequence of T visible states V^T is equal to the sum, over all r_max possible sequences of hidden states, of the conditional probability that the system made each particular transition, multiplied by the probability that it then emitted the visible symbol in our target sequence.
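This sum can be transcribed directly into code. The sketch below is a brute-force illustration, not an efficient algorithm; `evaluate_brute_force` is a hypothetical helper that reuses the toy pi, A, B defined earlier and takes the t = 1 factor P(ω(1) | ω(0)) to be the initial state probability π:

```python
from itertools import product

def evaluate_brute_force(V, pi, A, B):
    """P(V^T) as the sum over all c**T hidden-state sequences of
    P(V^T | omega_r^T) * P(omega_r^T), per equations (1) and (2)."""
    c = len(pi)                                   # number of hidden states
    total = 0.0
    for seq in product(range(c), repeat=len(V)):  # one seq per index r
        p = pi[seq[0]] * B[seq[0], V[0]]          # prior times first emission
        for t in range(1, len(V)):
            p *= A[seq[t - 1], seq[t]] * B[seq[t], V[t]]  # transition, emission
        total += p
    return total
```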

Example: Let ω1, ω2, ω3 be the hidden states, v1, v2, v3 the visible states, and V^3 = {v1, v2, v3} the observed sequence of visible states. Then

P({v1, v2, v3}) = P(ω1)·P(v1 | ω1)·P(ω2 | ω1)·P(v2 | ω2)·P(ω3 | ω2)·P(v3 | ω3)
+ … + (possible terms in the sum = all possible (3^3 = 27) cases!)

First possibility: ω1 (t = 1) → ω2 (t = 2) → ω3 (t = 3), emitting v1, v2, v3.

Second possibility: ω2 (t = 1) → ω3 (t = 2) → ω1 (t = 3), emitting v1, v2, v3.

P({v1, v2, v3}) = P(ω2)·P(v1 | ω2)·P(ω3 | ω2)·P(v2 | ω3)·P(ω1 | ω3)·P(v3 | ω1) + … +

Therefore:

$$P(\{v_1, v_2, v_3\}) = \sum_{\substack{\text{possible sequences}\\ \text{of hidden states}}} \prod_{t=1}^{3} P(v(t) \mid \omega(t))\, P(\omega(t) \mid \omega(t-1))$$
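With the toy parameters sketched earlier, this 27-term sum can be checked directly (symbol indices are an illustrative encoding, not from the slides):

```python
# Encode the visible symbols v1, v2, v3 as indices 0, 1, 2.
V = [0, 1, 2]
# Sums all 3**3 = 27 products, one per hidden-state sequence.
print(evaluate_brute_force(V, pi, A, B))
```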


• The decoding problem (optimal state sequence)


Given a sequence of visible states V^T, the decoding problem is to find the most probable sequence of hidden states.

This problem can be expressed mathematically as: find the single "best" sequence of hidden states $\hat\omega(1), \hat\omega(2), \ldots, \hat\omega(T)$ such that:

$$\hat\omega(1), \hat\omega(2), \ldots, \hat\omega(T) = \arg\max_{\omega(1), \omega(2), \ldots, \omega(T)} P\big(\omega(1), \omega(2), \ldots, \omega(T),\, v(1), v(2), \ldots, v(T) \mid \theta\big)$$

Note that the summation has disappeared, since we now seek only the single best case!

Where: θ = [π, A, B]
π: π_i = P(ω(1) = ω_i)  (initial state probabilities)
A: a_ij = P(ω(t+1) = ω_j | ω(t) = ω_i)  (transition probabilities)
B: b_jk = P(v(t) = v_k | ω(t) = ω_j)  (emission probabilities)

In the preceding example, this computation corresponds to the selection of the best path amongst:

{ω1(t = 1), ω2(t = 2), ω3(t = 3)}, {ω2(t = 1), ω3(t = 2), ω1(t = 3)},
{ω3(t = 1), ω1(t = 2), ω2(t = 3)}, {ω3(t = 1), ω2(t = 2), ω1(t = 3)},
{ω2(t = 1), ω1(t = 2), ω3(t = 3)}
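This arg max can likewise be transcribed as an exhaustive search over hidden sequences. The sketch below is for illustration only (`decode_brute_force` is a hypothetical helper reusing the toy pi, A, B from earlier); in practice this maximization is carried out efficiently with dynamic programming rather than enumeration:

```python
from itertools import product

def decode_brute_force(V, pi, A, B):
    """Return the single hidden-state sequence maximizing the joint
    probability P(omega(1..T), v(1..T) | theta) from the arg max above."""
    c = len(pi)

    def joint(seq):
        p = pi[seq[0]] * B[seq[0], V[0]]
        for t in range(1, len(V)):
            p *= A[seq[t - 1], seq[t]] * B[seq[t], V[t]]
        return p

    return max(product(range(c), repeat=len(V)), key=joint)

# The best path for the example sequence, as a tuple of state indices.
print(decode_brute_force([0, 1, 2], pi, A, B))
```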

• The learning problem (parameter estimation)

This third problem consists of determining a method to adjust the model parameters θ = [π, A, B] to satisfy a certain optimization criterion. We need to find the best model

$$\hat\theta = [\hat\pi, \hat A, \hat B]$$

that maximizes the probability of the observation sequence:

$$\max_\theta P(V^T \mid \theta)$$

We use an iterative procedure such as Baum-Welch or a gradient-based method to find this local optimum.
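One Baum-Welch (EM) iteration for the discrete HMM above can be sketched as follows. This is a generic, unscaled textbook re-estimation step written in the notation of these slides, not the book's exact presentation; for long sequences the forward/backward variables would need rescaling to avoid underflow:

```python
def baum_welch_step(V, pi, A, B):
    """One Baum-Welch re-estimation step; returns updated (pi, A, B)
    and the current evidence P(V^T | theta)."""
    V = np.asarray(V)
    T, c = len(V), len(pi)

    # Forward pass: alpha[t, i] = P(v(1..t), omega_i at t | theta).
    alpha = np.zeros((T, c))
    alpha[0] = pi * B[:, V[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, V[t]]

    # Backward pass: beta[t, i] = P(v(t+1..T) | omega_i at t, theta).
    beta = np.ones((T, c))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, V[t + 1]] * beta[t + 1])

    evidence = alpha[-1].sum()                     # P(V^T | theta)
    gamma = alpha * beta / evidence                # single-state posteriors
    xi = (alpha[:-1, :, None] * A[None]
          * (B[:, V[1:]].T * beta[1:])[:, None, :]) / evidence  # pair posteriors

    # M step: re-estimate pi, A, B from the posteriors.
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.stack([gamma[V == k].sum(axis=0)
                      for k in range(B.shape[1])], axis=1)
    new_B /= gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B, evidence

# Iterate until the evidence stops improving (a local maximum);
# EM guarantees P(V^T | theta) is non-decreasing across iterations.
p, a, b = pi, A, B
for _ in range(50):
    p, a, b, evidence = baum_welch_step([0, 1, 2, 1, 0], p, a, b)
print(evidence)
```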
