HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY
Hidden Markov Model & Viterbi
Lecturer: Dr. Bùi Thanh Hùng
Data Science Department
Faculty of Information Technology
Industrial University of Ho Chi Minh City
Email: [email protected]
Website: https://sites.google.com/site/hungthanhbui1980/
Hidden Markov Model
Markov Properties of State Sequences
HMM Formalism
(Trellis diagram: a chain of hidden states S, each emitting an observation K; arcs A are state transitions, arcs B are emissions.)
An HMM is specified by a tuple $\{S, K, \Pi, A, B\}$:
• $S : \{s_1 \ldots s_N\}$ are the values for the hidden states
• $K : \{k_1 \ldots k_M\}$ are the values for the observations
• $\Pi = \{\pi_i\}$ are the initial state probabilities
• $A = \{a_{ij}\}$ are the state transition probabilities
• $B = \{b_{ik}\}$ are the observation (emission) probabilities
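As a concrete illustration, the tuple $\{S, K, \Pi, A, B\}$ can be stored directly in Python containers. A minimal sketch with a hypothetical two-state, two-symbol model (the names and numbers below are invented purely for illustration):

# Hypothetical two-state, two-symbol HMM, stored as plain Python containers.
S = ('s1', 's2')                       # hidden state values
K = ('k1', 'k2')                       # observation symbols
Pi = {'s1': 0.6, 's2': 0.4}            # initial state probabilities, pi_i
A = {'s1': {'s1': 0.7, 's2': 0.3},     # transition probabilities, a_ij
     's2': {'s1': 0.4, 's2': 0.6}}
B = {'s1': {'k1': 0.5, 'k2': 0.5},     # emission probabilities, b_ik
     's2': {'k1': 0.1, 'k2': 0.9}}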
Inference in an HMM
• Compute the probability of a given observation
sequence
• Given an observation sequence, compute the most
likely hidden state sequence
• Given an observation sequence and set of possible
models, which model most closely fits the data?
Decoding
Given an observation sequence $O = (o_1 \ldots o_T)$ and a model $\lambda = (A, B, \Pi)$, compute the probability of the observation sequence, $P(O \mid \lambda)$, where
• $\Pi = \{\pi_i\}$ are the initial state probabilities
• $A = \{a_{ij}\}$ are the state transition probabilities
• $B = \{b_{ik}\}$ are the observation (emission) probabilities
Decoding
$P(O \mid X, \lambda) = b_{x_1 o_1} b_{x_2 o_2} \cdots b_{x_T o_T}$
$P(X \mid \lambda) = \pi_{x_1} a_{x_1 x_2} a_{x_2 x_3} \cdots a_{x_{T-1} x_T}$
$P(O, X \mid \lambda) = P(O \mid X, \lambda)\, P(X \mid \lambda)$
$P(O \mid \lambda) = \sum_X P(O \mid X, \lambda)\, P(X \mid \lambda)$
Decoding
$P(O \mid \lambda) = \sum_{\{x_1 \ldots x_T\}} \pi_{x_1} b_{x_1 o_1} \prod_{t=1}^{T-1} a_{x_t x_{t+1}} b_{x_{t+1} o_{t+1}}$
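Evaluating this sum directly enumerates all $N^T$ state sequences. A brute-force sketch (reusing the toy S, Pi, A, B containers from the sketch above) shows exactly what the forward procedure below avoids:

from itertools import product

def brute_force_likelihood(obs):
    # P(O | lambda): sum P(O, X | lambda) over every possible state sequence X.
    total = 0.0
    for X in product(S, repeat=len(obs)):               # N**T candidate sequences
        p = Pi[X[0]] * B[X[0]][obs[0]]                   # pi_{x1} * b_{x1, o1}
        for t in range(1, len(obs)):
            p *= A[X[t - 1]][X[t]] * B[X[t]][obs[t]]     # a_{x_{t-1} x_t} * b_{x_t, o_t}
        total += p
    return total

print(brute_force_likelihood(('k1', 'k2', 'k2')))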
Forward Procedure
• Special structure gives us an efficient solution using dynamic programming.
• Intuition: the probability of the first t observations is the same for all possible t+1 length state sequences.
• Define: $\alpha_i(t) = P(o_1 \ldots o_t, x_t = i \mid \lambda)$
Forward Procedure
$\alpha_j(t+1)$
$= P(o_1 \ldots o_{t+1}, x_{t+1} = j)$
$= P(o_1 \ldots o_{t+1} \mid x_{t+1} = j)\, P(x_{t+1} = j)$
$= P(o_1 \ldots o_t \mid x_{t+1} = j)\, P(o_{t+1} \mid x_{t+1} = j)\, P(x_{t+1} = j)$
$= P(o_1 \ldots o_t, x_{t+1} = j)\, P(o_{t+1} \mid x_{t+1} = j)$
Forward Procedure
$= \sum_{i=1 \ldots N} P(o_1 \ldots o_t, x_t = i, x_{t+1} = j)\, P(o_{t+1} \mid x_{t+1} = j)$
$= \sum_{i=1 \ldots N} P(o_1 \ldots o_t, x_{t+1} = j \mid x_t = i)\, P(x_t = i)\, P(o_{t+1} \mid x_{t+1} = j)$
$= \sum_{i=1 \ldots N} P(o_1 \ldots o_t, x_t = i)\, P(x_{t+1} = j \mid x_t = i)\, P(o_{t+1} \mid x_{t+1} = j)$
$= \sum_{i=1 \ldots N} \alpha_i(t)\, a_{ij}\, b_{j o_{t+1}}$
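The recursion translates directly into code. A minimal sketch of the forward procedure, assuming the model is stored in dictionaries start_p, trans_p, emit_p keyed by state name (the same layout as the Viterbi example later in this deck); an illustrative sketch, not the lecture's own implementation:

def forward(obs, states, start_p, trans_p, emit_p):
    # alpha[t][j] corresponds to alpha_j(t+1) in the slides (0-based list index).
    alpha = [{j: start_p[j] * emit_p[j][obs[0]] for j in states}]        # alpha_j(1) = pi_j * b_{j, o1}
    for t in range(1, len(obs)):
        alpha.append({
            j: sum(alpha[t - 1][i] * trans_p[i][j] for i in states)      # sum_i alpha_i(t) * a_ij
               * emit_p[j][obs[t]]                                       # times b_{j, o_{t+1}}
            for j in states
        })
    return alpha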
Backward Procedure
$\beta_i(T+1) = 1$
$\beta_i(t) = P(o_t \ldots o_T \mid x_t = i)$   (the probability of the rest of the observations, given the state at time t)
$\beta_i(t) = \sum_{j=1 \ldots N} a_{ij}\, b_{i o_t}\, \beta_j(t+1)$
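A matching sketch of the backward recursion with the same dictionary layout; note that under the convention above, $\beta_i(t)$ already includes the emission of $o_t$:

def backward(obs, states, trans_p, emit_p):
    # beta[t][i] corresponds to beta_i(t+1) in the slides (0-based list index).
    T = len(obs)
    beta = [None] * T
    nxt = {j: 1.0 for j in states}                                        # beta_j(T+1) = 1
    for t in range(T - 1, -1, -1):
        beta[t] = {
            i: emit_p[i][obs[t]]                                          # b_{i, o_t}
               * sum(trans_p[i][j] * nxt[j] for j in states)              # sum_j a_ij * beta_j(t+1)
            for i in states
        }
        nxt = beta[t]
    return beta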
Decoding Solution
$P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_i(T)$   (Forward Procedure)
$P(O \mid \lambda) = \sum_{i=1}^{N} \pi_i\, \beta_i(1)$   (Backward Procedure)
$P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_i(t)\, \beta_i(t)$   (Combination)
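A usage example tying the two procedures together, assuming the forward and backward sketches above are in scope and using the Healthy/Fever model that appears later in this deck; both expressions should give the same likelihood:

obs = ('normal', 'cold', 'dizzy')
states = ('Healthy', 'Fever')
start_p = {'Healthy': 0.6, 'Fever': 0.4}
trans_p = {'Healthy': {'Healthy': 0.7, 'Fever': 0.3},
           'Fever':   {'Healthy': 0.4, 'Fever': 0.6}}
emit_p = {'Healthy': {'normal': 0.5, 'cold': 0.4, 'dizzy': 0.1},
          'Fever':   {'normal': 0.1, 'cold': 0.3, 'dizzy': 0.6}}

alpha = forward(obs, states, start_p, trans_p, emit_p)
beta = backward(obs, states, trans_p, emit_p)
p_fwd = sum(alpha[-1].values())                           # sum_i alpha_i(T)
p_bwd = sum(start_p[i] * beta[0][i] for i in states)      # sum_i pi_i * beta_i(1)
print(p_fwd, p_bwd)                                       # both approximately 0.03628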
Best State Sequence
• Find the state sequence that best explains the observations
• Viterbi algorithm: $\arg\max_X P(X \mid O)$
Viterbi
The Viterbi algorithm is named after Andrew Viterbi,
who proposed it in 1967 as a decoding algorithm
for convolutional codes over noisy digital
communication links. It has, however, a history
of multiple invention, with at least seven
independent discoveries, including those by
Viterbi, Needleman and Wunsch, and Wagner and
Fischer.
Viterbi
The Viterbi algorithm is a dynamic
programming algorithm for finding the
most likely sequence of hidden states—called
the Viterbi path—that results in a sequence of
observed events, especially in the context
of Markov information sources and hidden Markov
models (HMM).
Viterbi
The algorithm has found universal application in decoding
the convolutional codes used in both CDMA and GSM digital
cellular, dial-up modems, satellite, deep-space communications,
and 802.11 wireless LANs. It is now also commonly used in speech
recognition, speech synthesis, diarization, keyword
spotting, computational linguistics, and bioinformatics.
For example, in speech-to-text (speech recognition), the acoustic
signal is treated as the observed sequence of events, and a string of
text is considered to be the "hidden cause" of the acoustic signal. The
Viterbi algorithm finds the most likely string of text given the acoustic
signal.
Viterbi Algorithm
$\delta_j(t) = \max_{x_1 \ldots x_{t-1}} P(x_1 \ldots x_{t-1}, o_1 \ldots o_{t-1}, x_t = j, o_t)$
The state sequence which maximizes the probability of seeing the observations to time t−1, landing in state j, and seeing the observation at time t.
Viterbi Algorithm
Recursive computation:
$\delta_j(t+1) = \max_i \delta_i(t)\, a_{ij}\, b_{j o_{t+1}}$
$\psi_j(t+1) = \arg\max_i \delta_i(t)\, a_{ij}\, b_{j o_{t+1}}$
Viterbi Algorithm
Compute the most likely state sequence by working backwards:
$\hat{X}_T = \arg\max_i \delta_i(T)$
$\hat{X}_t = \psi_{\hat{X}_{t+1}}(t+1)$
$P(\hat{X}) = \max_i \delta_i(T)$
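A sketch of a single step of this recursion, using the dictionary layout from the example later in the deck; delta maps each state to $\delta_i(t)$, and the function returns $\delta_j(t+1)$ together with the backpointer $\psi_j(t+1)$ (illustrative only):

def viterbi_step(delta, j, obs_next, states, trans_p, emit_p):
    # delta_j(t+1) = max_i delta_i(t) * a_ij * b_{j, o_{t+1}}; psi_j(t+1) is the maximizing i.
    best_i = max(states, key=lambda i: delta[i] * trans_p[i][j])
    best_delta = delta[best_i] * trans_p[best_i][j] * emit_p[j][obs_next]
    return best_delta, best_i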
Parameter Estimation
• Given an observation sequence, find the model that is most likely to produce that sequence.
• No analytic method.
• Given a model and observation sequence, update the model parameters to better fit the observations.
Parameter Estimation
Probability of traversing an arc from state i to state j at time t:
$p_t(i, j) = \dfrac{\alpha_i(t)\, a_{ij}\, b_{j o_{t+1}}\, \beta_j(t+1)}{\sum_{m=1 \ldots N} \alpha_m(t)\, \beta_m(t)}$
Probability of being in state i at time t:
$\gamma_i(t) = \sum_{j=1 \ldots N} p_t(i, j)$
Parameter Estimation
Now we can compute the new estimates of the model parameters:
$\hat{\pi}_i = \gamma_i(1)$
$\hat{a}_{ij} = \dfrac{\sum_{t=1}^{T} p_t(i, j)}{\sum_{t=1}^{T} \gamma_i(t)}$
$\hat{b}_{ik} = \dfrac{\sum_{t : o_t = k} \gamma_i(t)}{\sum_{t=1}^{T} \gamma_i(t)}$
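A minimal sketch transcribing these updates into Python, assuming the arc probabilities p[t][i][j] and state probabilities gamma[t][i] have already been computed for the observation sequence obs (dictionary layout as elsewhere in this deck); here the transition sums run over t = 1 … T−1, the steps for which an arc quantity is defined. This is one re-estimation step only, not a full Baum-Welch implementation:

def reestimate(p, gamma, obs, states, symbols):
    # Returns (pi_hat, a_hat, b_hat) computed from p_t(i, j) and gamma_i(t).
    T = len(obs)
    pi_hat = {i: gamma[0][i] for i in states}                          # pi_hat_i = gamma_i(1)
    a_hat = {i: {j: sum(p[t][i][j] for t in range(T - 1)) /
                    sum(gamma[t][i] for t in range(T - 1))
                 for j in states} for i in states}
    b_hat = {i: {k: sum(gamma[t][i] for t in range(T) if obs[t] == k) /
                    sum(gamma[t][i] for t in range(T))
                 for k in symbols} for i in states}
    return pi_hat, a_hat, b_hat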
Viterbi
obs = ('normal', 'cold', 'dizzy')                      # observation sequence
states = ('Healthy', 'Fever')                          # hidden states
start_p = {'Healthy': 0.6, 'Fever': 0.4}               # initial state probabilities
trans_p = {                                            # state transition probabilities
    'Healthy': {'Healthy': 0.7, 'Fever': 0.3},
    'Fever':   {'Healthy': 0.4, 'Fever': 0.6}
}
emit_p = {                                             # emission (observation) probabilities
    'Healthy': {'normal': 0.5, 'cold': 0.4, 'dizzy': 0.1},
    'Fever':   {'normal': 0.1, 'cold': 0.3, 'dizzy': 0.6}
}
Viterbi
Observations:
day 1: normal
day 2: cold
day 3: dizzy
Which sequence of states has the highest probability? Describe in detail.
(Trellis diagram: from Start, the hidden state on each day is either Healthy or Fever; day 1 observes "normal", day 2 "cold", day 3 "dizzy".)
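Worked out from the parameters above, the Viterbi recursion gives:
Day 1 (normal): $\delta_{\text{Healthy}}(1) = 0.6 \times 0.5 = 0.3$; $\delta_{\text{Fever}}(1) = 0.4 \times 0.1 = 0.04$
Day 2 (cold): $\delta_{\text{Healthy}}(2) = \max(0.3 \times 0.7,\ 0.04 \times 0.4) \times 0.4 = 0.084$ (best predecessor: Healthy); $\delta_{\text{Fever}}(2) = \max(0.3 \times 0.3,\ 0.04 \times 0.6) \times 0.3 = 0.027$ (best predecessor: Healthy)
Day 3 (dizzy): $\delta_{\text{Healthy}}(3) = \max(0.084 \times 0.7,\ 0.027 \times 0.4) \times 0.1 = 0.00588$ (best predecessor: Healthy); $\delta_{\text{Fever}}(3) = \max(0.084 \times 0.3,\ 0.027 \times 0.6) \times 0.6 = 0.01512$ (best predecessor: Healthy)
The largest final value is $\delta_{\text{Fever}}(3) = 0.01512$, so backtracking gives the most likely state sequence Healthy → Healthy → Fever.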
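Putting it together, a minimal sketch of a Viterbi decoder for this model, written against the dictionary layout above (an illustrative sketch, not the lecture's reference implementation):

def viterbi(obs, states, start_p, trans_p, emit_p):
    # Returns (probability, path) of the most likely hidden state sequence.
    # delta[t][s]: highest probability of any state sequence ending in s at time t
    # psi[t][s]:   predecessor state on that best sequence (backpointer)
    delta = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    psi = [{}]
    for t in range(1, len(obs)):
        delta.append({})
        psi.append({})
        for s in states:
            best_prev = max(states, key=lambda r: delta[t - 1][r] * trans_p[r][s])
            delta[t][s] = delta[t - 1][best_prev] * trans_p[best_prev][s] * emit_p[s][obs[t]]
            psi[t][s] = best_prev
    last = max(states, key=lambda s: delta[-1][s])        # termination: argmax_i delta_i(T)
    path = [last]
    for t in range(len(obs) - 1, 0, -1):                  # backtracking through psi
        path.insert(0, psi[t][path[0]])
    return delta[-1][last], path

obs = ('normal', 'cold', 'dizzy')
states = ('Healthy', 'Fever')
start_p = {'Healthy': 0.6, 'Fever': 0.4}
trans_p = {'Healthy': {'Healthy': 0.7, 'Fever': 0.3},
           'Fever':   {'Healthy': 0.4, 'Fever': 0.6}}
emit_p = {'Healthy': {'normal': 0.5, 'cold': 0.4, 'dizzy': 0.1},
          'Fever':   {'normal': 0.1, 'cold': 0.3, 'dizzy': 0.6}}

print(viterbi(obs, states, start_p, trans_p, emit_p))
# Expected: probability approximately 0.01512 with path ['Healthy', 'Healthy', 'Fever']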
Hidden Markov Model Toolkit (HTK)
http://htk.eng.cam.ac.uk/