Hidden Markov Model (HMM)

Hidden Markov models (HMMs) are graphical models used to model sequentially ordered data. They consist of hidden states that transition between each other according to probabilistic transition rules, and observed states that are dependent on their corresponding hidden state. HMMs are defined by their hidden and observed states, initial state probabilities, transition probabilities between hidden states, and observation probabilities. They can be used for inference tasks like computing the probability of an observation sequence or finding the most likely hidden state sequence that produced an observed sequence. The forward-backward and Viterbi algorithms provide efficient solutions for these inference problems using dynamic programming.


Hidden Markov Models

David Meir Blei


November 1, 1999
What is an HMM?

• Graphical Model
• Circles indicate states
• Arrows indicate probabilistic dependencies between states
What is an HMM?

• Green circles are hidden states


• Dependent only on the previous state
• “The past is independent of the future given the present.”
What is an HMM?

• Purple nodes are observed states


• Dependent only on their corresponding hidden state
HMM Formalism
(Diagram: a chain of hidden states S, each emitting an observation K)

• {S, K, Π, A, B}
• S : {s_1 … s_N} are the values for the hidden states
• K : {k_1 … k_M} are the values for the observations
HMM Formalism
(Diagram: hidden states S connected by transition probabilities A; each state emits an observation K with emission probabilities B)

• {S, K, Π, A, B}
• Π = {π_i} are the initial state probabilities
• A = {a_ij} are the state transition probabilities
• B = {b_ik} are the observation (emission) probabilities
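As a concrete illustration (not from the slides), a minimal Python/NumPy sketch of the five-tuple {S, K, Π, A, B}, using a made-up two-state example:

```python
import numpy as np

# Hypothetical two-state example: S = {Rainy, Sunny}, K = {walk, shop, clean}.
# These names are purely illustrative and do not come from the slides.
states = ["Rainy", "Sunny"]               # S: values for the hidden states (N = 2)
observations = ["walk", "shop", "clean"]  # K: values for the observations (M = 3)

pi = np.array([0.6, 0.4])             # Π: initial state probabilities, pi[i] = P(x_1 = i)
A = np.array([[0.7, 0.3],             # A: transitions, A[i, j] = P(x_{t+1} = j | x_t = i)
              [0.4, 0.6]])
B = np.array([[0.1, 0.4, 0.5],        # B: emissions, B[i, k] = P(o_t = k | x_t = i)
              [0.6, 0.3, 0.1]])

# Each row of A and B is a probability distribution and sums to 1.
assert np.allclose(A.sum(axis=1), 1.0) and np.allclose(B.sum(axis=1), 1.0)
```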
Inference in an HMM

• Compute the probability of a given observation sequence
• Given an observation sequence, compute the most likely hidden state sequence
• Given an observation sequence and a set of possible models, which model most closely fits the data?
Decoding

o1 ot-1 ot ot+1 oT

Given an observation sequence and a model, compute the probability of the observation sequence.

O = (o_1 … o_T), µ = (A, B, Π)
Compute P(O | µ)
Decoding
x1 xt-1 xt xt+1 xT

o1 ot-1 ot ot+1 oT

P(O | X, µ) = b_{x_1 o_1} b_{x_2 o_2} … b_{x_T o_T}

P(X | µ) = π_{x_1} a_{x_1 x_2} a_{x_2 x_3} … a_{x_{T−1} x_T}

P(O, X | µ) = P(O | X, µ) P(X | µ)

P(O | µ) = ∑_X P(O | X, µ) P(X | µ)
Decoding
x1 xt-1 xt xt+1 xT

o1 ot-1 ot ot+1 oT

P(O | µ) = ∑_{x_1 … x_T} π_{x_1} b_{x_1 o_1} ∏_{t=1}^{T−1} a_{x_t x_{t+1}} b_{x_{t+1} o_{t+1}}
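This sum ranges over all N^T state sequences, so evaluating it directly costs exponential time. A minimal brute-force sketch (assuming NumPy arrays pi, A, B as in the earlier example, with observations given as integer indices into K):

```python
import itertools

def brute_force_likelihood(pi, A, B, obs):
    """P(O | mu) by summing P(O | X, mu) P(X | mu) over every hidden sequence X."""
    N, T = A.shape[0], len(obs)
    total = 0.0
    for X in itertools.product(range(N), repeat=T):       # all N**T state sequences
        p = pi[X[0]] * B[X[0], obs[0]]                     # pi_{x1} * b_{x1 o1}
        for t in range(T - 1):
            p *= A[X[t], X[t + 1]] * B[X[t + 1], obs[t + 1]]  # a_{xt xt+1} * b_{xt+1 ot+1}
        total += p
    return total

# e.g. brute_force_likelihood(pi, A, B, [0, 2, 1]) with the arrays defined earlier
```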
Forward Procedure
x1 xt-1 xt xt+1 xT

o1 ot-1 ot ot+1 oT

• Special structure gives us an efficient solution using dynamic programming.
• Intuition: the probability of the first t observations is the same for all length-(t+1) state sequences that share the same first t states, so it can be computed once and reused.
• Define: α_i(t) = P(o_1 … o_t, x_t = i | µ)
Forward Procedure
x1 xt-1 xt xt+1 xT

o1 ot-1 ot ot+1 oT

α_j(t+1)
= P(o_1 … o_{t+1}, x_{t+1} = j)
= P(o_1 … o_{t+1} | x_{t+1} = j) P(x_{t+1} = j)
= P(o_1 … o_t | x_{t+1} = j) P(o_{t+1} | x_{t+1} = j) P(x_{t+1} = j)
= P(o_1 … o_t, x_{t+1} = j) P(o_{t+1} | x_{t+1} = j)
Forward Procedure
x1 xt-1 xt xt+1 xT

o1 ot-1 ot ot+1 oT

= ∑_{i=1…N} P(o_1 … o_t, x_t = i, x_{t+1} = j) P(o_{t+1} | x_{t+1} = j)
= ∑_{i=1…N} P(o_1 … o_t, x_{t+1} = j | x_t = i) P(x_t = i) P(o_{t+1} | x_{t+1} = j)
= ∑_{i=1…N} P(o_1 … o_t, x_t = i) P(x_{t+1} = j | x_t = i) P(o_{t+1} | x_{t+1} = j)
= ∑_{i=1…N} α_i(t) a_ij b_{j o_{t+1}}
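A minimal sketch of the resulting forward recursion (same assumed array conventions as the earlier example; row t of alpha holds α_·(t+1) because of zero-based indexing):

```python
import numpy as np

def forward(pi, A, B, obs):
    """Forward procedure: alpha[t, i] corresponds to alpha_i(t+1) in the slides (0-based t)."""
    N, T = A.shape[0], len(obs)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                      # alpha_i(1) = pi_i * b_{i o_1}
    for t in range(T - 1):
        # alpha_j(t+1) = sum_i alpha_i(t) * a_ij * b_{j o_{t+1}}
        alpha[t + 1] = (alpha[t] @ A) * B[:, obs[t + 1]]
    return alpha                                      # P(O | mu) = alpha[-1].sum()
```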
Backward Procedure
x1 xt-1 xt xt+1 xT

o1 ot-1 ot ot+1 oT

β_i(T) = 1
β_i(t) = P(o_{t+1} … o_T | x_t = i)        Probability of the rest of the observations, given the current state
β_i(t) = ∑_{j=1…N} a_ij b_{j o_{t+1}} β_j(t+1)
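A matching sketch of the backward recursion (same assumed conventions; beta[t] holds β_·(t+1) in the slides' one-based notation, and the last row is all ones):

```python
import numpy as np

def backward(A, B, obs):
    """Backward procedure: beta[t, i] corresponds to beta_i(t+1) in the slides (0-based t)."""
    N, T = A.shape[0], len(obs)
    beta = np.ones((T, N))                            # beta_i(T) = 1
    for t in range(T - 2, -1, -1):
        # beta_i(t) = sum_j a_ij * b_{j o_{t+1}} * beta_j(t+1)
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta
```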
Decoding Solution
x1 xt-1 xt xt+1 xT

o1 ot-1 ot ot+1 oT

P(O | µ) = ∑_{i=1}^{N} α_i(T)                    Forward procedure
P(O | µ) = ∑_{i=1}^{N} π_i b_{i o_1} β_i(1)      Backward procedure
P(O | µ) = ∑_{i=1}^{N} α_i(t) β_i(t)             Combination (for any t)
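Assuming the forward and backward sketches above are in scope, along with the toy pi, A, B arrays from the first example, the three expressions can be checked against each other on a small observation sequence:

```python
obs = [0, 2, 1]                                       # arbitrary observation indices into K
alpha = forward(pi, A, B, obs)
beta = backward(A, B, obs)

p_forward = alpha[-1].sum()                           # sum_i alpha_i(T)
p_backward = (pi * B[:, obs[0]] * beta[0]).sum()      # sum_i pi_i * b_{i o_1} * beta_i(1)
p_combined = (alpha[1] * beta[1]).sum()               # sum_i alpha_i(t) * beta_i(t), here t = 2

assert abs(p_forward - p_backward) < 1e-12 and abs(p_forward - p_combined) < 1e-12
```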
Best State Sequence

o1 ot-1 ot ot+1 oT

• Find the state sequence that best explains the observations

• Viterbi algorithm

• argmax_X P(X | O)
Viterbi Algorithm
x1 xt-1 j

o1 ot-1 ot ot+1 oT

δ_j(t) = max_{x_1 … x_{t−1}} P(x_1 … x_{t−1}, o_1 … o_{t−1}, x_t = j, o_t)

The probability of the best state sequence that accounts for the observations up to time t−1, lands in state j, and emits the observation at time t.
Viterbi Algorithm
x1 xt-1 xt xt+1

o1 ot-1 ot ot+1 oT

δ_j(t) = max_{x_1 … x_{t−1}} P(x_1 … x_{t−1}, o_1 … o_{t−1}, x_t = j, o_t)

δ_j(t+1) = max_i δ_i(t) a_ij b_{j o_{t+1}}        Recursive
ψ_j(t+1) = argmax_i δ_i(t) a_ij b_{j o_{t+1}}     computation
Viterbi Algorithm
x1 xt-1 xt xt+1 xT

o1 ot-1 ot ot+1 oT

X̂_T = argmax_i δ_i(T)
X̂_t = ψ_{X̂_{t+1}}(t+1)
P(X̂) = max_i δ_i(T)

Compute the most likely state sequence by working backwards.
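A minimal sketch of the full Viterbi recursion with backpointers (same assumed conventions; probabilities are kept unnormalized here, and log space would normally be used to avoid underflow):

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most likely state sequence argmax_X P(X | O) and the probability of that path."""
    N, T = A.shape[0], len(obs)
    delta = np.zeros((T, N))                          # delta[t, j]: best path probability ending in j
    psi = np.zeros((T, N), dtype=int)                 # psi[t, j]: best predecessor of state j at step t
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A            # scores[i, j] = delta_i(t) * a_ij
        psi[t] = scores.argmax(axis=0)                # the b term is constant in i, so it drops out
        delta[t] = scores.max(axis=0) * B[:, obs[t]]  # delta_j(t+1) = max_i (...) * b_{j o_{t+1}}
    path = [int(delta[-1].argmax())]                  # X_hat_T = argmax_i delta_i(T)
    for t in range(T - 1, 0, -1):                     # work backwards through the backpointers
        path.append(int(psi[t, path[-1]]))
    return path[::-1], float(delta[-1].max())         # P(X_hat) = max_i delta_i(T)
```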
Parameter Estimation
(Trellis diagram: transition probabilities A between hidden states, emission probabilities B to the observations o1 … oT)

• Given an observation sequence, find the model that is most likely to produce that sequence.
• No analytic method
• Given a model and observation sequence, update the model parameters to better fit the observations.
Parameter Estimation
(Trellis diagram: transition probabilities A between hidden states, emission probabilities B to the observations o1 … oT)

p_t(i, j) = α_i(t) a_ij b_{j o_{t+1}} β_j(t+1) / ∑_{m=1…N} α_m(t) β_m(t)        Probability of traversing an arc

γ_i(t) = ∑_{j=1…N} p_t(i, j)        Probability of being in state i
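A sketch of these two quantities, assuming alpha and beta come from the forward and backward sketches above (xi[t, i, j] plays the role of p_t(i, j) and gamma[t, i] of γ_i(t)):

```python
import numpy as np

def expected_counts(alpha, beta, A, B, obs):
    """Expected transition and occupancy counts for one observation sequence."""
    likelihood = alpha[-1].sum()                      # P(O | mu), the shared denominator
    # xi[t, i, j] = alpha_i(t) * a_ij * b_{j o_{t+1}} * beta_j(t+1) / P(O | mu)
    xi = (alpha[:-1, :, None] * A[None, :, :]
          * B[:, obs[1:]].T[:, None, :] * beta[1:, None, :]) / likelihood
    # gamma_i(t) = sum_j p_t(i, j) = alpha_i(t) * beta_i(t) / P(O | mu)
    gamma = alpha * beta / likelihood
    return xi, gamma
```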
Parameter Estimation
(Trellis diagram: transition probabilities A between hidden states, emission probabilities B to the observations o1 … oT)

π̂_i = γ_i(1)

â_ij = ∑_{t=1}^{T−1} p_t(i, j) / ∑_{t=1}^{T−1} γ_i(t)

b̂_ik = ∑_{t : o_t = k} γ_i(t) / ∑_{t=1}^{T} γ_i(t)

Now we can compute the new estimates of the model parameters.
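And a sketch of the re-estimation step itself, using the xi and gamma arrays from the previous sketch (M is the number of observation symbols; this is illustrative, not code from the slides):

```python
import numpy as np

def reestimate(xi, gamma, obs, M):
    """One re-estimation step: new pi, A, B from the expected counts xi and gamma."""
    obs = np.asarray(obs)
    pi_hat = gamma[0]                                 # pi_hat_i = gamma_i(1)
    # a_hat_ij = sum_{t=1..T-1} p_t(i, j) / sum_{t=1..T-1} gamma_i(t)
    A_hat = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    # b_hat_ik = sum_{t: o_t = k} gamma_i(t) / sum_{t=1..T} gamma_i(t)
    B_hat = np.zeros((gamma.shape[1], M))
    for k in range(M):
        B_hat[:, k] = gamma[obs == k].sum(axis=0)
    B_hat /= gamma.sum(axis=0)[:, None]
    return pi_hat, A_hat, B_hat
```

Iterating forward/backward, expected_counts, and reestimate until the likelihood stops improving is one pass of the usual EM-style training loop for HMMs.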
HMM Applications

• Generating parameters for n-gram models


• Part-of-speech tagging
• Speech recognition
The Most Important Thing
(Trellis diagram: transition probabilities A between hidden states, emission probabilities B to the observations o1 … oT)

We can use the special structure of this model to do a lot of neat math and solve problems that would otherwise be computationally intractable.
