Hidden Markov Models
Adapted from Dr Catherine Sweeney-Reed's slides
Summary
Introduction
Description
Central problems in HMM modelling
Extensions
Demonstration
Specification of an HMM
$N$ - number of states
$Q = \{q_1, q_2, \ldots, q_T\}$ - set of states
$M$ - number of symbols (observables)
$O = \{o_1, o_2, \ldots, o_T\}$ - set of symbols
Specification of an HMM
$A$ - the state transition probability matrix
$$a_{ij} = P(q_{t+1} = j \mid q_t = i)$$
$B$ - observation probability distribution
$$b_j(k) = P(o_t = k \mid q_t = j), \quad 1 \le k \le M$$
$\pi$ - the initial state distribution
Specification of an HMM
Full HMM is thus specified as a triplet:
$$\lambda = (A, B, \pi)$$
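As a concrete illustration (not from the original slides), a minimal NumPy sketch of such a triplet; the two-state, three-symbol values are invented for the example:

```python
# A minimal sketch of the triplet lambda = (A, B, pi) in NumPy.
# The 2-state, 3-symbol numbers below are illustrative only.
import numpy as np

A = np.array([[0.7, 0.3],        # a_ij = P(q_{t+1} = j | q_t = i)
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],   # b_j(k) = P(o_t = k | q_t = j)
              [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])        # initial state distribution

N, M = A.shape[0], B.shape[1]    # number of states, number of symbols

# Each row of A and B is a probability distribution, so rows sum to 1.
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
assert np.isclose(pi.sum(), 1.0)
```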
Central problems in HMM modelling
Problem 1
Evaluation: probability of occurrence of a particular observation sequence, $O = \{o_1, \ldots, o_k\}$, given the model $\lambda$:
$$P(O \mid \lambda)$$
Complicated - the states are hidden
Useful in sequence classification
Central problems in HMM modelling
Problem 2
Decoding: optimal state sequence to produce the given observations, $O = \{o_1, \ldots, o_k\}$, given the model $\lambda$
Optimality criterion
Useful in recognition problems
Central problems in HMM modelling
Problem 3
Learning: determine the optimum model, given a training set of observations
Find $\lambda$ such that $P(O \mid \lambda)$ is maximal
Problem 1: Naïve solution
State sequence $Q = (q_1, \ldots, q_T)$
Assume independent observations:
$$P(O \mid q, \lambda) = \prod_{t=1}^{T} P(o_t \mid q_t, \lambda) = b_{q_1}(o_1)\, b_{q_2}(o_2) \cdots b_{q_T}(o_T)$$
NB Observations are mutually independent, given the
hidden states. (Joint distribution of independent
variables factorises into marginal distributions of the
independent variables.)
Problem 1: Naïve solution
Observe that:
$$P(q \mid \lambda) = \pi_{q_1}\, a_{q_1 q_2}\, a_{q_2 q_3} \cdots a_{q_{T-1} q_T}$$
And that:
$$P(O \mid \lambda) = \sum_{q} P(O \mid q, \lambda)\, P(q \mid \lambda)$$
Problem 1: Naïve solution
Finally get:
$$P(O \mid \lambda) = \sum_{q} P(O \mid q, \lambda)\, P(q \mid \lambda)$$
NB:
- The above sum is over all state paths
- There are $N^T$ state paths, each costing $O(T)$ calculations, leading to $O(T N^T)$ time complexity.
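For illustration only, a short NumPy sketch of this naïve evaluation, enumerating all $N^T$ state paths; the function name and interface are assumptions for the example:

```python
# Naive evaluation of P(O|lambda): sum over all N**T state paths.
# Exponential in T - shown only to mirror the formula above.
import itertools
import numpy as np

def naive_evaluate(O, A, B, pi):
    N, T = A.shape[0], len(O)
    total = 0.0
    for q in itertools.product(range(N), repeat=T):    # every state path q
        p = pi[q[0]] * B[q[0], O[0]]                    # pi_{q_1} b_{q_1}(o_1)
        for t in range(1, T):
            p *= A[q[t - 1], q[t]] * B[q[t], O[t]]      # a_{q_{t-1} q_t} b_{q_t}(o_t)
        total += p                                      # P(O|q,lambda) P(q|lambda)
    return total
```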
Problem 1: Efficient solution
Forward algorithm:
Define auxiliary forward variable $\alpha$:
$$\alpha_t(i) = P(o_1, \ldots, o_t, q_t = i \mid \lambda)$$
$\alpha_t(i)$ is the probability of observing a partial sequence of observables $o_1, \ldots, o_t$ such that at time $t$, state $q_t = i$
Problem 1: Efficient solution
Recursive algorithm:
Initialise:
$$\alpha_1(i) = \pi_i\, b_i(o_1)$$
Calculate:
$$\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i)\, a_{ij} \right] b_j(o_{t+1})$$
(Partial obs seq to $t$ AND state $i$ at $t$) x (transition to $j$ at $t+1$) x (sensor); the sum arises because $j$ can be reached from any preceding state, and $\alpha$ incorporates the partial obs seq up to $t$
Obtain:
$$P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$$
Sum over the different ways of getting the obs seq
Complexity is $O(N^2 T)$
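A minimal sketch of this forward recursion (observations are assumed to be integer symbol indices; the interface is illustrative):

```python
# Forward algorithm: alpha has shape (T, N); cost O(N^2 T).
import numpy as np

def forward(O, A, B, pi):
    T, N = len(O), A.shape[0]
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                     # alpha_1(i) = pi_i b_i(o_1)
    for t in range(T - 1):
        # alpha_{t+1}(j) = [sum_i alpha_t(i) a_ij] * b_j(o_{t+1})
        alpha[t + 1] = (alpha[t] @ A) * B[:, O[t + 1]]
    return alpha, alpha[-1].sum()                  # P(O|lambda) = sum_i alpha_T(i)
```

On the toy model sketched earlier, this should agree with the naïve sum up to rounding, at $O(N^2 T)$ cost.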
Problem 1: Alternative solution
Backward algorithm:
Define auxiliary backward variable $\beta$:
$$\beta_t(i) = P(o_{t+1}, o_{t+2}, \ldots, o_T \mid q_t = i, \lambda)$$
$\beta_t(i)$ is the probability of observing a sequence of observables $o_{t+1}, \ldots, o_T$ given state $q_t = i$ at time $t$, and $\lambda$
Problem 1: Alternative solution
Recursive algorithm:
Initialise:
$$\beta_T(j) = 1$$
Calculate:
$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \quad t = T-1, \ldots, 1$$
Terminate:
$$P(O \mid \lambda) = \sum_{i=1}^{N} \pi_i\, b_i(o_1)\, \beta_1(i)$$
Complexity is $O(N^2 T)$
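A matching sketch of the backward recursion, under the same interface assumptions:

```python
# Backward algorithm: beta has shape (T, N); cost O(N^2 T).
import numpy as np

def backward(O, A, B):
    T, N = len(O), A.shape[0]
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                  # beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        # beta_t(i) = sum_j a_ij b_j(o_{t+1}) beta_{t+1}(j)
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    return beta

# Termination: P(O|lambda) = sum_i pi_i b_i(o_1) beta_1(i), e.g.
# p = (pi * B[:, O[0]] * backward(O, A, B)[0]).sum()
```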
Problem 2: Decoding
Choose the state sequence that maximises the probability of the observation sequence
Viterbi algorithm - an inductive algorithm that keeps the best state sequence at each instant
Problem 2: Decoding
Viterbi algorithm:
State sequence to maximise $P(O, Q \mid \lambda)$:
$$P(q_1, q_2, \ldots, q_T \mid O, \lambda)$$
Define auxiliary variable $\delta$:
$$\delta_t(i) = \max_{q_1, \ldots, q_{t-1}} P(q_1, q_2, \ldots, q_{t-1}, q_t = i, o_1, o_2, \ldots, o_t \mid \lambda)$$
$\delta_t(i)$ is the probability of the most probable path ending in state $q_t = i$
Problem 2: Decoding
Recurrent property:
$$\delta_{t+1}(j) = \max_{i} \left( \delta_t(i)\, a_{ij} \right) b_j(o_{t+1})$$
Algorithm:
1. Initialise:
$$\delta_1(i) = \pi_i\, b_i(o_1), \quad 1 \le i \le N$$
$$\psi_1(i) = 0$$
To get the state sequence, we need to keep track of the argument that maximises this, for each $t$ and $j$. Done via the array $\psi_t(j)$.
Problem 2: Decoding
2. Recursion:
$$\delta_t(j) = \max_{1 \le i \le N} \left( \delta_{t-1}(i)\, a_{ij} \right) b_j(o_t), \quad 2 \le t \le T, \; 1 \le j \le N$$
$$\psi_t(j) = \arg\max_{1 \le i \le N} \left( \delta_{t-1}(i)\, a_{ij} \right)$$
3. Terminate:
$$P^* = \max_{1 \le i \le N} \delta_T(i)$$
$$q_T^* = \arg\max_{1 \le i \le N} \delta_T(i)$$
$P^*$ gives the state-optimised probability
$Q^*$ is the optimal state sequence ($Q^* = \{q_1^*, q_2^*, \ldots, q_T^*\}$)
Problem 2: Decoding
4. Backtrack state sequence:
$$q_t^* = \psi_{t+1}(q_{t+1}^*), \quad t = T-1, T-2, \ldots, 1$$
$O(N^2 T)$ time complexity
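A compact sketch of the full Viterbi recursion with the $\psi$ backtracking array (same interface assumptions as the earlier sketches):

```python
# Viterbi decoding: returns the optimal state path Q* and P*.
import numpy as np

def viterbi(O, A, B, pi):
    T, N = len(O), A.shape[0]
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, O[0]]                      # delta_1(i) = pi_i b_i(o_1)
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A          # delta_{t-1}(i) a_ij
        psi[t] = scores.argmax(axis=0)              # best predecessor of each j
        delta[t] = scores.max(axis=0) * B[:, O[t]]  # times b_j(o_t)
    q_star = np.zeros(T, dtype=int)
    p_star = delta[-1].max()                        # P* = max_i delta_T(i)
    q_star[-1] = delta[-1].argmax()                 # q_T* = argmax_i delta_T(i)
    for t in range(T - 2, -1, -1):
        q_star[t] = psi[t + 1, q_star[t + 1]]       # backtrack: psi_{t+1}(q_{t+1}*)
    return q_star, p_star
```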
Problem 3: Learning
Training the HMM to encode an obs seq such that the HMM should identify a similar obs seq in future
Find $\lambda = (A, B, \pi)$, maximising $P(O \mid \lambda)$
General algorithm (sketched in code below):
1. Initialise: $\lambda_0$
2. Compute the new model $\lambda$, using $\lambda_0$ and the observed sequence $O$
3. Then $\lambda_0 \leftarrow \lambda$
4. Repeat steps 2 and 3 until:
$$\log P(O \mid \lambda) - \log P(O \mid \lambda_0) < d$$
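A sketch of this outer loop; `reestimate` stands in for the Baum-Welch re-estimation step developed on the following slides, and the threshold name `d` follows the slide:

```python
# Outer training loop: re-estimate lambda until the log-likelihood gain < d.
import numpy as np

def train(O, A, B, pi, reestimate, d=1e-4, max_iter=200):
    log_p_old = -np.inf
    for _ in range(max_iter):
        A, B, pi = reestimate(O, A, B, pi)   # new model from lambda_0 and O
        _, p = forward(O, A, B, pi)          # forward sketch from earlier
        log_p = np.log(p)
        if log_p - log_p_old < d:            # log P(O|lambda) - log P(O|lambda_0) < d
            break
        log_p_old = log_p                    # lambda_0 <- lambda
    return A, B, pi
```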
Problem 3: Learning
Step 1 of the Baum-Welch algorithm:
Let $\xi_t(i,j)$ be the probability of being in state $i$ at time $t$ and in state $j$ at time $t+1$, given $\lambda$ and the $O$ sequence
$$\xi_t(i,j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)} = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}$$
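Reusing the earlier forward/backward sketches, $\xi$ can be computed as follows (a sketch; the array layout is an implementation choice):

```python
# E-step, part 1: xi[t, i, j] = xi_t(i, j) for t = 1..T-1.
import numpy as np

def compute_xi(O, A, B, pi):
    T = len(O)
    alpha, p_O = forward(O, A, B, pi)        # forward sketch from earlier
    beta = backward(O, A, B)                 # backward sketch from earlier
    xi = np.zeros((T - 1, A.shape[0], A.shape[0]))
    for t in range(T - 1):
        # alpha_t(i) a_ij b_j(o_{t+1}) beta_{t+1}(j), normalised by P(O|lambda)
        xi[t] = alpha[t][:, None] * A * B[:, O[t + 1]] * beta[t + 1]
        xi[t] /= p_O
    return xi
```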
Problem 3: Learning
[Figure: operations required for computing the joint event that the system is in state $S_i$ at time $t$ and in state $S_j$ at time $t+1$]
Problem 3: Learning
Let $\gamma_t(i)$ be the probability of being in state $i$ at time $t$, given $O$:
$$\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i,j)$$
$\sum_{t=1}^{T-1} \gamma_t(i)$ - expected no. of transitions from state $i$
$\sum_{t=1}^{T-1} \xi_t(i,j)$ - expected no. of transitions from state $i$ to state $j$
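Continuing the sketch (reusing `O`, `A`, `B`, `pi` and `compute_xi` from above), $\gamma$ and the expected counts follow directly from $\xi$:

```python
# E-step, part 2: gamma_t(i) = sum_j xi_t(i, j), plus the expected counts.
xi = compute_xi(O, A, B, pi)            # from the previous sketch
gamma = xi.sum(axis=2)                  # gamma_t(i) for t = 1..T-1

expected_from_i = gamma.sum(axis=0)     # sum_{t=1}^{T-1} gamma_t(i)
expected_i_to_j = xi.sum(axis=0)        # sum_{t=1}^{T-1} xi_t(i, j)
```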
Problem 3: Learning
Step 2 of the Baum-Welch algorithm:
$$\hat{\pi}_i = \gamma_1(i)$$
the expected frequency of state $i$ at time $t = 1$
$$\hat{a}_{ij} = \frac{\sum_t \xi_t(i,j)}{\sum_t \gamma_t(i)}$$
ratio of the expected no. of transitions from state $i$ to $j$ over the expected no. of transitions from state $i$
$$\hat{b}_j(k) = \frac{\sum_{t,\, o_t = k} \gamma_t(j)}{\sum_t \gamma_t(j)}$$
ratio of the expected no. of times in state $j$ observing symbol $k$ over the expected no. of times in state $j$
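Putting the E-step and M-step together gives one Baum-Welch iteration, sketched below; $\gamma$ is extended to time $T$ via $\alpha_T(i)\,\beta_T(i)/P(O \mid \lambda)$ so that the $\hat{b}_j(k)$ sums run over all $t$:

```python
# M-step: re-estimate pi, A, B from gamma and xi (one Baum-Welch iteration).
import numpy as np

def reestimate(O, A, B, pi):
    T, N, M = len(O), A.shape[0], B.shape[1]
    alpha, p_O = forward(O, A, B, pi)        # sketches from earlier slides
    beta = backward(O, A, B)
    xi = compute_xi(O, A, B, pi)
    # gamma_t(i) for all t = 1..T (last row is alpha_T(i) beta_T(i) / P(O|lambda))
    gamma = np.vstack([xi.sum(axis=2), (alpha[-1] * beta[-1] / p_O)[None, :]])

    new_pi = gamma[0]                                          # pi_i = gamma_1(i)
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]   # a_ij
    new_B = np.zeros((N, M))
    O = np.asarray(O)
    for k in range(M):
        new_B[:, k] = gamma[O == k].sum(axis=0) / gamma.sum(axis=0)  # b_j(k)
    return new_A, new_B, new_pi
```

This `reestimate` function is the kind of step that could be plugged into the training loop sketched under the general algorithm above.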
Problem 3: Learning
The Baum-Welch algorithm uses the forward and backward algorithms to calculate the auxiliary variables $\alpha$ and $\beta$
The B-W algorithm is a special case of the EM algorithm:
E-step: calculation of $\xi$ and $\gamma$
M-step: iterative calculation of $\hat{\pi}_i$, $\hat{a}_{ij}$, $\hat{b}_j(k)$
Practical issues:
Can get stuck in local maxima
Numerical problems - log and scaling
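One common way to handle the scaling issue noted above is to normalise $\alpha$ at each step and accumulate the log-likelihood from the scaling factors; a sketch under the same interface assumptions:

```python
# Scaled forward pass: avoids underflow by normalising alpha at every t and
# accumulating log P(O|lambda) from the scaling factors.
import numpy as np

def forward_scaled(O, A, B, pi):
    T, N = len(O), A.shape[0]
    alpha = np.zeros((T, N))
    log_p = 0.0
    alpha[0] = pi * B[:, O[0]]
    for t in range(T):
        if t > 0:
            alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
        c = alpha[t].sum()                 # scaling factor c_t
        alpha[t] /= c                      # keep alpha_t a distribution
        log_p += np.log(c)                 # log P(O|lambda) = sum_t log c_t
    return alpha, log_p
```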
Further Reading
L. R. Rabiner, "A tutorial on Hidden Markov Models and
selected applications in speech recognition,"
Proceedings of the IEEE, vol. 77, pp. 257-286, 1989.
R. Dugad and U. B. Desai, "A tutorial on Hidden Markov models," Signal Processing and Artificial Neural Networks Laboratory, Dept. of Electrical Engineering, Indian Institute of Technology, Bombay, Technical Report No. SPANN-96.1, 1996.
W. H. Laverty, M. J. Miket, and I. W. Kelly, "Simulation of Hidden Markov Models with EXCEL," The Statistician, vol. 51, part 1, pp. 31-40, 2002.