Extended Viterbi Algorithm For Second Order Hidden Markov Processes


EXTENDED VITERBI ALGORITHM FOR SECOND ORDER HIDDEN MARKOV PROCESS

Yang He
Department of Electrical and Computer Engineering, State University of New York at Buffalo, Buffalo, NY 14260, U.S.A.*

Abstract: In this paper, an extended Viterbi algorithm is presented. The algorithm gives a maximum a posteriori estimation of the second order hidden Markov process. The advantage of the second order model and the complexity of the extended algorithm are compared with those of the original first order one. The method used to develop the extended algorithm can also be used to extend the Viterbi algorithm further to any higher order.

I. Introduction

The hidden Markov model (HMM) has been successfully used by many researchers in speech recognition and handwritten script recognition [1]-[3]. A solution to this model is the Viterbi algorithm. It gives, through an observation sequence observed in memoryless noise, an optimal estimation of the state sequence in the sense of maximum a posteriori probability [4], [5]. But in all previous research in those areas, the underlying Markov process is restricted to the first order, and so is the Viterbi algorithm applied to it. If we can draw a higher order HMM from practical problems, we will obviously be able to incorporate more information into the recognition procedure, which is especially meaningful to knowledge based systems. But to get to the solution, the Viterbi algorithm must be extended so that it can be applied to higher order HMMs. In the following sections, we will extend the Viterbi algorithm to the second order hidden Markov process and compare its complexity with that of the first order one. At the end of this paper, it will be seen that an algorithm for any higher order HMM can be easily obtained by analogy with the method used in this paper. All examples used are chosen from script recognition.

II. Order Reduction

Let X = (x_1, x_2, ..., x_K) represent an N-state, K time-long second order Markov process, where x_i, 1 ≤ i ≤ K, can be any one of the N states. Let Z = (z_1, z_2, ..., z_K) represent an observation sequence of the process, where z_i, 1 ≤ i ≤ K, can be any one of the M observation symbols. We assume that the observation is memoryless. In other words, for any i, observation z_i depends only on the time-i state in the process sequence. Let the following notation represent the probabilities we will use:

P(X): probability of state sequence X;
P(X, Z): joint probability of state sequence X and observation sequence Z;
P(X|Z): probability that the state sequence is X, given observation sequence Z;
P(Z|X): probability that Z is observed when the state sequence is X;
p(x_1): initial state probability;
p(x_2|x_1): probability of a one step transition from time 1 to time 2;
p(z_i|x_i): probability that z_i is observed at time i, given the time-i state x_i;
p(x_i|x_{i-1}, x_{i-2}): two step transition probability.

Our aim is to find a particular state sequence X*, when observation sequence Z is given, so that P(X*|Z) is maximum, or equivalently so that P(X*, Z) = P(X*|Z)P(Z) is maximum. From the above definitions and assumption, it is easy to derive that

P(X, Z) = P(X) P(Z|X) = p(x_1) p(z_1|x_1) p(x_2|x_1) p(z_2|x_2) \prod_{i=3}^{K} p(x_i|x_{i-1}, x_{i-2}) p(z_i|x_i)    (1)

* Permanent address: Department of Electronic Engineering, Shenzhen University, Shenzhen, Guangdong, P. R. China.

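As a concrete illustration of Equation (1), the joint probability of a fixed state sequence and observation sequence under a second order HMM can be computed directly. The sketch below is not part of the paper; the array names pi0, A1, A2 and B, the 0-based indexing, and the NumPy layout are assumptions made for illustration.

```python
import numpy as np

def joint_prob(x, z, pi0, A1, A2, B):
    """Evaluate Equation (1): P(X, Z) for a second order HMM.

    x  : state sequence,       length K, values in 0..N-1
    z  : observation sequence, length K, values in 0..M-1
    pi0: initial probabilities,   shape (N,)       pi0[a]      = p(x1 = a)
    A1 : one step transitions,    shape (N, N)     A1[a, b]    = p(x2 = b | x1 = a)
    A2 : two step transitions,    shape (N, N, N)  A2[a, b, c] = p(xi = c | x_{i-1} = b, x_{i-2} = a)
    B  : emission probabilities,  shape (N, M)     B[a, o]     = p(z = o | x = a)
    """
    p = pi0[x[0]] * B[x[0], z[0]]
    if len(x) > 1:
        p *= A1[x[0], x[1]] * B[x[1], z[1]]
    for i in range(2, len(x)):
        # product term of Equation (1): p(x_i | x_{i-1}, x_{i-2}) p(z_i | x_i)
        p *= A2[x[i - 2], x[i - 1], x[i]] * B[x[i], z[i]]
    return p
```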

In order to find the X with maximum P(X, Z), we introduce a combined state sequence Y = (y_1, y_2, ..., y_K), where y_1 = x_1 and y_i = x_{i-1} x_i for 2 ≤ i ≤ K. For example, for the word seems, X = (x_1, x_2, x_3, x_4, x_5) = (s, e, e, m, s) and Y = (y_1, y_2, y_3, y_4, y_5) = (s, se, ee, em, ms). With this definition, we have

P(y_1) = p(x_1)
P(y_2|y_1) = p(x_2|x_1)
P(y_i|y_{i-1}) = p(x_i|x_{i-1}, x_{i-2}),    3 ≤ i ≤ K

For example, P(y_3 = ee | y_2 = se) = P(the third letter is e, given the second letter e and the first letter s) = p(x_3 = e | x_2 = e, x_1 = s).

Substituting these probabilities into Equation (1), we have

P(X, Z) = P(y_1) p(z_1|x_1) \prod_{i=2}^{K} P(y_i|y_{i-1}) p(z_i|x_i)    (2)

Equation (2) is in the same form as that of the first order model. We can also draw a trellis to represent combined state transitions, which is very similar to a one step transition trellis. From the definition of combined states, we know there are N × N combined states. But since from any y_{i-1} the sequence can only transit to a y_i whose first original state is identical to the second one in y_{i-1}, there are only N × N × N non-zero values of P(y_i|y_{i-1}), corresponding to the two step transition probabilities p(x_i|x_{i-1}, x_{i-2}). In the following algorithm, combined states are therefore not numbered from 1 to N × N. Instead, two indexes, each from 1 to N, are used to denote combined states.

An example of a 5 time-long sequence with the 3 original states e, m and s is shown in Figure 1. Only transitions from time 1 to time 3 are drawn, except for the dark line, which represents the sequence of the word seems. A complete trellis includes the paths of all possible state sequences.

[Figure 1. An example of a combined state sequence: a trellis over the combined states ee, em, es, me, mm, ms, se, sm and ss.]

III. Recursive Tracing

Let us consider a partial state sequence X_i = (x_1, x_2, ..., x_i) and a partial observation sequence Z_i = (z_1, z_2, ..., z_i). For 3 ≤ i ≤ K, we have

P(X_i, Z_i) = P(X_{i-1}, Z_{i-1}) P(y_i|y_{i-1}) p(z_i|x_i)    (3)

Equation (3) suggests that at any time i ≥ 3, to find the maximum P(X_i, Z_i) for each y_i, we need only to: (a) remember the maximum P(X_{i-1}, Z_{i-1}) for each y_{i-1}, (b) extend each of these probabilities to every y_i by computing Equation (3), and (c) select the maximum P(X_i, Z_i) for each y_i. By increasing i by 1 until i = K and repeating (a) through (c), the maximum P(X, Z) for each y_K can finally be found. Then, among all of the P(X, Z) for every y_K, we choose the maximum one and backtrace the sequence leading to this maximum probability. The result is the optimal combined state sequence Y. If the first original state in each combined state is omitted, the result is the optimal sequence X. At the initial step, when i ≤ 2, we compute

P(X_1, Z_1) = P(y_1) p(z_1|x_1)    (4)

and

P(X_2, Z_2) = P(X_1, Z_1) P(y_2|y_1) p(z_2|x_2)    (5)
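Before turning to the algorithm itself, the order reduction above can be illustrated with a short sketch (not from the paper; the function name, the dictionary layout and the uniform placeholder probabilities are assumptions made for illustration). It shows that a second order model over N original states behaves as a first order model over combined states, with only N × N × N non-zero combined transitions.

```python
import numpy as np

def combined_transitions(A2):
    """Collect the non-zero first order transitions over combined states (l, m).

    A2[l, m, n] = p(x_i = n | x_{i-1} = m, x_{i-2} = l) for a second order model
    with N original states. From combined state (l, m) the process can only move
    to a combined state whose first component is m, so only N*N*N of the
    (N*N) x (N*N) conceivable combined transitions are non-zero.
    """
    N = A2.shape[0]
    trans = {}
    for l in range(N):
        for m in range(N):
            for n in range(N):
                # P(y_i = (m, n) | y_{i-1} = (l, m)) = p(x_i = n | x_{i-1} = m, x_{i-2} = l)
                trans[(l, m), (m, n)] = A2[l, m, n]
    return trans

# Example with the three original states of Figure 1, indexed e = 0, m = 1, s = 2.
A2 = np.full((3, 3, 3), 1.0 / 3.0)   # placeholder two step transition probabilities
trans = combined_transitions(A2)
print(len(trans))                     # 27 = N**3 non-zero combined transitions
print(trans[(2, 0), (0, 0)])          # P(y_i = ee | y_{i-1} = se), 1/3 here
```

The dictionary form is only for readability; the algorithm of Section IV indexes combined states directly by two original-state indexes, exactly as the paper suggests.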

IV. The Algorithm

In order to describe the algorithm more clearly, we use the following notation, which is closer to the variables used in a program:

a_0(l) = P(y_1 = l);
a_1(l, m) = P(y_2 = lm | y_1 = l);
a_2(l, m, n) = P(y_i = mn | y_{i-1} = lm);
b(z_i, n) = p(z_i | x_i = n);
d_1(l) = P(X_1 = l, Z_1);
d_i(m, n) = maximum P(X_i, Z_i) for y_i = mn, 2 ≤ i ≤ K;
c_i(m, n) = the state x_{i-2} that maximizes P(X_i, Z_i) for y_i = mn;
x(i) = the time-i state in the optimal sequence X, the final result.
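One possible way to hold these quantities in a program is sketched below; the array names, shapes and 0-based indexing are our own illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Illustrative sizes: N original states, M observation symbols, K time steps.
N, M, K = 26, 26, 5

a0 = np.zeros(N)                     # a0[l]       = P(y_1 = l)
a1 = np.zeros((N, N))                # a1[l, m]    = P(y_2 = lm | y_1 = l)
a2 = np.zeros((N, N, N))             # a2[l, m, n] = P(y_i = mn | y_{i-1} = lm)
b  = np.zeros((M, N))                # b[z, n]     = p(z | x = n)

d1 = np.zeros(N)                     # d1[l]       = P(X_1 = l, Z_1)
d  = np.zeros((K, N, N))             # d[i, m, n]  = max P(X_i, Z_i) for y_i = mn
c  = np.zeros((K, N, N), dtype=int)  # c[i, m, n]  = the x_{i-2} achieving that maximum
```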

In addition, we use the symbol

arg_m max_{1 ≤ m ≤ M} [expression]

to denote a function whose value is the value of m that maximizes the value of the expression.

STEP 1. Initialization
a. For 1 ≤ l ≤ N,
   d_1(l) = a_0(l) b(z_1, l)
b. For 1 ≤ l ≤ N, for 1 ≤ m ≤ N,
   d_2(l, m) = d_1(l) a_1(l, m) b(z_2, m)
This step computes P(X_1, Z_1) for each y_1 = l and the maximum P(X_2, Z_2) for each y_2 = lm.

STEP 2. Recursive Computation
For 3 ≤ i ≤ K, for 1 ≤ m ≤ N, for 1 ≤ n ≤ N,


d_i(m, n) = max_{1 ≤ l ≤ N} [d_{i-1}(l, m) a_2(l, m, n)] b(z_i, n)
c_i(m, n) = arg_l max_{1 ≤ l ≤ N} [d_{i-1}(l, m) a_2(l, m, n)]

At the end of this step, the maximum probability P(X, Z) for each y_K = mn, 1 ≤ m, n ≤ N, is found and stored in d_K(m, n).

STEP 3. Determination of the Last Two States

x(K) = arg_n max_{1 ≤ m ≤ N, 1 ≤ n ≤ N} [d_K(m, n)]
x(K - 1) = arg_m max_{1 ≤ m ≤ N, 1 ≤ n ≤ N} [d_K(m, n)]

STEP 4. Backtracing to the First State

For K - 2 ≥ i ≥ 1,
x(i) = c_{i+2}(x(i + 1), x(i + 2))

Now the optimal sequence is in x(i).
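The four steps above can be collected into one routine. The following is a minimal sketch rather than the paper's program: it assumes 0-based indexing, the array layout introduced earlier (a0, a1, a2, b), and it ignores numerical underflow, which a practical implementation would avoid by working with log probabilities.

```python
import numpy as np

def extended_viterbi(z, a0, a1, a2, b):
    """MAP state sequence for a second order HMM (Steps 1-4 of Section IV).

    z  : observation sequence, length K, values in 0..M-1
    a0 : (N,)       a0[l]       = P(y_1 = l)
    a1 : (N, N)     a1[l, m]    = P(y_2 = lm | y_1 = l)
    a2 : (N, N, N)  a2[l, m, n] = P(y_i = mn | y_{i-1} = lm)
    b  : (M, N)     b[o, n]     = p(z = o | x = n)
    Returns the optimal state sequence as a 0-based list of length K.
    """
    K = len(z)
    N = a0.shape[0]
    if K == 1:
        return [int(np.argmax(a0 * b[z[0]]))]

    # STEP 1. Initialization.
    d1 = a0 * b[z[0]]                                   # d1[l] = a0(l) b(z1, l)
    d = np.zeros((K, N, N))
    c = np.zeros((K, N, N), dtype=int)
    d[1] = d1[:, None] * a1 * b[z[1]][None, :]          # d2[l, m] = d1(l) a1(l, m) b(z2, m)

    # STEP 2. Recursive computation over combined states y_i = mn.
    for i in range(2, K):
        for m in range(N):
            for n in range(N):
                scores = d[i - 1][:, m] * a2[:, m, n]   # d_{i-1}(l, m) a2(l, m, n) for all l
                l_best = int(np.argmax(scores))
                d[i, m, n] = scores[l_best] * b[z[i], n]
                c[i, m, n] = l_best                     # backpointer to x_{i-2}

    # STEP 3. Determination of the last two states.
    m_best, n_best = np.unravel_index(np.argmax(d[K - 1]), (N, N))
    x = [0] * K
    x[K - 1] = int(n_best)
    x[K - 2] = int(m_best)

    # STEP 4. Backtracing to the first state.
    for i in range(K - 3, -1, -1):
        x[i] = int(c[i + 2, x[i + 1], x[i + 2]])
    return x
```

Calling extended_viterbi(z, a0, a1, a2, b) with the probability arrays filled in returns the optimal state indexes directly; the combined states never need to be materialized as separate objects.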

V. Conclusion

In comparison with the Viterbi algorithm for the first order HMM [1], [2], we can see that the computation required for the above algorithm is approximately N times as much as for the first order model. If N is not too large (for example, in script recognition, N = 26), this will not cause substantial difficulties.

Although experimental results are pending the availability of properly calculated two step transition probabilities in each application field*, the advantage of the second order model, namely that it contains more information, can be seen by observing some particular examples. In script recognition, for instance, if the first order model is used, then the letter u can be followed by all letters except h, j, k, q and v through z [6]. But when a second order model is used, u can only be followed by a, e, i, o or y if the u follows a q that is the first letter of the word. Thus, the use of two step transition probabilities can eliminate many more possibilities of erroneous recognition. From the above discussion, we can see that it is easy to extend the algorithm further to any higher order model by following the same method. For example, if we combine every 3 consecutive original states to form a combined state, we can easily derive an algorithm for the third order model.
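The factor of N quoted above can be made explicit with a rough operation count (a back-of-the-envelope estimate of ours, not a figure from the paper): the first order Viterbi algorithm maximizes over N predecessors for each of N states at every time step, while the extended algorithm maximizes over N values of l for each of the N × N combined states mn.

```latex
% Rough per-step operation counts (our own estimate, not from the paper):
\begin{align*}
C_{\text{first order}}  &\approx K \cdot N^{2}, \\
C_{\text{second order}} &\approx K \cdot N^{3}, \\
\frac{C_{\text{second order}}}{C_{\text{first order}}} &\approx N
  \quad (\approx 26 \text{ in script recognition}).
\end{align*}
```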

* By the time this paper was accepted, an experiment in handwritten word recognition had been done and had shown the advantage of the second order model over the first order one [7].

References

[1] L. R. Rabiner, S. E. Levinson and M. M. Sondhi, "On the Application of Vector Quantization and Hidden Markov Models to Speaker Independent Isolated Word Recognition," Bell System Technical Journal, vol. 62, pp. 1075-1105, Apr. 1983.
[2] A. Kundu and P. Bahl, "Recognition of Handprinted Script: a Hidden Markov Model Based Approach," Proc. of ICASSP, New York City, pp. 928-932, Apr. 1988.
[3] R. Nag, K. H. Wong and F. Fallside, "Script Recognition Using Hidden Markov Models," Proc. of ICASSP, vol. 3, pp. 2071-2074, 1986.
[4] A. J. Viterbi, "Error Bounds for Convolutional Codes and an Asymptotically Optimum Decoding Algorithm," IEEE Trans. on Information Theory, vol. 13, pp. 260-269, Apr. 1967.
[5] G. D. Forney, "The Viterbi Algorithm," Proc. of the IEEE, vol. 61, no. 3, pp. 268-278, March 1973.
[6] A. G. Konheim, Cryptography: a Primer, Chapter 2, John Wiley and Sons, New York, 1982.
[7] A. Kundu, Y. He and P. Bahl, "Recognition of Handwritten Word: First and Second Order Hidden Markov Model Based Approach," Proc. of IEEE Conference on Computer Vision and Pattern Recognition, Ann Arbor, Michigan, June 1988.
