FSA and HMM: LING 572, Fei Xia, 1/5/06

The document discusses finite state automata (FSA), hidden Markov models (HMM), and the relationship between the two. It defines FSA, probabilistic FSA, weighted FSA, and how operations can be performed on FSA. It then defines state-emission and arc-emission HMM, the parameters and constraints of HMM, and how to calculate probabilities and find the best state sequence in HMM using the forward and Viterbi algorithms. Finally, it discusses how an HMM can be converted into an equivalent weighted FSA.

FSA and HMM

LING 572
Fei Xia
1/5/06
Outline
FSA

HMM

Relation between FSA and HMM


FSA
Definition of FSA
A FSA is (Q, Σ, I, F, δ)
Q: a finite set of states
Σ: a finite set of input symbols
I: the set of initial states
F: the set of final states
δ ⊆ Q × (Σ ∪ {ε}) × Q: the transition relation between states.
An example of FSA

(Figure: a two-state FSA with states q0 and q1 and arcs labeled a and b.)
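To make the definition concrete, here is a minimal Python sketch (not from the slides): δ is stored as a set of (state, symbol, state) triples and acceptance is checked by tracking all reachable states. The states, alphabet, and arcs below are hypothetical; ε-transitions are omitted for simplicity.

# A minimal sketch: an FSA as (Q, Sigma, I, F, delta), with delta stored as
# a set of (from_state, symbol, to_state) triples. Example automaton is made up.
Q     = {"q0", "q1"}
Sigma = {"a", "b"}
I     = {"q0"}
F     = {"q1"}
delta = {("q0", "a", "q1"), ("q1", "b", "q1")}

def accepts(string):
    # Nondeterministic acceptance: track every state reachable on the prefix read so far.
    current = set(I)
    for sym in string:
        current = {t for (s, x, t) in delta if s in current and x == sym}
    return bool(current & F)

print(accepts("ab"), accepts("abbb"), accepts("ba"))   # True True False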
Definition of FST
A FST is (Q, Σ, Δ, I, F, δ)
Q: a finite set of states
Σ: a finite set of input symbols
Δ: a finite set of output symbols
I: the set of initial states
F: the set of final states
δ ⊆ Q × (Σ ∪ {ε}) × (Δ ∪ {ε}) × Q: the transition relation between states.

FSA can be seen as a special case of FST.

The extended transition relation δ* is the smallest set such that:

δ ⊆ δ*
(q, x, y, r) ∈ δ* and (r, a, b, s) ∈ δ  ⇒  (q, xa, yb, s) ∈ δ*

T transduces a string x into a string y if there exists a path from the initial state to a final state whose input is x and whose output is y:

x[T]y iff ∃ q ∈ I, ∃ f ∈ F s.t. (q, x, y, f) ∈ δ*

An example of FST

(Figure: a two-state FST with states q0 and q1 and arcs labeled a:x and b:y.)
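A rough sketch of the transduction relation x[T]y, again assuming a simple representation with no ε-transitions; the states and arcs are hypothetical and loosely follow the a:x, b:y figure above.

# A minimal sketch of x[T]y for an FST without epsilon transitions:
# delta is a set of (from_state, input_symbol, output_symbol, to_state).
I     = {"q0"}
F     = {"q1"}
delta = {("q0", "a", "x", "q1"), ("q1", "b", "y", "q1")}

def transduces(x, y):
    # True iff some path from I to F reads input x and writes output y.
    # Each search item is (state, chars of x consumed, chars of y produced).
    frontier = {(q, 0, 0) for q in I}
    while frontier:
        q, i, j = frontier.pop()
        if i == len(x) and j == len(y) and q in F:
            return True
        for (s, a, b, t) in delta:
            if s == q and x[i:i+1] == a and y[j:j+1] == b:
                frontier.add((t, i + 1, j + 1))
    return False

print(transduces("abb", "xyy"))  # True
print(transduces("ab", "xx"))    # False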
Operations on FSTs
Union: x[T ∪ S]y iff x[T]y or x[S]y

Concatenation: wx[T · S]yz iff w[T]y and x[S]z

Composition: x[T ∘ S]z iff ∃y s.t. x[T]y and y[S]z
An example of composition operation

(Figure: the two-state FST above, with arcs a:x and b:y, composed with a one-state FST (state q0) with arcs x:ε and y:z.)
Probabilistic finite-state automata
(PFA)
Informally, in a PFA, each arc is associated with a probability.

The probability of a path is the product of the probabilities of the arcs on the path.

The probability of a string x is the sum of the probabilities of all the paths for x.

Tasks:
Given a string x, find the best path for x.
Given a string x, find the probability of x in a PFA.
Find the string with the highest probability in a PFA.

Formal definition of PFA
A PFA is (Q, Σ, I, F, δ, P)
Q: a finite set of N states
Σ: a finite set of input symbols
I: Q → R+ (initial-state probabilities)
F: Q → R+ (final-state probabilities)
δ ⊆ Q × (Σ ∪ {ε}) × Q: the transition relation between states.
P: δ → R+ (transition probabilities)

Constraints on the functions:

Σ_{q ∈ Q} I(q) = 1

∀q ∈ Q:  F(q) + Σ_{a, q' ∈ Q} P(q, a, q') = 1

Probability of a string:

P(w_{1,n}, q_{1,n+1}) = I(q_1) * F(q_{n+1}) * ∏_{i=1}^{n} P(q_i, w_i, q_{i+1})

P(w_{1,n}) = Σ_{q_{1,n+1}} P(w_{1,n}, q_{1,n+1})
Consistency of a PFA
Let A be a PFA.
Def: P(x | A) = the sum of the probabilities of all the valid paths for x in A.
Def: a valid path in A is a path for some string x with probability greater than 0.
Def: A is called consistent if Σ_x P(x | A) = 1.

Def: a state of a PFA is useful if it appears in at least one valid path.

Proposition: a PFA is consistent if all its states are useful.


Q1 of Hw1
An example of PFA

(Figure: a PFA with states q0 and q1, an arc a:1 from q0 to q1, and a self-loop b:0.8 on q1; final-state probabilities F(q0)=0, F(q1)=0.2; I(q0)=1.0, I(q1)=0.0.)

P(ab^n) = 0.2 * 0.8^n

Σ_x P(x) = Σ_{n≥0} P(ab^n) = Σ_{n≥0} 0.2 * 0.8^n = 0.2 * 1/(1 - 0.8) = 1
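A sketch (assumed representation, not the course code) of the string-probability computation P(w) for the example PFA above; it checks P(ab^n) = 0.2 * 0.8^n numerically and that the total probability approaches 1.

# A minimal sketch: computing P(w) in a PFA (no epsilon arcs) by dynamic
# programming over prefixes, using the example PFA above.
I = {"q0": 1.0, "q1": 0.0}
F = {"q0": 0.0, "q1": 0.2}
P = {("q0", "a", "q1"): 1.0, ("q1", "b", "q1"): 0.8}

def string_prob(w):
    # P(w) = sum over all paths reading w of I(start) * prod(arc probs) * F(end).
    alpha = dict(I)                       # prob of reading a prefix and being in state q
    for sym in w:
        nxt = {q: 0.0 for q in I}
        for (s, x, t), p in P.items():
            if x == sym:
                nxt[t] += alpha[s] * p
        alpha = nxt
    return sum(alpha[q] * F[q] for q in alpha)

print(string_prob("ab"))                                           # 0.16 = 0.2 * 0.8
print(round(sum(string_prob("a" + "b" * n) for n in range(200)), 6))  # ~1.0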
Weighted finite-state automata
(WFA)
Each arc is associated with a weight.
The sum and the multiplication can have other meanings.

weight(x) = Σ_{s,t ∈ Q} ( I(s) * P(s, x, t) * F(t) )
HMM
Two types of HMMs
State-emission HMM (Moore machine):
The emission probability depends only on the
state (from-state or to-state).

Arc-emission HMM (Mealy machine):


The emission probability depends on the (from-state, to-state) pair.
State-emission HMM

(Figure: states s1, s2, ..., sN in sequence, each state emitting output symbols such as w1, w3, w4, w5.)

Two kinds of parameters:
Transition probability: P(s_j | s_i)
Output (emission) probability: P(w_k | s_i)
# of parameters: O(NM + N^2)
Arc-emission HMM
(Figure: states s1, s2, ..., sN, with output symbols such as w1, ..., w5 emitted on the arcs between states.)

Same kinds of parameters, but the emission probabilities depend on both states: P(w_k, s_j | s_i)

# of parameters: O(N^2·M + N^2)
Are the two types of HMMs
equivalent?
For each state-emission HMM1, there is an
arc-emission HMM2, such that for any
sequence O, P(O|HMM1)=P(O|HMM2).

The reverse is also true.

Q3 and Q4 of hw1.
Definition of arc-emission HMM
A HMM is a tuple (S, Σ, π, A, B):
A set of states S = {s1, s2, ..., sN}.
A set of output symbols Σ = {w1, ..., wM}.
Initial state probabilities π = {π_i}
State transition prob: A = {a_ij}.
Symbol emission prob: B = {b_ijk}

State sequence: X_{1,n+1}
Output sequence: O_{1,n}

P(O_{1,n}, X_{1,n+1}) = π(X_1) * ∏_{i=1}^{n} P(X_{i+1} | X_i) * P(o_i | X_i, X_{i+1})

P(O_{1,n}) = Σ_{X_{1,n+1}} P(O_{1,n}, X_{1,n+1})
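The two formulas above can be evaluated directly when n is small. The sketch below uses made-up two-state parameters (the names pi, a, b and their values are hypothetical, not from the slides) and brute-forces the sum over state sequences.

# A minimal sketch: P(O, X) and P(O) for an arc-emission HMM with parameters
# pi (initial), a[i][j] (transition), b[(i, j)][w] (emission on the arc i -> j).
from itertools import product

states, symbols = [0, 1], ["w1", "w2"]
pi = [1.0, 0.0]
a  = [[0.7, 0.3], [0.4, 0.6]]
b  = {(i, j): ({"w1": 0.9, "w2": 0.1} if i == j else {"w1": 0.2, "w2": 0.8})
      for i in states for j in states}

def joint_prob(O, X):
    # P(O_{1,n}, X_{1,n+1}) = pi(X_1) * prod_i P(X_{i+1}|X_i) * P(o_i|X_i,X_{i+1})
    p = pi[X[0]]
    for i, o in enumerate(O):
        p *= a[X[i]][X[i + 1]] * b[(X[i], X[i + 1])][o]
    return p

def obs_prob(O):
    # P(O_{1,n}) = sum over all state sequences X_{1,n+1} of P(O, X)
    return sum(joint_prob(O, X) for X in product(states, repeat=len(O) + 1))

print(obs_prob(["w1", "w2", "w1"]))
# With row-normalized parameters, summing P(O) over all outputs of length 3
# gives 1 (cf. the constraints on the next slide).
print(sum(obs_prob(list(o)) for o in product(symbols, repeat=3)))   # ~1.0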
Constraints

Σ_{i=1}^{N} π_i = 1

∀i: Σ_{j=1}^{N} a_ij = 1

∀i, j: Σ_{k=1}^{M} b_ijk = 1

(hence, for every i: Σ_j Σ_k a_ij * b_ijk = 1)

For any integer n and any HMM:

Σ_{|O|=n} P(O | HMM) = 1

Q2 of hw1.
Properties of HMM
Limited horizon: P(X_{t+1} | X_1, X_2, ..., X_t) = P(X_{t+1} | X_t)

Time invariance: the probabilities do not change over time: P(X_{t+1} | X_t) = P(X_{t+1+m} | X_{t+m})

The states are hidden because we know the structure of the machine (i.e., S and Σ), but we don't know which state sequences generate a particular output.
Applications of HMM
N-gram POS tagging
Bigram tagger: oi is a word, and si is a POS tag.
Trigram tagger: oi is a word, and si is ??

Other tagging problems:


Word segmentation
Chunking
NE tagging
Punctuation prediction

Other applications: ASR, etc.


Three fundamental questions for
HMMs
1. Finding the probability of an observation

2. Finding the best state sequence

3. Training: estimating parameters


(1) Finding the probability of
the observation
Forward probability: the probability of producing O_{1,t-1} while ending up in state s_i:

α_i(t) =def P(O_{1,t-1}, X_t = i)

P(O) = Σ_{i=1}^{N} α_i(T+1)

Calculating forward probability
Initialization: α_i(1) = π_i
Induction:

α_j(t+1) = P(O_{1,t}, X_{t+1} = j)
         = Σ_i P(O_{1,t}, X_t = i, X_{t+1} = j)
         = Σ_i P(O_{1,t-1}, X_t = i) * P(o_t, X_{t+1} = j | O_{1,t-1}, X_t = i)
         = Σ_i P(O_{1,t-1}, X_t = i) * P(o_t, X_{t+1} = j | X_t = i)
         = Σ_i α_i(t) * a_ij * b_{ij,o_t}
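The induction translates almost line for line into code. Below is a minimal sketch (the parameter layout pi, a[i][j], b[i][j][k] and the toy values are assumptions, not the course implementation).

# A minimal sketch of the forward algorithm for an arc-emission HMM.
# alpha[t][i] = P(O_{1,t-1}, X_t = i).
def forward(O, pi, a, b):
    N, T = len(pi), len(O)
    alpha = [[0.0] * N for _ in range(T + 1)]
    alpha[0] = list(pi)                              # initialization: alpha_i(1) = pi_i
    for t in range(T):                               # induction over the observations
        for j in range(N):
            alpha[t + 1][j] = sum(alpha[t][i] * a[i][j] * b[i][j][O[t]]
                                  for i in range(N))
    return sum(alpha[T])                             # P(O) = sum_i alpha_i(T+1)

# Tiny example with two states and two symbols (0 and 1); rows of a and the
# emission distributions b[i][j] each sum to 1, as required by the constraints.
pi = [0.6, 0.4]
a  = [[0.7, 0.3], [0.4, 0.6]]
b  = [[[0.9, 0.1], [0.2, 0.8]],
      [[0.5, 0.5], [0.3, 0.7]]]
print(forward([0, 1, 0], pi, a, b))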
(2) Finding the best state sequence
Given the observation O_{1,T} = o_1 ... o_T, find the state sequence X_{1,T+1} = X_1 ... X_{T+1} that maximizes P(X_{1,T+1} | O_{1,T}).

(Figure: a trellis of hidden states X_1, X_2, ..., X_T, X_{T+1} generating the observations o_1, o_2, ..., o_T.)

Viterbi algorithm
Viterbi algorithm
The probability of the best path that produces O_{1,t-1} while ending up in state s_i:

δ_i(t) =def max_{X_{1,t-1}} P(X_{1,t-1}, O_{1,t-1}, X_t = i)

Initialization: δ_i(1) = π_i
Induction: δ_j(t+1) = max_i δ_i(t) * a_ij * b_{ij,o_t}
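A matching sketch of the Viterbi recursion, with backpointers added so the best state sequence can be read off (same hypothetical parameter layout as the forward sketch above).

# A minimal sketch of Viterbi decoding for an arc-emission HMM.
# delta[t][i] = best-path probability of producing O_{1,t-1} and ending in state i.
def viterbi(O, pi, a, b):
    N, T = len(pi), len(O)
    delta = [[0.0] * N for _ in range(T + 1)]
    back  = [[0] * N for _ in range(T + 1)]
    delta[0] = list(pi)                              # delta_i(1) = pi_i
    for t in range(T):
        for j in range(N):
            scores = [delta[t][i] * a[i][j] * b[i][j][O[t]] for i in range(N)]
            back[t + 1][j] = max(range(N), key=lambda i: scores[i])
            delta[t + 1][j] = scores[back[t + 1][j]]
    # Follow backpointers from the best final state to recover X_{1,T+1}.
    best = max(range(N), key=lambda i: delta[T][i])
    path = [best]
    for t in range(T, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path)), delta[T][best]

pi = [0.6, 0.4]
a  = [[0.7, 0.3], [0.4, 0.6]]
b  = [[[0.9, 0.1], [0.2, 0.8]],
      [[0.5, 0.5], [0.3, 0.7]]]
print(viterbi([0, 1, 0], pi, a, b))   # (best state sequence, its probability)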

Modify it to allow epsilon emission: Q5 of hw1.


Summary of HMM
Two types of HMMs: state-emission and arc-emission HMM: (S, Σ, π, A, B)
Properties: Markov assumption
Applications: POS-tagging, etc.
Finding the probability of an observation: forward
probability
Decoding: Viterbi decoding
Relation between FSA and
HMM
Relation between WFA and HMM
An HMM can be seen as a special type of WFA.

Given an HMM, how can we build an equivalent WFA?
Converting HMM into WFA
Given an HMM (S, Σ1, π, A, B), build a WFA (Q, Σ2, I, F, δ, P) such that for any input sequence O, P(O|HMM) = P(O|WFA).

Build a WFA: add a final state and arcs to it


Show that there is a one-to-one mapping between the
paths in HMM and the paths in WFA
Prove that the probabilities in HMM and in WFA are
identical.
HMM → WFA
Need to create a new state (the final state) and add edges to it.

Q = S ∪ {f}
Σ2 = Σ1 ∪ {ε}
∀q ∈ S: I(q) = π(q);  I(f) = 0
∀q ∈ S: F(q) = 0;  F(f) = 1
δ = {(q_i, w_k, q_j) | q_i, q_j ∈ S and (b_ijk * a_ij ≠ 0)} ∪ {(q_i, ε, f) | q_i ∈ S}
P(q_i, w_k, q_j) = a_ij * b_ijk
P(q_i, ε, f) = 1

The WFA is not a PFA.
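A sketch of the HMM → WFA construction above; the dictionary-based representation is an assumption, not the slides' notation.

# A minimal sketch of the HMM -> WFA construction.
# The HMM is given as (S, symbols, pi, a, b) with a[i][j] and b[i][j][k]; the WFA
# is returned as dictionaries (I, F, P) where P maps (q_i, w_k, q_j) -> weight.
EPS = ""   # stands for the empty symbol epsilon

def hmm_to_wfa(S, symbols, pi, a, b):
    f = "f"                                    # the new final state
    I = {q: pi[i] for i, q in enumerate(S)}    # I(q) = pi(q), I(f) = 0
    I[f] = 0.0
    F = {q: 0.0 for q in S}                    # F(q) = 0 for q in S, F(f) = 1
    F[f] = 1.0
    P = {}
    for i, qi in enumerate(S):
        for j, qj in enumerate(S):
            for k, w in enumerate(symbols):
                if a[i][j] * b[i][j][k] != 0:  # P(q_i, w_k, q_j) = a_ij * b_ijk
                    P[(qi, w, qj)] = a[i][j] * b[i][j][k]
        P[(qi, EPS, f)] = 1.0                  # P(q_i, epsilon, f) = 1
    return I, F, P

# Note: the weights leaving each state (including the epsilon arc to f) sum to 2
# rather than 1, which is why the result is a WFA but not a PFA.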


A slightly different definition of HMM

A HMM is a tuple (S, Σ, π, A, B, q_f):
A set of states S = {s1, s2, ..., sN}.
A set of output symbols Σ = {w1, ..., wM}.
Initial state probabilities π = {π_i}
State transition prob: A = {a_ij}.
Symbol emission prob: B = {b_ijk}
q_f is the final state: there are no outgoing edges from q_f.
Constraints

Σ_{i=1}^{N} π_i = 1

∀i ≠ q_f: Σ_{j=1}^{N} a_ij = 1

∀i ≠ q_f, ∀j: Σ_{k=1}^{M} b_ijk = 1

∀j: a_{q_f, j} = 0
∀j, k: b_{q_f, j, k} = 0

For any HMM (under this new definition):

Σ_O P(O | HMM) = 1
HMM → PFA
HMM (S, Σ1, π, A, B, q_f)  →  PFA (Q, Σ2, I, F, δ, P)

Q = S
Σ2 = Σ1
∀q ∈ S: I(q) = π(q)
F(q_f) = 1 and ∀q ∈ S \ {q_f}: F(q) = 0
δ = {(q_i, w_k, q_j) | q_i, q_j ∈ S and (b_ijk * a_ij ≠ 0)}
P(q_i, w_k, q_j) = a_ij * b_ijk
PFA → HMM
PFA (Q, Σ1, I, F, δ, P)  →  HMM (S, Σ2, π, A, B, q_f)

Need to add a new final state and edges to it.

S = Q ∪ {q_f}
Σ2 = Σ1 ∪ {ε}

∀i ∈ Q: π[i] = I[i]
π[q_f] = 0

∀i ∈ Q: a_ij = Σ_k P(q_i, w_k, q_j)
a_{i, q_f} = F[i]

∀i ∈ Q: b_ijk = P(q_i, w_k, q_j) / a_ij
b_{i, q_f, ε} = 1
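And a sketch of the PFA → HMM direction (representation again assumed, not the course code); the new final state q_f absorbs the final-state probabilities F[i].

# A minimal sketch of the PFA -> HMM mapping above.
# The PFA is (Q, I, F, P) with P[(q_i, w_k, q_j)] = transition probability;
# P is assumed to store only nonzero-probability arcs.
EPS = ""

def pfa_to_hmm(Q, I, F, P):
    qf = "q_f"
    S = list(Q) + [qf]
    pi = {q: I.get(q, 0.0) for q in Q}
    pi[qf] = 0.0
    a, b = {}, {}
    for qi in Q:
        for qj in Q:                                   # a_ij = sum_k P(q_i, w_k, q_j)
            a[(qi, qj)] = sum(p for (s, w, t), p in P.items() if s == qi and t == qj)
        a[(qi, qf)] = F.get(qi, 0.0)                   # a_{i, q_f} = F[i]
        b[(qi, qf)] = {EPS: 1.0}                       # b_{i, q_f, eps} = 1
    for (qi, w, qj), p in P.items():                   # b_ijk = P(q_i, w_k, q_j) / a_ij
        b.setdefault((qi, qj), {})[w] = p / a[(qi, qj)]
    return S, pi, a, b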
Project: Part 1
Learn to use Carmel (a WFST package)

Use Carmel as an HMM Viterbi decoder


for a trigram POS tagger.

The instructions will be handed out on 1/12, and the project is due on 1/19.
Summary
FSA

HMM

Relation between FSA and HMM


HMM (the common def) is a special case of
WFA
HMM (a different def) is equivalent to PFA.
