FSA and HMM: LING 572, Fei Xia, 1/5/06

The document discusses finite state automata (FSA), hidden Markov models (HMM), and the relationship between the two. It defines FSA, probabilistic FSA, weighted FSA, and how operations can be performed on FSA. It then defines state-emission and arc-emission HMM, the parameters and constraints of HMM, and how to calculate probabilities and find the best state sequence in HMM using the forward and Viterbi algorithms. Finally, it discusses how an HMM can be converted into an equivalent weighted FSA.

FSA and HMM

LING 572
Fei Xia
1/5/06
Outline
FSA

HMM

Relation between FSA and HMM


FSA
Definition of FSA
A FSA is (Q, Σ, I, F, δ)
Q: a finite set of states
Σ: a finite set of input symbols
I: the set of initial states
F: the set of final states
δ ⊆ Q × (Σ ∪ {ε}) × Q: the transition relation between states.
An example of FSA

(Figure: a two-state FSA with states q0 and q1 and arcs labeled a and b.)
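To make the definition concrete, here is a minimal Python sketch (not from the slides): δ is stored as a set of (state, symbol, state) triples and acceptance is checked by tracking all reachable states. The states, alphabet, and arcs below are hypothetical; ε-transitions are omitted for simplicity.

# A minimal sketch: an FSA as (Q, Sigma, I, F, delta), with delta stored as
# a set of (from_state, symbol, to_state) triples. Example automaton is made up.
Q     = {"q0", "q1"}
Sigma = {"a", "b"}
I     = {"q0"}
F     = {"q1"}
delta = {("q0", "a", "q1"), ("q1", "b", "q1")}

def accepts(string):
    # Nondeterministic acceptance: track every state reachable on the prefix read so far.
    current = set(I)
    for sym in string:
        current = {t for (s, x, t) in delta if s in current and x == sym}
    return bool(current & F)

print(accepts("ab"), accepts("abbb"), accepts("ba"))   # True True False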
Definition of FST
A FST is (Q, Σ, Δ, I, F, δ)
Q: a finite set of states
Σ: a finite set of input symbols
Δ: a finite set of output symbols
I: the set of initial states
F: the set of final states
δ ⊆ Q × (Σ ∪ {ε}) × (Δ ∪ {ε}) × Q: the transition relation between states.

FSA can be seen as a special case of FST.

The extended transition relation δ* is the smallest set such that:

δ ⊆ δ*
(q, x, y, r) ∈ δ* and (r, a, b, s) ∈ δ  ⇒  (q, xa, yb, s) ∈ δ*

T transduces a string x into a string y if there exists a path from the initial state to a final state whose input is x and whose output is y:

x[T]y iff ∃ q ∈ I, ∃ f ∈ F s.t. (q, x, y, f) ∈ δ*

An example of FST

(Figure: a two-state FST with states q0 and q1 and arcs labeled a:x and b:y.)
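A rough sketch of the transduction relation x[T]y, again assuming a simple representation with no ε-transitions; the states and arcs are hypothetical and loosely follow the a:x, b:y figure above.

# A minimal sketch of x[T]y for an FST without epsilon transitions:
# delta is a set of (from_state, input_symbol, output_symbol, to_state).
I     = {"q0"}
F     = {"q1"}
delta = {("q0", "a", "x", "q1"), ("q1", "b", "y", "q1")}

def transduces(x, y):
    # True iff some path from I to F reads input x and writes output y.
    # Each search item is (state, chars of x consumed, chars of y produced).
    frontier = {(q, 0, 0) for q in I}
    while frontier:
        q, i, j = frontier.pop()
        if i == len(x) and j == len(y) and q in F:
            return True
        for (s, a, b, t) in delta:
            if s == q and x[i:i+1] == a and y[j:j+1] == b:
                frontier.add((t, i + 1, j + 1))
    return False

print(transduces("abb", "xyy"))  # True
print(transduces("ab", "xx"))    # False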
Operations on FSTs
Union: x[T ∪ S]y iff x[T]y or x[S]y

Concatenation: wx[T · S]yz iff w[T]y and x[S]z

Composition: x[T ∘ S]z iff ∃y s.t. x[T]y and y[S]z
An example of composition operation

(Figure: the two-state FST above, with arcs a:x and b:y, composed with a one-state FST (state q0) with arcs x:ε and y:z.)
Probabilistic finite-state automata
(PFA)
Informally, in a PFA, each arc is associated with a probability.

The probability of a path is the product of the probabilities of the arcs on the path.

The probability of a string x is the sum of the probabilities of all the paths for x.

Tasks:
Given a string x, find the best path for x.
Given a string x, find the probability of x in a PFA.
Find the string with the highest probability in a PFA.

Formal definition of PFA
A PFA is (Q, Σ, I, F, δ, P)
Q: a finite set of N states
Σ: a finite set of input symbols
I: Q → R+ (initial-state probabilities)
F: Q → R+ (final-state probabilities)
δ ⊆ Q × (Σ ∪ {ε}) × Q: the transition relation between states.
P: δ → R+ (transition probabilities)

Constraints on the functions:

Σ_{q ∈ Q} I(q) = 1

∀q ∈ Q:  F(q) + Σ_{a, q' ∈ Q} P(q, a, q') = 1

Probability of a string:

P(w_{1,n}, q_{1,n+1}) = I(q_1) * F(q_{n+1}) * ∏_{i=1}^{n} P(q_i, w_i, q_{i+1})

P(w_{1,n}) = Σ_{q_{1,n+1}} P(w_{1,n}, q_{1,n+1})
Consistency of a PFA
Let A be a PFA.
Def: P(x | A) = the sum of the probabilities of all the valid paths for x in A.
Def: a valid path in A is a path for some string x with probability greater than 0.
Def: A is called consistent if Σ_x P(x | A) = 1.

Def: a state of a PFA is useful if it appears in at least one valid path.

Proposition: a PFA is consistent if all its states are useful.


Q1 of Hw1
An example of PFA

(Figure: a PFA with states q0 and q1, an arc a:1 from q0 to q1, and a self-loop b:0.8 on q1; final-state probabilities F(q0)=0, F(q1)=0.2; I(q0)=1.0, I(q1)=0.0.)

P(ab^n) = 0.2 * 0.8^n

Σ_x P(x) = Σ_{n≥0} P(ab^n) = Σ_{n≥0} 0.2 * 0.8^n = 0.2 * 1/(1 - 0.8) = 1
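A sketch (assumed representation, not the course code) of the string-probability computation P(w) for the example PFA above; it checks P(ab^n) = 0.2 * 0.8^n numerically and that the total probability approaches 1.

# A minimal sketch: computing P(w) in a PFA (no epsilon arcs) by dynamic
# programming over prefixes, using the example PFA above.
I = {"q0": 1.0, "q1": 0.0}
F = {"q0": 0.0, "q1": 0.2}
P = {("q0", "a", "q1"): 1.0, ("q1", "b", "q1"): 0.8}

def string_prob(w):
    # P(w) = sum over all paths reading w of I(start) * prod(arc probs) * F(end).
    alpha = dict(I)                       # prob of reading a prefix and being in state q
    for sym in w:
        nxt = {q: 0.0 for q in I}
        for (s, x, t), p in P.items():
            if x == sym:
                nxt[t] += alpha[s] * p
        alpha = nxt
    return sum(alpha[q] * F[q] for q in alpha)

print(string_prob("ab"))                                           # 0.16 = 0.2 * 0.8
print(round(sum(string_prob("a" + "b" * n) for n in range(200)), 6))  # ~1.0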
Weighted finite-state automata
(WFA)
Each arc is associated with a weight.
The sum and the multiplication can have other meanings.

weight(x) = Σ_{s,t ∈ Q} ( I(s) * P(s, x, t) * F(t) )
HMM
Two types of HMMs
State-emission HMM (Moore machine):
The emission probability depends only on the
state (from-state or to-state).

Arc-emission HMM (Mealy machine):


The emission probability depends on the (from-state, to-state) pair.
State-emission HMM

(Figure: states s1, s2, ..., sN in sequence, each state emitting output symbols such as w1, w3, w4, w5.)

Two kinds of parameters:
Transition probability: P(s_j | s_i)
Output (emission) probability: P(w_k | s_i)
# of parameters: O(NM + N^2)
Arc-emission HMM
(Figure: states s1, s2, ..., sN, with output symbols such as w1, ..., w5 emitted on the arcs between states.)

Same kinds of parameters, but the emission probabilities depend on both states: P(w_k, s_j | s_i)

# of parameters: O(N^2·M + N^2)
Are the two types of HMMs
equivalent?
For each state-emission HMM1, there is an
arc-emission HMM2, such that for any
sequence O, P(O|HMM1)=P(O|HMM2).

The reverse is also true.

Q3 and Q4 of hw1.
Definition of arc-emission HMM
A HMM is a tuple (S, Σ, π, A, B):
A set of states S = {s1, s2, ..., sN}.
A set of output symbols Σ = {w1, ..., wM}.
Initial state probabilities π = {π_i}
State transition prob: A = {a_ij}.
Symbol emission prob: B = {b_ijk}

State sequence: X_{1,n+1}
Output sequence: O_{1,n}

P(O_{1,n}, X_{1,n+1}) = π(X_1) * ∏_{i=1}^{n} P(X_{i+1} | X_i) * P(o_i | X_i, X_{i+1})

P(O_{1,n}) = Σ_{X_{1,n+1}} P(O_{1,n}, X_{1,n+1})
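The two formulas above can be evaluated directly when n is small. The sketch below uses made-up two-state parameters (the names pi, a, b and their values are hypothetical, not from the slides) and brute-forces the sum over state sequences.

# A minimal sketch: P(O, X) and P(O) for an arc-emission HMM with parameters
# pi (initial), a[i][j] (transition), b[(i, j)][w] (emission on the arc i -> j).
from itertools import product

states, symbols = [0, 1], ["w1", "w2"]
pi = [1.0, 0.0]
a  = [[0.7, 0.3], [0.4, 0.6]]
b  = {(i, j): ({"w1": 0.9, "w2": 0.1} if i == j else {"w1": 0.2, "w2": 0.8})
      for i in states for j in states}

def joint_prob(O, X):
    # P(O_{1,n}, X_{1,n+1}) = pi(X_1) * prod_i P(X_{i+1}|X_i) * P(o_i|X_i,X_{i+1})
    p = pi[X[0]]
    for i, o in enumerate(O):
        p *= a[X[i]][X[i + 1]] * b[(X[i], X[i + 1])][o]
    return p

def obs_prob(O):
    # P(O_{1,n}) = sum over all state sequences X_{1,n+1} of P(O, X)
    return sum(joint_prob(O, X) for X in product(states, repeat=len(O) + 1))

print(obs_prob(["w1", "w2", "w1"]))
# With row-normalized parameters, summing P(O) over all outputs of length 3
# gives 1 (cf. the constraints on the next slide).
print(sum(obs_prob(list(o)) for o in product(symbols, repeat=3)))   # ~1.0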
Constraints

Σ_{i=1}^{N} π_i = 1

∀i: Σ_{j=1}^{N} a_ij = 1

∀i, j: Σ_{k=1}^{M} b_ijk = 1

(hence, for every i: Σ_j Σ_k a_ij * b_ijk = 1)

For any integer n and any HMM:

Σ_{|O|=n} P(O | HMM) = 1

Q2 of hw1.
Properties of HMM
Limited horizon: P(X_{t+1} | X_1, X_2, ..., X_t) = P(X_{t+1} | X_t)

Time invariance: the probabilities do not change over time: P(X_{t+1} | X_t) = P(X_{t+1+m} | X_{t+m})

The states are hidden because we know the structure of the machine (i.e., S and Σ), but we don't know which state sequences generate a particular output.
Applications of HMM
N-gram POS tagging
Bigram tagger: oi is a word, and si is a POS tag.
Trigram tagger: oi is a word, and si is ??

Other tagging problems:


Word segmentation
Chunking
NE tagging
Punctuation prediction

Other applications: ASR, etc.


Three fundamental questions for
HMMs
1. Finding the probability of an observation

2. Finding the best state sequence

3. Training: estimating parameters


(1) Finding the probability of
the observation
Forward probability: the probability of producing O_{1,t-1} while ending up in state s_i:

α_i(t) =def P(O_{1,t-1}, X_t = i)

P(O) = Σ_{i=1}^{N} α_i(T+1)

Calculating forward probability
Initialization: α_i(1) = π_i
Induction:

α_j(t+1) = P(O_{1,t}, X_{t+1} = j)
         = Σ_i P(O_{1,t}, X_t = i, X_{t+1} = j)
         = Σ_i P(O_{1,t-1}, X_t = i) * P(o_t, X_{t+1} = j | O_{1,t-1}, X_t = i)
         = Σ_i P(O_{1,t-1}, X_t = i) * P(o_t, X_{t+1} = j | X_t = i)
         = Σ_i α_i(t) * a_ij * b_{ij,o_t}
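The induction translates almost line for line into code. Below is a minimal sketch (the parameter layout pi, a[i][j], b[i][j][k] and the toy values are assumptions, not the course implementation).

# A minimal sketch of the forward algorithm for an arc-emission HMM.
# alpha[t][i] = P(O_{1,t-1}, X_t = i).
def forward(O, pi, a, b):
    N, T = len(pi), len(O)
    alpha = [[0.0] * N for _ in range(T + 1)]
    alpha[0] = list(pi)                              # initialization: alpha_i(1) = pi_i
    for t in range(T):                               # induction over the observations
        for j in range(N):
            alpha[t + 1][j] = sum(alpha[t][i] * a[i][j] * b[i][j][O[t]]
                                  for i in range(N))
    return sum(alpha[T])                             # P(O) = sum_i alpha_i(T+1)

# Tiny example with two states and two symbols (0 and 1); rows of a and the
# emission distributions b[i][j] each sum to 1, as required by the constraints.
pi = [0.6, 0.4]
a  = [[0.7, 0.3], [0.4, 0.6]]
b  = [[[0.9, 0.1], [0.2, 0.8]],
      [[0.5, 0.5], [0.3, 0.7]]]
print(forward([0, 1, 0], pi, a, b))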
(2) Finding the best state sequence
Given the observation O_{1,T} = o_1 ... o_T, find the state sequence X_{1,T+1} = X_1 ... X_{T+1} that maximizes P(X_{1,T+1} | O_{1,T}).

(Figure: a trellis of hidden states X_1, X_2, ..., X_T, X_{T+1} generating the observations o_1, o_2, ..., o_T.)

Viterbi algorithm
Viterbi algorithm
The probability of the best path that produces O_{1,t-1} while ending up in state s_i:

δ_i(t) =def max_{X_{1,t-1}} P(X_{1,t-1}, O_{1,t-1}, X_t = i)

Initialization: δ_i(1) = π_i
Induction: δ_j(t+1) = max_i δ_i(t) * a_ij * b_{ij,o_t}
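A matching sketch of the Viterbi recursion, with backpointers added so the best state sequence can be read off (same hypothetical parameter layout as the forward sketch above).

# A minimal sketch of Viterbi decoding for an arc-emission HMM.
# delta[t][i] = best-path probability of producing O_{1,t-1} and ending in state i.
def viterbi(O, pi, a, b):
    N, T = len(pi), len(O)
    delta = [[0.0] * N for _ in range(T + 1)]
    back  = [[0] * N for _ in range(T + 1)]
    delta[0] = list(pi)                              # delta_i(1) = pi_i
    for t in range(T):
        for j in range(N):
            scores = [delta[t][i] * a[i][j] * b[i][j][O[t]] for i in range(N)]
            back[t + 1][j] = max(range(N), key=lambda i: scores[i])
            delta[t + 1][j] = scores[back[t + 1][j]]
    # Follow backpointers from the best final state to recover X_{1,T+1}.
    best = max(range(N), key=lambda i: delta[T][i])
    path = [best]
    for t in range(T, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path)), delta[T][best]

pi = [0.6, 0.4]
a  = [[0.7, 0.3], [0.4, 0.6]]
b  = [[[0.9, 0.1], [0.2, 0.8]],
      [[0.5, 0.5], [0.3, 0.7]]]
print(viterbi([0, 1, 0], pi, a, b))   # (best state sequence, its probability)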

Modify it to allow epsilon emission: Q5 of hw1.


Summary of HMM
Two types of HMMs: state-emission and arc-emission HMM: (S, Σ, π, A, B)
Properties: Markov assumption
Applications: POS-tagging, etc.
Finding the probability of an observation: forward
probability
Decoding: Viterbi decoding
Relation between FSA and
HMM
Relation between WFA and HMM
An HMM can be seen as a special type of WFA.

Given an HMM, how can we build an equivalent WFA?
Converting HMM into WFA
Given an HMM (S, Σ1, π, A, B), build a WFA (Q, Σ2, I, F, δ, P) such that for any input sequence O, P(O|HMM) = P(O|WFA).

Build a WFA: add a final state and arcs to it


Show that there is a one-to-one mapping between the
paths in HMM and the paths in WFA
Prove that the probabilities in HMM and in WFA are
identical.
HMM → WFA
Need to create a new state (the final state) and add edges to it.

Q = S ∪ {f}
Σ2 = Σ1 ∪ {ε}
∀q ∈ S: I(q) = π(q);  I(f) = 0
∀q ∈ S: F(q) = 0;  F(f) = 1
δ = {(q_i, w_k, q_j) | q_i, q_j ∈ S and (b_ijk * a_ij ≠ 0)} ∪ {(q_i, ε, f) | q_i ∈ S}
P(q_i, w_k, q_j) = a_ij * b_ijk
P(q_i, ε, f) = 1

The WFA is not a PFA.
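A sketch of the HMM → WFA construction above; the dictionary-based representation is an assumption, not the slides' notation.

# A minimal sketch of the HMM -> WFA construction.
# The HMM is given as (S, symbols, pi, a, b) with a[i][j] and b[i][j][k]; the WFA
# is returned as dictionaries (I, F, P) where P maps (q_i, w_k, q_j) -> weight.
EPS = ""   # stands for the empty symbol epsilon

def hmm_to_wfa(S, symbols, pi, a, b):
    f = "f"                                    # the new final state
    I = {q: pi[i] for i, q in enumerate(S)}    # I(q) = pi(q), I(f) = 0
    I[f] = 0.0
    F = {q: 0.0 for q in S}                    # F(q) = 0 for q in S, F(f) = 1
    F[f] = 1.0
    P = {}
    for i, qi in enumerate(S):
        for j, qj in enumerate(S):
            for k, w in enumerate(symbols):
                if a[i][j] * b[i][j][k] != 0:  # P(q_i, w_k, q_j) = a_ij * b_ijk
                    P[(qi, w, qj)] = a[i][j] * b[i][j][k]
        P[(qi, EPS, f)] = 1.0                  # P(q_i, epsilon, f) = 1
    return I, F, P

# Note: the weights leaving each state (including the epsilon arc to f) sum to 2
# rather than 1, which is why the result is a WFA but not a PFA.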


A slightly different definition of HMM

A HMM is a tuple (S, Σ, π, A, B, q_f):
A set of states S = {s1, s2, ..., sN}.
A set of output symbols Σ = {w1, ..., wM}.
Initial state probabilities π = {π_i}
State transition prob: A = {a_ij}.
Symbol emission prob: B = {b_ijk}
q_f is the final state: there are no outgoing edges from q_f.
Constraints

Σ_{i=1}^{N} π_i = 1

∀i ≠ q_f: Σ_{j=1}^{N} a_ij = 1

∀i ≠ q_f, ∀j: Σ_{k=1}^{M} b_ijk = 1

∀j: a_{q_f, j} = 0
∀j, k: b_{q_f, j, k} = 0

For any HMM (under this new definition):

Σ_O P(O | HMM) = 1
HMM → PFA
HMM (S, Σ1, π, A, B, q_f)  →  PFA (Q, Σ2, I, F, δ, P)

Q = S
Σ2 = Σ1
∀q ∈ S: I(q) = π(q)
F(q_f) = 1 and ∀q ∈ S \ {q_f}: F(q) = 0
δ = {(q_i, w_k, q_j) | q_i, q_j ∈ S and (b_ijk * a_ij ≠ 0)}
P(q_i, w_k, q_j) = a_ij * b_ijk
PFA → HMM
PFA (Q, Σ1, I, F, δ, P)  →  HMM (S, Σ2, π, A, B, q_f)

Need to add a new final state and edges to it.

S = Q ∪ {q_f}
Σ2 = Σ1 ∪ {ε}

∀i ∈ Q: π[i] = I[i]
π[q_f] = 0

∀i ∈ Q: a_ij = Σ_k P(q_i, w_k, q_j)
a_{i, q_f} = F[i]

∀i ∈ Q: b_ijk = P(q_i, w_k, q_j) / a_ij
b_{i, q_f, ε} = 1
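And a sketch of the PFA → HMM direction (representation again assumed, not the course code); the new final state q_f absorbs the final-state probabilities F[i].

# A minimal sketch of the PFA -> HMM mapping above.
# The PFA is (Q, I, F, P) with P[(q_i, w_k, q_j)] = transition probability;
# P is assumed to store only nonzero-probability arcs.
EPS = ""

def pfa_to_hmm(Q, I, F, P):
    qf = "q_f"
    S = list(Q) + [qf]
    pi = {q: I.get(q, 0.0) for q in Q}
    pi[qf] = 0.0
    a, b = {}, {}
    for qi in Q:
        for qj in Q:                                   # a_ij = sum_k P(q_i, w_k, q_j)
            a[(qi, qj)] = sum(p for (s, w, t), p in P.items() if s == qi and t == qj)
        a[(qi, qf)] = F.get(qi, 0.0)                   # a_{i, q_f} = F[i]
        b[(qi, qf)] = {EPS: 1.0}                       # b_{i, q_f, eps} = 1
    for (qi, w, qj), p in P.items():                   # b_ijk = P(q_i, w_k, q_j) / a_ij
        b.setdefault((qi, qj), {})[w] = p / a[(qi, qj)]
    return S, pi, a, b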
Project: Part 1
Learn to use Carmel (a WFST package)

Use Carmel as an HMM Viterbi decoder


for a trigram POS tagger.

The instructions will be handed out on 1/12, and the project is due on 1/19.
Summary
FSA

HMM

Relation between FSA and HMM


HMM (the common def) is a special case of
WFA
HMM (a different def) is equivalent to PFA.
