Inside-Outside Algorithm
Michael Collins
Introduction
This note describes the inside-outside algorithm. The inside-outside algorithm has
very important applications to statistical models based on context-free grammars.
In particular, it is used in EM estimation of probabilistic context-free grammars,
and it is used in estimation of discriminative models for context-free parsing.
As we will see, the inside-outside algorithm has many similarities to the forward-backward algorithm for hidden Markov models. It computes analogous quantities
to the forward and backward terms, for context-free trees.
Basic Definitions
This section gives some basic definitions. We first give definitions for context-free grammars, and for a representation of parse trees. We then describe potential
functions over parse trees. The next section describes the quantities computed by
the inside-outside algorithm, and the algorithm itself.
The previous class note on PCFGs (posted on the webpage) has full details of
context-free grammars. For the input sentence x1 . . . xn , the CFG defines a set of
possible parse trees, which we will denote as T .
Any parse tree t ∈ T can be represented as a set of rule productions. Each
rule production can take one of two forms:

• ⟨A → B C, i, k, j⟩ where A → B C is a rule in the grammar, and i, k, j
are indices such that 1 ≤ i ≤ k < j ≤ n. A rule production of this form
specifies that the rule A → B C is seen with non-terminal A spanning words
xi . . . xj in the input string; non-terminal B spanning words xi . . . xk in the
input string; and non-terminal C spanning words xk+1 . . . xj in the input
string.

• ⟨A, i⟩ where A is a non-terminal, and i is an index with i ∈ {1, 2, . . . , n}. A
rule production of this form specifies that the rule A → xi is seen in a parse
tree, with A above the ith word in the input string.
As an example, consider the following parse tree:
[Parse tree figure: a tree over a four-word sentence x1 . . . x4 with rules S → NP VP, NP → D N, VP → V P, and with D, N, V, P above the words x1, x2, x3, x4 respectively. Its rule productions are ⟨S → NP VP, 1, 2, 4⟩, ⟨NP → D N, 1, 1, 2⟩, ⟨VP → V P, 3, 3, 4⟩, ⟨D, 1⟩, ⟨N, 2⟩, ⟨V, 3⟩, ⟨P, 4⟩.]
For each rule production r in a parse tree t we assume a potential ψ(r) ≥ 0: we write
ψ(A → B C, i, k, j) for the potential of a rule production ⟨A → B C, i, k, j⟩ ∈ t, and
ψ(A, i) for the potential of a rule production ⟨A, i⟩ ∈ t. As a first example, for a PCFG
with rule probabilities q we can define

ψ(A → B C, i, k, j) = q(A → B C)        ψ(A, i) = q(A → xi)

Under these definitions, for any tree t, the potential

ψ(t) = ∏_{r∈t} ψ(r)

is simply the probability for that parse tree under the PCFG.
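As a concrete (if toy) illustration, the following Python sketch stores a parse tree as a set of rule productions and computes its potential under a PCFG. The grammar, rule probabilities, and sentence here are made up for illustration; only the representation and the product ∏_{r∈t} ψ(r) follow the definitions above.

```python
# A toy sketch: a parse tree as a set of rule productions, and its potential under a PCFG.
# The rule probabilities q and the sentence are invented for illustration.

q_binary = {("S", "NP", "VP"): 1.0, ("NP", "D", "N"): 1.0, ("VP", "V", "P"): 1.0}
q_unary = {("D", "the"): 1.0, ("N", "dog"): 1.0, ("V", "ran"): 0.5, ("P", "off"): 1.0}

words = ["the", "dog", "ran", "off"]          # x_1 ... x_4 (made-up sentence)

# Rule productions for the tree S -> NP VP, NP -> D N, VP -> V P over x_1 ... x_4.
binary_productions = [("S", "NP", "VP", 1, 2, 4),
                      ("NP", "D", "N", 1, 1, 2),
                      ("VP", "V", "P", 3, 3, 4)]
unary_productions = [("D", 1), ("N", 2), ("V", 3), ("P", 4)]

def tree_potential():
    """psi(t) = product over rule productions r in t of psi(r).
    Under a PCFG, psi(<A -> B C, i, k, j>) = q(A -> B C) and psi(<A, i>) = q(A -> x_i),
    so psi(t) is exactly the probability of the tree."""
    psi = 1.0
    for (A, B, C, i, k, j) in binary_productions:
        psi *= q_binary[(A, B, C)]
    for (A, i) in unary_productions:
        psi *= q_unary[(A, words[i - 1])]     # the note indexes words from 1
    return psi

print(tree_potential())                        # 0.5 with the made-up values above
```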
As a second example, consider a conditional random field (CRF) style model
for parsing with CFGs (see the lecture slides from earlier in the course). In this
case each rule production r has a feature vector φ(r) ∈ R^d, and in addition we
assume a parameter vector v ∈ R^d. We can then define the potential functions as

ψ(r) = exp{v · φ(r)}
The potential function for an entire tree is then
ψ(t) = ∏_{r∈t} ψ(r) = ∏_{r∈t} exp{v · φ(r)} = exp{ Σ_{r∈t} v · φ(r) }
Note that this is closely related to the distribution defined by a CRF-style model:
in particular, under the CRF we have for any tree t
p(t | x1 . . . xn) = ψ(t) / Σ_{t′∈T} ψ(t′)
where T again denotes the set of all parse trees for x1 . . . xn under the CFG.
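The following small sketch shows one way the CRF-style potentials might be implemented; the feature map φ and the parameter values here are invented for illustration and are not part of the note.

```python
import numpy as np

v = np.array([0.5, -1.0, 0.2])          # parameter vector v in R^d (made-up values, d = 3)

def phi(r):
    """A made-up feature vector phi(r) in R^d for a rule production r
    (r is a tuple whose first element is the non-terminal on the left-hand side)."""
    A = r[0]
    return np.array([1.0,                        # bias feature
                     1.0 if A == "NP" else 0.0,  # indicator of the left-hand-side label
                     float(len(r))])             # crude "rule arity" feature

def psi(r):
    """psi(r) = exp{ v . phi(r) }; the potential of a tree is the product of these terms."""
    return float(np.exp(v @ phi(r)))

# For example, psi(("NP", "D", "N", 1, 1, 2)) and psi(("D", 1)) are both well defined.
```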
The inside-outside algorithm computes the following quantities for the input
sentence x1 . . . xn:

1. Z = Σ_{t∈T} ψ(t)

2. μ(r) = Σ_{t∈T : r∈t} ψ(t) for each rule production r

3. μ(A, i, j) = Σ_{t∈T : (A,i,j)∈t} ψ(t) for each non-terminal A and each pair of indices
i, j with 1 ≤ i ≤ j ≤ n
Here we write (A, i, j) ∈ t if the parse tree t contains the non-terminal A spanning words xi . . . xj in the input. For example, in the example parse tree
given before, the following (A, i, j) triples are seen in the tree: ⟨S, 1, 4⟩;
⟨NP, 1, 2⟩; ⟨VP, 3, 4⟩; ⟨D, 1, 1⟩; ⟨N, 2, 2⟩; ⟨V, 3, 3⟩; ⟨P, 4, 4⟩.
Note that there is a close correspondence between these terms, and the terms
computed by the forward-backward algorithm (see the previous notes).
In words, the quantity Z is the sum of potentials for all possible parse trees
for the input x1 . . . xn. The quantity μ(r) for any rule production r is the sum of
potentials for all parse trees that contain the rule production r. Finally, the quantity
μ(A, i, j) is the sum of potentials for all parse trees containing non-terminal A
spanning words xi . . . xj in the input.
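It can be useful to see these three quantities written down as brute-force computations before looking at the dynamic-programming algorithm. The sketch below (the function names, the grammar encoding, and the `psi` argument are assumptions of this sketch, not notation from the note) enumerates every parse tree explicitly, and therefore takes exponential time; it serves only as a reference specification for Z, μ(r), and μ(A, i, j).

```python
import math

def all_trees(A, i, j, binary_rules, lexicon, words):
    """Enumerate every tree rooted in non-terminal A spanning words i..j (1-indexed).
    Each tree is a frozenset of rule productions, written as in the note:
    (A, B, C, i, k, j) for <A -> B C, i, k, j> and (A, i) for <A, i>.
    binary_rules maps A to a list of (B, C) pairs; lexicon is a set of (A, word) pairs."""
    if i == j:
        if (A, words[i - 1]) in lexicon:
            yield frozenset({(A, i)})
        return
    for (B, C) in binary_rules.get(A, []):
        for k in range(i, j):
            for left in all_trees(B, i, k, binary_rules, lexicon, words):
                for right in all_trees(C, k + 1, j, binary_rules, lexicon, words):
                    yield left | right | {(A, B, C, i, k, j)}

def brute_force_quantities(S, words, binary_rules, lexicon, psi):
    """Z, mu(r) and mu(A, i, j), by explicit enumeration over all trees in T."""
    n = len(words)
    trees = list(all_trees(S, 1, n, binary_rules, lexicon, words))
    tree_psi = {t: math.prod(psi(r) for r in t) for t in trees}
    Z = sum(tree_psi.values())
    mu_rule, mu_span = {}, {}
    for t, value in tree_psi.items():
        for r in t:                                      # mu(r): trees containing production r
            mu_rule[r] = mu_rule.get(r, 0.0) + value
        spans = set()
        for r in t:                                      # (A, i, j) triples contained in t
            if len(r) == 6:
                spans.add((r[0], r[3], r[5]))
            else:
                spans.add((r[0], r[1], r[1]))
        for key in spans:                                # mu(A, i, j): trees containing (A, i, j)
            mu_span[key] = mu_span.get(key, 0.0) + value
    return Z, mu_rule, mu_span
```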
We will soon see how these calculations can be applied within a particular
context, namely EM-based estimation of the parameters of a PCFG. First, however,
we give the algorithm.
Inputs: a sentence x1 . . . xn, a CFG (N, Σ, R, S), and potential functions ψ(A → B C, i, k, j)
and ψ(A, i).

Data structures: inside terms α(A, i, j) and outside terms β(A, i, j), for all A ∈ N and all
(i, j) with 1 ≤ i ≤ j ≤ n.

Inside terms, base case: for all A ∈ N, for all i ∈ {1 . . . n},

α(A, i, i) = ψ(A, i)

Inside terms, recursive case: for all A ∈ N, for all (i, j) with 1 ≤ i < j ≤ n,

α(A, i, j) = Σ_{A→B C ∈ R} Σ_{k=i}^{j-1} ψ(A → B C, i, k, j) × α(B, i, k) × α(C, k + 1, j)

Outside terms, base case: β(S, 1, n) = 1, and β(A, 1, n) = 0 for all A ≠ S.

Outside terms, recursive case: for all A ∈ N, for all (i, j) with 1 ≤ i ≤ j ≤ n and (i, j) ≠ (1, n),

β(A, i, j) = Σ_{B→C A ∈ R} Σ_{k=1}^{i-1} ψ(B → C A, k, i - 1, j) × β(B, k, j) × α(C, k, i - 1)
           + Σ_{B→A C ∈ R} Σ_{k=j+1}^{n} ψ(B → A C, i, j, k) × β(B, i, k) × α(C, j + 1, k)

Outputs: Return

Z = α(S, 1, n)

μ(A, i, j) = α(A, i, j) × β(A, i, j)

μ(A, i) = μ(A, i, i)

μ(A → B C, i, k, j) = β(A, i, j) × ψ(A → B C, i, k, j) × α(B, i, k) × α(C, k + 1, j)
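The recursions above translate fairly directly into code. The sketch below is one possible Python implementation; the argument names (`binary_rules` as a list of (A, B, C) triples, `psi_rule` and `psi_word` for the two potential functions, `nonterminals` for N) are assumptions of this sketch rather than notation from the note.

```python
from collections import defaultdict

def inside_outside(n, S, nonterminals, binary_rules, psi_rule, psi_word):
    """Compute inside terms alpha(A, i, j), outside terms beta(A, i, j), and Z,
    for a sentence of length n (words indexed 1..n)."""
    alpha = defaultdict(float)
    beta = defaultdict(float)

    # Inside, base case: alpha(A, i, i) = psi(A, i).
    for i in range(1, n + 1):
        for A in nonterminals:
            alpha[(A, i, i)] = psi_word(A, i)

    # Inside, recursive case: filled in by increasing span width.
    for width in range(1, n):
        for i in range(1, n - width + 1):
            j = i + width
            for (A, B, C) in binary_rules:
                for k in range(i, j):
                    alpha[(A, i, j)] += (psi_rule(A, B, C, i, k, j)
                                         * alpha[(B, i, k)] * alpha[(C, k + 1, j)])

    # Outside, base case: beta(S, 1, n) = 1; beta(A, 1, n) = 0 for A != S.
    beta[(S, 1, n)] = 1.0

    # Outside, recursive case: filled in by decreasing span width.
    for width in range(n - 2, -1, -1):
        for i in range(1, n - width + 1):
            j = i + width
            for A in nonterminals:
                total = 0.0
                for (B, C, D) in binary_rules:
                    # A as the right child: B -> C A, with B spanning (k, j).
                    if D == A:
                        for k in range(1, i):
                            total += (psi_rule(B, C, A, k, i - 1, j)
                                      * beta[(B, k, j)] * alpha[(C, k, i - 1)])
                    # A as the left child: B -> A C, with B spanning (i, k).
                    if C == A:
                        for k in range(j + 1, n + 1):
                            total += (psi_rule(B, A, D, i, j, k)
                                      * beta[(B, i, k)] * alpha[(D, j + 1, k)])
                beta[(A, i, j)] = total

    # Z, and the mu terms in the outputs above, follow directly from alpha and beta:
    # Z = alpha[(S, 1, n)], mu(A, i, j) = alpha[(A, i, j)] * beta[(A, i, j)], and so on.
    return alpha, beta, alpha[(S, 1, n)]
```

Filling the inside table by increasing span width and the outside table by decreasing span width ensures that every term on the right-hand side of each recursion has already been computed.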
3.3.1 The inside terms

Take x1 . . . xn to be the input to the inside-outside algorithm. For any non-terminal
A ∈ N, for any (i, j) such that 1 ≤ i ≤ j ≤ n, define

T(A, i, j)

to be the set of all possible trees rooted in non-terminal A, and spanning words
xi . . . xj in the sentence. Note that under this definition, T = T(S, 1, n) (the full
set of parse trees for the input sentence is equal to the full set of trees rooted in the
symbol S, spanning words x1 . . . xn).
As an example, for the input sentence the dog saw the man in the park, under
an appropriate CFG, one member of T (NP, 4, 8) would be
[Parse tree figure: NP → NP PP, where the first NP → D N spans the man, and PP → IN NP, with IN → in and the final NP → D N spanning the park.]
The set T (NP, 4, 8) would be the set of all possible parse trees rooted in NP,
spanning words x4 . . . x8 = the man in the park.
Each t ∈ T(A, i, j) has an associated potential, defined in the same way as
before as

ψ(t) = ∏_{r∈t} ψ(r)
We now claim the following: consider the α(A, i, j) terms calculated in the
inside-outside algorithm. Then

α(A, i, j) = Σ_{t∈T(A,i,j)} ψ(t)

Thus the inside term α(A, i, j) is simply the sum of potentials for all trees spanning
words xi . . . xj, rooted in the symbol A.
3.3.2 The outside terms
Again, take x1 . . . xn to be the input to the inside-outside algorithm. Now, for any
non-terminal A, for any (i, j) such that 1 ≤ i ≤ j ≤ n, define

O(A, i, j)

to be the set of all outside trees with non-terminal A, and span xi . . . xj.
To illustrate the idea of an outside tree, again consider an example where the
input sentence is the dog saw the man in the park. Under an appropriate CFG, one
member of O(NP, 4, 5) would be
[Outside tree figure: S → NP VP; the left NP → D N spans the dog; VP → V NP with V → saw; the object NP → NP PP, where the first NP is left as a leaf (this is the non-terminal NP with span x4 x5), and PP → IN NP with IN → in and NP → D N spanning the park.]
This tree is rooted in the symbol S. The leaves of the tree form the sequence
x1 . . . x3 NP x6 . . . xn.
More generally, an outside tree for non-terminal A, with span xi . . . xj, is a tree
with the following properties:

• The tree is rooted in the symbol S.

• Each rule in the tree is a valid rule in the underlying CFG (e.g., S → NP VP,
NP → D N, D → the, etc.)

• The leaves of the tree form the sequence x1 . . . xi-1 A xj+1 . . . xn.
Each outside tree t again has an associated potential, equal to

ψ(t) = ∏_{r∈t} ψ(r)
We simply read off the rule productions in the outside tree, and take their product.
Again, recall that we defined O(A, i, j) to be the set of all possible outside
trees with non-terminal A and span xi . . . xj . We now make the following claim.
Consider the β(A, i, j) terms calculated by the inside-outside algorithm. Then

β(A, i, j) = Σ_{t∈O(A,i,j)} ψ(t)

In words, the outside term β(A, i, j) is the sum of potentials for all outside trees
in the set O(A, i, j).
3.3.3 Justifying the Z and μ terms
We now give justification for the Z and μ terms calculated by the algorithm. First,
consider Z. Recall that we would like to compute

Z = Σ_{t∈T} ψ(t)

But T = T(S, 1, n), and we have just seen that α(S, 1, n) = Σ_{t∈T(S,1,n)} ψ(t); hence
Z = α(S, 1, n), which is exactly the value returned by the algorithm.

Next, consider the μ(A, i, j) terms. Recall that we would like to compute

μ(A, i, j) = Σ_{t∈T : (A,i,j)∈t} ψ(t)
[Figure: a parse tree for the dog saw the man in the park that contains the non-terminal NP spanning words x4 x5 = the man, shown together with its decomposition into an outside tree in O(NP, 4, 5) (the tree above and around the NP node) and an inside tree in T(NP, 4, 5) (the subtree NP → D N spanning the man).]
It follows that if we denote the outside tree by t1 , the inside tree by t2 , and the
full tree by t, we have
ψ(t) = ψ(t1) × ψ(t2)
More generally, we have

μ(A, i, j) = Σ_{t∈T : (A,i,j)∈t} ψ(t)                                        (1)

           = Σ_{t1∈O(A,i,j)} Σ_{t2∈T(A,i,j)} ψ(t1) × ψ(t2)                   (2)

           = ( Σ_{t1∈O(A,i,j)} ψ(t1) ) × ( Σ_{t2∈T(A,i,j)} ψ(t2) )           (3)

           = β(A, i, j) × α(A, i, j)                                          (4)
Eq. 1 follows by definition. Eq. 2 follows because any tree t with non-terminal A
spanning xi . . . xj can be decomposed into a pair (t1, t2) where t1 ∈ O(A, i, j),
and t2 ∈ T(A, i, j). Eq. 3 follows by simple algebra. Finally, Eq. 4 follows by the
definitions of α(A, i, j) and β(A, i, j).
A similar argument can be used to justify computing
μ(r) = Σ_{t∈T : r∈t} ψ(t)

as

μ(A, i) = μ(A, i, i)

μ(A → B C, i, k, j) = β(A, i, j) × ψ(A → B C, i, k, j) × α(B, i, k) × α(C, k + 1, j)
For brevity the details are omitted.
The EM Algorithm for PCFGs

We now describe how the inside-outside algorithm is used within EM-based
estimation of the parameters of a PCFG. The input to the algorithm is a set of
training examples, where each training example is a sentence and each xj within a
sentence is a word. The output from the algorithm is a parameter
q(r) for each rule r in the CFG.
The algorithm starts with initial parameters q^0(r) for each rule r (for example
these parameters could be chosen to be random values). As is usual in EM-based
algorithms, the algorithm defines a sequence of parameter settings q^1, q^2, . . . , q^T,
where T is the number of iterations of the algorithm.
The parameters q^t at the tth iteration are calculated as follows. In a first step,
the inside-outside algorithm is used to calculate expected counts f(r) for each rule
r in the PCFG, under the parameter values q^{t-1}. Once the expected counts are
calculated, the new estimates are

q^t(A → γ) = f(A → γ) / Σ_{A→γ′ ∈ R} f(A → γ′)
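As a sketch in code, the re-estimation step is a single normalization over rules with the same left-hand side; here `f` is assumed to be a dictionary mapping each rule (a tuple whose first element is its left-hand-side non-terminal) to its expected count.

```python
from collections import defaultdict

def reestimate(f):
    """M-step: q^t(A -> gamma) = f(A -> gamma) / sum over A -> gamma' in R of f(A -> gamma').
    f maps each rule, e.g. ("S", "NP", "VP") or ("D", "the"), to its expected count."""
    totals = defaultdict(float)
    for rule, count in f.items():
        totals[rule[0]] += count                       # total expected count for left-hand side A
    return {rule: count / totals[rule[0]]
            for rule, count in f.items() if totals[rule[0]] > 0.0}
```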
Under a PCFG with parameters θ (the rule probabilities q(r)), the joint probability
of a sentence x and a parse tree t is

p(x, t; θ) = ∏_{r∈R} q(r)^{count(t,r)}

where count(t, r) is the number of times rule r is seen in the tree t.
Given a PCFG, and the ith training sentence x^(i), we can also calculate the conditional
probability

p(t | x^(i); θ) = p(x^(i), t; θ) / Σ_{t′∈Ti} p(x^(i), t′; θ)

of any t ∈ Ti, where Ti is the set of all parse trees for x^(i) under the CFG.
Given these definitions, we will show that the expected count f^{t-1}(r) for any
rule r, as calculated in the tth iteration of the EM algorithm, is

f^{t-1}(r) = Σ_{i=1}^{n} Σ_{t∈Ti} p(t | x^(i); θ^{t-1}) count(t, r)

Thus we sum over all training examples (i = 1 . . . n), and for each training example,
we sum over all parse trees t ∈ Ti for that training example. For each parse tree
t, we multiply the conditional probability p(t | x^(i); θ^{t-1}) by the count count(t, r),
which is the number of times rule r is seen in the tree t.
Consider calculating the expected count of any rule on a single training example;
that is, calculating

count(r) = Σ_{t∈Ti} p(t | x^(i); θ^{t-1}) count(t, r)                         (5)
Clearly, calculating this quantity by brute force (by explicitly enumerating all trees
t ∈ Ti) is not tractable. However, the count(r) quantities can be calculated efficiently,
using the inside-outside algorithm. Figure 3 shows the algorithm. The
algorithm takes as input a sentence x1 . . . xn, a CFG, and a parameter q^{t-1}(r) for
each rule r in the grammar. In a first step the μ and Z terms are calculated using
the inside-outside algorithm. In a second step the counts are calculated based on
the μ and Z terms. For example, for any rule of the form A → B C, we have
count(A → B C) = Σ_{i,k,j} μ(A → B C, i, k, j) / Z                            (6)

where μ and Z are terms calculated by the inside-outside algorithm, and the sum
is over all i, k, j such that 1 ≤ i ≤ k < j ≤ n.
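In code, once the inside-outside algorithm has produced the μ and Z terms, the expected counts of Eq. 6 (and the analogous counts for rules of the form A → x, given at the end of the note) are simple sums. The dictionary names below, `mu_rule` keyed by (A, B, C, i, k, j) and `mu_word` keyed by (A, i), are assumptions of this sketch.

```python
from collections import defaultdict

def expected_counts(mu_rule, mu_word, Z, words):
    """count(A -> B C) = sum over i, k, j of mu(A -> B C, i, k, j) / Z, and
    count(A -> x)     = sum over i with x_i = x of mu(A, i) / Z."""
    counts = defaultdict(float)
    for (A, B, C, i, k, j), value in mu_rule.items():
        counts[(A, B, C)] += value / Z
    for (A, i), value in mu_word.items():
        counts[(A, words[i - 1])] += value / Z         # words are indexed from 1 in the note
    return counts
```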
The equivalence between the definitions in Eqs. 5 and 6 can be justified as
follows. First, note that
count(t, A → B C) = Σ_{i,k,j} [[⟨A → B C, i, k, j⟩ ∈ t]]

where [[π]] is 1 if the statement π is true, and 0 otherwise.
Hence

Σ_{t∈Ti} p(t | x^(i); θ^{t-1}) count(t, A → B C)
    = Σ_{t∈Ti} p(t | x^(i); θ^{t-1}) Σ_{i,k,j} [[⟨A → B C, i, k, j⟩ ∈ t]]
    = Σ_{i,k,j} Σ_{t∈Ti} p(t | x^(i); θ^{t-1}) [[⟨A → B C, i, k, j⟩ ∈ t]]
    = Σ_{i,k,j} μ(A → B C, i, k, j) / Z
The final equality follows because if we define the potential functions in the inside-outside
algorithm as

ψ(A → B C, i, k, j) = q^{t-1}(A → B C)
ψ(A, i) = q^{t-1}(A → xi)

then it can be verified that

Σ_{t∈Ti} p(t | x^(i); θ^{t-1}) [[⟨A → B C, i, k, j⟩ ∈ t]] = μ(A → B C, i, k, j) / Z
Inputs: Training examples x^(i) for i = 1 . . . n, where each x^(i) is a sentence with words xj for
j ∈ {1 . . . li} (li is the length of the ith sentence). A CFG (N, Σ, R, S).

Initialization: Choose some initial PCFG parameters q^0(r) for each r ∈ R (e.g., initialize the
parameters to random values). The initial parameters must satisfy the usual constraints that q(r) ≥ 0
and, for any A ∈ N, Σ_{A→γ ∈ R} q(A → γ) = 1.

Algorithm:

For t = 1 . . . T
    For all r ∈ R, set f^{t-1}(r) = 0
    For i = 1 . . . n
        Use the algorithm in figure 3 with inputs equal to the sentence x^(i), the CFG
        (N, Σ, R, S), and parameters q^{t-1}, to calculate count(r) for each r ∈ R. Set
            f^{t-1}(r) = f^{t-1}(r) + count(r)
        for all r ∈ R.
    Re-estimate the parameters as
        q^t(A → γ) = f^{t-1}(A → γ) / Σ_{A→γ′ ∈ R} f^{t-1}(A → γ′)
    for each rule A → γ ∈ R.
For each rule of the form A → B C,

count(A → B C) = Σ_{i,k,j} μ(A → B C, i, k, j) / Z

where μ and Z are terms calculated by the inside-outside algorithm, and the sum is over all
i, k, j such that 1 ≤ i ≤ k < j ≤ n.

For each rule of the form A → x,

count(A → x) = Σ_{i : xi = x} μ(A, i) / Z

Figure 3: Calculation of the expected counts count(r) for a single sentence, using the μ and Z
terms computed by the inside-outside algorithm.
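Putting the pieces together, here is a sketch of the full EM loop, reusing the hypothetical helpers sketched earlier (`inside_outside`, `expected_counts`, `reestimate`); as before, the data-structure choices (rules as tuples, parameters in a single dictionary `q`) are assumptions of this sketch rather than anything prescribed by the note.

```python
from collections import defaultdict

def em_pcfg(sentences, nonterminals, binary_rules, S, q0, T):
    """EM estimation of PCFG parameters, following the algorithm above.
    sentences: list of word lists; binary_rules: list of (A, B, C) triples;
    q0: initial probabilities keyed by (A, B, C) or (A, word); T: number of iterations."""
    q = dict(q0)
    for t in range(1, T + 1):
        f = defaultdict(float)                                    # expected counts f^{t-1}(r)
        for words in sentences:
            n = len(words)
            # Potentials are set from the current parameters q^{t-1}.
            psi_rule = lambda A, B, C, i, k, j: q.get((A, B, C), 0.0)
            psi_word = lambda A, i: q.get((A, words[i - 1]), 0.0)
            alpha, beta, Z = inside_outside(n, S, nonterminals, binary_rules,
                                            psi_rule, psi_word)
            if Z == 0.0:                                          # no parse under current grammar
                continue
            # mu terms, as in the outputs of the inside-outside algorithm.
            mu_rule = {}
            for (A, B, C) in binary_rules:
                for i in range(1, n + 1):
                    for j in range(i + 1, n + 1):
                        for k in range(i, j):
                            mu_rule[(A, B, C, i, k, j)] = (beta[(A, i, j)]
                                                           * psi_rule(A, B, C, i, k, j)
                                                           * alpha[(B, i, k)]
                                                           * alpha[(C, k + 1, j)])
            mu_word = {(A, i): alpha[(A, i, i)] * beta[(A, i, i)]
                       for A in nonterminals for i in range(1, n + 1)}
            for rule, c in expected_counts(mu_rule, mu_word, Z, words).items():
                f[rule] += c
        q = reestimate(f)                                         # M-step
    return q
```

Each iteration runs the inside-outside algorithm once per training sentence, accumulates the expected counts, and then re-normalizes them per left-hand side, exactly as in the algorithm box above.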