Notes 3
plainly consists of two smaller sentences: I know something and what it is that I know, namely,
that this President enjoys the exercise of military power . This second word group functions
as a single unit in several senses. First of all, it intuitively stands for what it is that I know,
and so is a meaningful unit. Second, one can move it around as one chunk—as in, That this
President enjoys the exercise of military power is something I know . This group of words is
a syntactic unit. Finally, echoing a common theme, there are other word sequences—not just
single words—that can be substituted for it, while retaining grammaticality: I know that the
guy with his finger on the button is the President. To put it bluntly, with our flat finite-state
networks there is no way for us to say that the guy with his finger on the button and the President
can both play the same syntactic roles. As we have seen, this leads to an unwelcome network
duplication. A flat description does not allow us to say that sentences are built out of hierarchical parts.
We’ve already seen that there are several reasons to use hierarchical descriptions for natural
languages. Let’s summarize these here.
• Larger units than single words. Natural languages have word sequences that act as if
they were single units.
• Obvious hierarchy and recursion. Natural language sentences themselves may contain
embedded sentences, and these sentences in turn may contain sentences.
• Succinctness. Natural languages are built by combining a few types of phrases in differ-
ent combinations, like Noun Phrases and Verb Phrases. Important: the phrase names
themselves, and indeed their very existence, are in a sense purely taxonomic, just as word classes are.
(Phrases don’t exist except as our theoretical apparatus requires them to.)
[Diagram: the same hierarchical description shown in four equivalent forms: a network, a tree, a grammar, and an automaton. The grammar panel shows rules such as S → NP VP and NP → Article Noun; the automaton panel shows the corresponding Predict, Complete, Push, Scan, and Pop operations.]
Note how this method introduces a way to say where each phrase begins, ends, and attaches.
How can we do this using our FTN network descriptions? We already have defined one kind
of phrase using our “flat” networks—namely, a Sentence phrase, represented by the whole
network. Why not just extend this to all phrases and represent each by a network of its own?
It’s clear enough what the networks for Noun Phrases and Verb Phrases should be in this case—
just our usual finite-transition diagrams will do. What about the network for Sentences? It
no longer consists of adjacent words, but of adjacent phrases: namely, a Noun Phrase followed
by a Verb Phrase.
We have now answered the begin and end questions, but not the attach question. We must
say how the subnetworks are linked together, and for this we’ll introduce two new arc types to
encode the dominance relation. We now do this by extending our categorization relation from
words to groups of words. To get from state S-0 to state S-1 of the main Sentence network, we
must determine that there is a Noun Phrase at that point in the input.
To check whether there is a Noun Phrase in the input, we must refer to the subnetwork
labeled Noun Phrase. We can encode this reference to the Noun Phrase network in several
ways. One way is just to add jump arcs from the Start state of the Sentence to the first state
of the Noun Phrase subnetwork. This is also very much like a subroutine call: the subnetwork
acts as a named procedure that we can invoke. In this case, we need to know the starting
address of the procedure (the subnetwork) so that we can go and find it. Whatever the means,
we have defined the beginning of a phrase. We now move through the Noun Phrase subnet,
checking that all is-a relations for single words are satisfied. Reaching the final state for that
network, we now have defined the end of a Noun Phrase. Now what? We should not just
stop, but return to the main network that we came from—to the state on the other end of
the Noun Phrase arc, since we have now seen that there is a Noun Phrase at this position in
the sentence. In short, each subnetwork checks the is-a relation for each distinct phrase, while
the main sentence network checks the is-a relation for the sentence as a whole. (Note that in
general we must keep track of the proper return addresses, and we can use a stack for that.)

Figure 1: Using jump arcs for network calls. In general, a stack must be used to keep track of
(arbitrary) return addresses. [Diagram: the Sentence network runs S-0 to S-1 to S-2 via arcs
labeled NP and VP; ε (jump) arcs link these arcs to the Noun Phrase subnet (NP-0, NP-1, NP-3,
with determiner and noun arcs) and to the Verb Phrase subnet (VP-0, VP-1, VP-2, with verb
and noun-phrase arcs), and ε arcs lead back to the main network.]
This arrangement also answers the attachment question: the subunit Noun Phrase fits into
the larger picture of things depending on how we get back to a main network from a subnetwork.
Similarly, we now have to refer to the Verb Phrase subnetwork to check the is-a relation for a
Verb Phrase, and then, returning from there, come back to state S-2, and finish. Note that the
beginning and end of the main network are defined as before for simple finite-state transition
systems, by the start state and end state of the network itself.
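To make the push, scan, and pop operations concrete, here is a small sketch in Common Lisp
(the same language as the Laboratory 2 examples later in this handout). The network format,
state names, and lexicon below are illustrative assumptions, not the course tools, and the search
is a deliberately simplified backtracking traversal; the point is only that an arc labeled with a
subnetwork name is handled by a recursive call, so the host language's call stack holds the
return addresses.

;;; Networks as lists of arcs (from-state label to-state); a label is either
;;; a word category or the name of another network.  (Assumed format.)
(defparameter *networks*
  '((sentence (s-0 np s-1) (s-1 vp s-2))
    (np (np-0 determiner np-1) (np-1 noun np-2))
    (vp (vp-0 verb vp-1) (vp-1 np vp-2))))

(defparameter *starts* '((sentence . s-0) (np . np-0) (vp . vp-0)))
(defparameter *finals* '((sentence . s-2) (np . np-2) (vp . vp-2)))

(defparameter *lexicon*
  '((the . determiner) (a . determiner) (guy . noun) (ball . noun)
    (kicked . verb)))

(defun subnet-p (label) (assoc label *networks*))
(defun category-of (word) (cdr (assoc word *lexicon*)))

(defun traverse (net state words)
  "Try to move from STATE to the final state of NET, consuming a prefix of
WORDS.  Returns the leftover words on success, or :FAIL."
  (if (eq state (cdr (assoc net *finals*)))
      words                                   ; end of this phrase: pop back
      (loop for (from label to) in (cdr (assoc net *networks*))
            when (eq from state)
              do (let ((leftover
                         (cond
                           ;; Subnetwork arc: push into the called network at
                           ;; its start state; returning lands us back here.
                           ((subnet-p label)
                            (traverse label (cdr (assoc label *starts*)) words))
                           ;; Word-category arc: check the is-a relation.
                           ((and words (eq (category-of (first words)) label))
                            (rest words))
                           (t :fail))))
                   (unless (eq leftover :fail)
                     (let ((result (traverse net to leftover)))
                       (unless (eq result :fail)
                         (return result)))))
            finally (return :fail))))

(defun recognize-sentence (words)
  "T if WORDS is a Sentence: the traversal succeeds and consumes every word."
  (let ((leftover (traverse 'sentence 's-0 words)))
    (and (not (eq leftover :fail)) (null leftover))))

;; (recognize-sentence '(the guy kicked the ball))  ; => T
;; (recognize-sentence '(the guy kicked))           ; => NIL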
The revised network now answers the key questions of hierarchical analysis:
• A phrase of a certain type begins by referring to its network description, either as the
initial state of the main network (S-0 in our example), or by a call from one network to
another. The name of a phrase comes from the name of the subnetwork.
• A phrase ends when we reach the final state of the network describing it. This means
that we have completed the construction of a phrase of a particular type.
• A phrase is attached to the phrase named by the network that referred to (called) it.
To look at these answers from a slightly different perspective, note that each basic network is
itself a finite-state automaton that gives the basic linear order of elements in a phrase. The
same linear order of states and arcs imposes a linear order on phrases. This establishes a
precedes relation between every pair of elements (words or phrases). Hierarchical domination
is fixed by the pattern of subnetwork jumps and returns. But because the sentence patterns
described by network and subnetwork traversal must lead all the way from the start state to
Computation & Hierarchical parsing I; Earley’s algorithm 5
Figure 2: [Diagram: a structure for John bought or sold a new car in which the two Verb Phrases, headed by bought and sold, both dominate the single Noun Phrase a new car.]
a final state without interruption, we can see that such a network must relate every pair of
elements (=arc labels) by either dominates or precedes. You can check informally that this
seems to be so. If a phrase, like a Noun Phrase, does not dominate another phrase, like a Verb
Phrase, then either the Noun Phrase precedes the Verb Phrase or the Verb Phrase precedes
the Noun Phrase.
In summary, the hierarchical network system we have described can model languages where
every pair of phrases or words satisfies either the dominates or precedes relation. It is interest-
ing to ask whether this is a necessary property of natural languages. Could we have a natural
language where two words or phrases were not related by either dominates or precedes? This
seems hard to imagine because words and phrases are spoken linearly, and so seem to auto-
matically satisfy the precedes relation if they don’t satisfy dominates. However, remember
that we are interested not just in the external representation of word strings, but also in their
internal (mental and computer) representation. There is nothing that bars us or a computer
from storing the representation of a linear string of words in a non-linear way.
In fact, there are constructions in English and other languages that suggest just this pos-
sibility. If we look at the sentence in Figure 2, John bought or sold a new car,
we note that sold a new car forms a Verb Phrase, as we would expect since sell subcategorizes
for a Noun Phrase. But what about bought? It too demands a Noun Phrase Object. By the
restrictions on subcategorization described in the last chapter, we know that this Noun Phrase
must appear in the same Verb Phrase that bought forms. That is, the Verb Phrase bought
. . . must dominate a new car . This is a problem, because the only way we can do this using
our hierarchical networks is to have bought dominate the Verb Phrase sold a new car as well.
Suppose, though, that we relax the condition that all elements of a sentence must be in either a
dominates or precedes relation. Then we could have the Verb Phrases bought and sold bearing
no dominance or precedence relation to each other. This would be compatible with the picture
in Figure 2. Here we have two Verb Phrases dominating a new car. Neither dominates nor
precedes the other. Note how the subcategorization constraint is met by both phrases.
To define the notion of a language, we define a derivation relation as with FTNs: two
strings of nonterminals and terminals α, β are related via a derivation step, ⇒, according to
grammar G if there is a rule of G, X → ϕ such that α = ψXγ and β = ψϕγ. That is, we can
obtain the string β from α by replacing X by ϕ. The string of nonterminals and terminals
produced at each step of a derivation from S is called a sentential form. ⇒* denotes the
application of zero or more derivation steps.
of zero or more derivation steps. The language generated by a context-free grammar , L(G), is
the set of all the terminal strings that can be derived from S using the rules of G:
L(G) = {w | w ∈ T* and S ⇒* w}
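For instance, with illustrative rules S → NP VP, NP → Det Noun, VP → Verb NP, Det → the,
Noun → guy, Noun → ice-cream, Verb → ate (assumed here just for this example; a similar
grammar appears in section 4), one derivation is

S ⇒ NP VP ⇒ Det Noun VP ⇒ the Noun VP ⇒ the guy VP ⇒ the guy Verb NP ⇒ the guy ate NP ⇒ the guy ate Det Noun ⇒ the guy ate the Noun ⇒ the guy ate the ice-cream

Each intermediate string is a sentential form, S ⇒* the guy ate the ice-cream, and so the guy
ate the ice-cream ∈ L(G).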
Notation: Let L(a) denote the label of a node a in a (phrase structure) tree. Let the set of
node labels (with respect to a grammar) be denoted VN . V = VN ∪ VT (the set of node labels
plus terminal elements).
Definition 2: The categories of a phrase structure tree immediately dominating the actual
words of the sentence correspond to lexical categories and are called preterminals or X^0
categories. These are written in the format, e.g., N^0 → dog, cat, . . . .
Definition 3: A network (grammar) is cyclic if there exists a nonterminal N such that the
nonterminal can be reduced to itself (derive itself). For example, a grammar permitting the
following derivation is cyclic:

N ⇒* N1 ⇒* N
Ambiguity in a context-free grammar is exactly analogous to the notion of different paths
through a finite-state transition network. If a context-free grammar G has at least one sentence
with two or more derivations, then it is ambiguous; otherwise it is unambiguous.
Definition 4: A grammar is infinitely ambiguous if for some sentence there are infinitely many
derivations; otherwise, it is finitely ambiguous.
Example. The following grammar is infinitely ambiguous and cyclic.
Start→ S
S→ S
S→ a
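For instance, the single-word sentence a has the derivations Start ⇒ S ⇒ a, Start ⇒ S ⇒ S ⇒ a,
Start ⇒ S ⇒ S ⇒ S ⇒ a, and so on without bound: one distinct derivation for every number of
applications of the rule S → S.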
Neither of these cases seems to arise in natural language. Consider the second case. Such
rules are generally not meaningful linguistically, because a nonbranching chain introduces no
new description in terms of is-a relationships. For example, consider the grammar fragment,
S → NP VP
NP → NP
NP → Det Noun
This grammar has a derivation tree with NP–NP–NP dominating whatever string of ter-
minals forms a Noun Phrase (such as an ice-cream). Worse still, there could be an arbitrary
number of NPs dominating an ice-cream. This causes computational nightmares; suppose a
parser had to build all of these possibilities—it could go on forever. Fortunately, this extra
layer of description is superfluous. If we know that an ice-cream is the Subject Noun Phrase,
occupying the first two positions in the sentence, then that is all we need to know. The extra
NPs do not add anything new in the way of description, since all they can do is simply repeat
the statement that there is an NP occupying the first two positions. To put the same point
another way, we would expect no grammatical process to operate on the lowest NP without
also affecting the higher NPs in the same way.
We can contrast this nonbranching NP with a branching one:
S → NP VP
NP → NP PP
PP → Prep NP
NP → Det Noun
Here we can have an NP dominating the branching structure NP–PP. We need both NPs,
because they describe different phrases in the input. For example, the lowest NP might domi-
nate an ice-cream and the higher NP might dominate an ice-cream with raspberry toppings—
plainly different things, to an ice-cream aficionado.
Note that like the FTN case, ambiguity is a property of a grammar. However, there are
two ways that ambiguity can arise in a context-free system. First, just as with finite-state
systems, we can have lexical or category ambiguity: one and the same word can be analyzed
in two different ways, for example, as either a Noun or a Verb. For example, we could have
Noun→time or Verb→time. Second, because we now have phrases, not just single words,
one can sometimes analyze the same sequence of word categories in more than one way. For
example, the guy on the hill with the telescope can be analyzed as a single Noun Phrase in
either of two ways: as [the guy [on the hill [with the telescope]]], where the telescope is on the
hill, or as [the guy [on the hill] [with the telescope]], where the guy has the telescope.
The word categories are not any different in these two analyses. The only thing that changes
is how the categories are stitched together. In simple cases, one can easily spot this kind of
structural ambiguity in a context-free system. It often shows up when two different rules share
a common phrase boundary. Here’s an example:
VP→ Verb NP PP
NP→ Det Noun PP
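For instance, taken together with rules like VP → Verb NP and NP → Det Noun, these two
rules let the category string Verb Det Noun Prep Det Noun be analyzed either as
[VP Verb [NP Det Noun] [PP Prep [NP Det Noun]]] or as
[VP Verb [NP Det Noun [PP Prep [NP Det Noun]]]] (assuming PP → Prep NP); the shared PP
boundary is what allows the same categories to be stitched together in two ways. The two
parse trees at the end of section 4 show exactly this kind of ambiguity for John ate ice-cream
on the table.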
(lexicality) N = {X^i | 1 ≤ i ≤ n, X ∈ T}
(centrality) Start = X^n, X ∈ T
(succession) rules in P have the form X^i → α X^(i−1) β
(uniformity, maximality) where α and β are possibly empty strings over the set of "maximal categories (projections)," N_M = {X^n | X ∈ T}
Notation: The superscripts are called bar levels and are sometimes written with overbars
(X-bar, X-double-bar) rather than as superscripts. The items α (not a single category, note)
are called the specifiers of a phrase, while the items β comprise the complements of a phrase.
The most natural example is that of a verb phrase: the objects of the verb phrase are its
COMPlements, while its Specifier might be the Subject Noun Phrase.
Example. Suppose n = 2. Then the X^2 grammar for noun expansions could include the rules:

N^2 → Determiner N^1 P^2
N^1 → N^0 (= noun)
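For instance, taking P^2 to be a prepositional phrase such as on the hill, these rules analyze
the dog on the hill as [N^2 [Determiner the] [N^1 [N^0 dog]] [P^2 on the hill]].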
The X^0 are lexical items (word categories). The X^i are called projections of X, and X is the
head of X^i. Note that the definition enforces the constraint that all full phrases, like noun
phrases (NPs), verb phrases (VPs), etc., are uniformly maximal projections (this restriction is
relaxed in some accounts).
It is not hard to show that the number of bar levels doesn't really matter to the language
that is generated: given, say, an X^3 grammar, we can always find an X^1 grammar that
generates the same language by simply substituting for X^3 and X^2. Conversely, given any X^n
grammar, we can find an X^(n+1) grammar generating the same language: the new grammar
has the same rules with every bar level incremented by 1, plus a new rule X^1 → X^0 for every
lexical item X.
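For example, applying this construction to an X^1 grammar containing the rule N^1 → N^0 P^1
yields the X^2 rules N^2 → N^1 P^2 together with the new rules N^1 → N^0 and P^1 → P^0.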
Unfortunately, Kornai's definition itself is flawed given current views. Let us see what sort
of X-bar theory people actually use nowadays. We can restrict the number of bar levels to 2,
make the rules binary branching, and add rules of the form X^i → α X^i or X^i → X^i β. The
binary branching requirement is quite natural for some systems of semantic interpretation, a
matter to which we will return. On this most recent view, the central assumption of the X-bar
model is that a word category X can function as the head of a phrase and be projected to a
corresponding phrasal category XP by (optionally) adding one of three kinds of modifiers: X
can be projected into an X′ by adding a complement; the X′ can be recursively turned into
another X′ by the addition of a modifying adjunct; and this X′ can be projected into an XP
(X″) by adding a specifier. In other words, the basic skeleton structure for phrases in English
looks like this:

XP → Specifier X′
X′ → Adjunct X′ (recursively)
X′ → X Complement
The Verb criticize is projected into a V′ by adding the noun complement the other; the V′
criticize the other is projected into another V′ by adding the adverbial adjunct bitterly; the
new V′ is projected into a full VP by adding the specifier each. Continuing, consider the modal
auxiliary verb will. Since it is Inflected for tense, we can assign it the category I. I is projected
into an I′ (and ultimately an IP) by the addition of the verb phrase complement [each bitterly
criticize the other]. This I′ is projected into another I′ by adding the adverbial adjunct probably;
this finally forms an IP by adding the specifier (subject) they, so we get a full sentence. Thus
IP = S(entence) phrase, in this case.
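Putting these steps together, the structure just described can be written out with bar levels
marked by primes (a reconstruction for illustration):
[IP they [I′ probably [I′ will [VP each [V′ bitterly [V′ criticize [NP the other]]]]]]].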
Similarly, a phrase like that you should be tired is composed of the Complementizer word
that, which heads up what is now called a Complementizer Phrase or CP (and was sometimes
called an S′ (S-bar) phrase; don't get this confused with the Complement of a phrase), and the
Complement IP (=S) you should be tired; here the specifier and adjunct are missing.
The basic configuration of Specifier-Head and Head-Complement relations may exhaust
what we need to get out of local syntactic relations, so it appears. Thus the apparent tree
structure is really derivative. (Recent linguistic work may show that the binary branching
structure is also derivative from more basic principles.)
2. The left edge of the tree, i.e., its starting position in the input. This is given by a
number indexing the interword position at which we began construction of the tree.
Computation & Hierarchical parsing I; Earley’s algorithm 11
3. The right edge of the tree built so far (redundant with the state set number as before).
4 Earley’s algorithm
Earley’s algorithm is like the state set simulation of a nondeterministic FTN presented earlier,
with the addition of a single new integer representing the starting point of a hierarchical phrase
(since now phrases can start at any point in the input). Given an input of n words, a series of state sets S0 ,
S1 , . . ., Sn is built, where Si contains all the valid items after reading i words. The algorithm
as presented is a simple recognizer; as usual, parsing involves more work.
In theorem-proving terms, the Earley algorithm selects the leftmost nonterminal (phrase)
in a rule as the next candidate to see if one can find a “proof” for it in the input. (By varying
which nonterminal is selected, one can come up with a different strategy for parsing.)
tree rather than a partial linear sequence), one must add a single new integer index to mark
the return address in hierarchical structure.
Note that prediction and completion both act like ε-transitions: they spark parser opera-
tions without consuming any input; hence, one must close each state set construction under
these operations (= we must add all states we can reach after reading i words, including those
reached under ε-transitions).
Question: where is the stack in the Earley algorithm? (Since we need a stack for return
pointers.)
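Before walking through the trace below, here is a minimal recognizer sketch in Common Lisp
that may make the bookkeeping concrete. It is not the Laboratory 2 (gpsg) parser: the grammar
format, the function names, and the one-category-per-word lexicon are assumptions made only
for this sketch, and empty (ε) rules are omitted. Note that there is no explicit stack; each item
carries its own return pointer, namely the index of the state set where its phrase began, and
the complete step consults that set.

;;; Rules as (lhs . rhs); any symbol appearing as an lhs is a nonterminal,
;;; everything else is treated as a lexical category.  (Assumed format.)
(defparameter *grammar*
  '((start s) (s np vp) (np name) (np det noun) (vp v np)))

(defun nonterminal-p (sym) (assoc sym *grammar*))
(defun rules-for (sym)
  (remove-if-not (lambda (r) (eq (first r) sym)) *grammar*))

;; An item is a dotted rule plus the input position where its phrase began:
;; (lhs rhs dot-position start-index).
(defun make-item (lhs rhs dot start) (list lhs rhs dot start))
(defun item-lhs (item) (first item))
(defun item-rhs (item) (second item))
(defun item-dot (item) (third item))
(defun item-start (item) (fourth item))
(defun next-symbol (item)
  (nth (item-dot item) (item-rhs item)))     ; NIL when the dot is at the end

(defun earley-recognize (words categories)
  "WORDS is a list of word symbols; CATEGORIES is an alist giving each word
one lexical category, e.g. ((john . name) ...).  Returns T if WORDS is
derivable from START using *GRAMMAR*."
  (let* ((n (length words))
         (sets (make-array (1+ n) :initial-element nil)))
    ;; State set S0 starts with the dotted Start rules.
    (dolist (r (rules-for 'start))
      (push (make-item 'start (rest r) 0 0) (aref sets 0)))
    (loop for i from 0 to n
          do (let ((agenda (aref sets i)))
               (flet ((add (new)             ; add NEW to Si (and agenda) once
                        (unless (member new (aref sets i) :test #'equal)
                          (push new (aref sets i))
                          (push new agenda))))
                 (loop while agenda
                       do (let* ((item (pop agenda))
                                 (sym (next-symbol item)))
                            (cond
                              ;; COMPLETE: dot at the end.  Advance every item
                              ;; waiting for this phrase in the state set named
                              ;; by the start index -- the "return pointer".
                              ((null sym)
                               (dolist (caller (aref sets (item-start item)))
                                 (when (eq (next-symbol caller) (item-lhs item))
                                   (add (make-item (item-lhs caller)
                                                   (item-rhs caller)
                                                   (1+ (item-dot caller))
                                                   (item-start caller))))))
                              ;; PREDICT: dot before a nonterminal.
                              ((nonterminal-p sym)
                               (dolist (r (rules-for sym))
                                 (add (make-item sym (rest r) 0 i))))
                              ;; SCAN: dot before a lexical category that matches
                              ;; the next word; move the advanced item to Si+1.
                              ((and (< i n)
                                    (eq sym (cdr (assoc (nth i words) categories))))
                               (pushnew (make-item (item-lhs item)
                                                   (item-rhs item)
                                                   (1+ (item-dot item))
                                                   (item-start item))
                                        (aref sets (1+ i))
                                        :test #'equal)))))))
          finally
             ;; Accept if some Start rule is complete over the whole input.
             (return (some (lambda (item)
                             (and (eq (item-lhs item) 'start)
                                  (null (next-symbol item))
                                  (zerop (item-start item))))
                           (aref sets n))))))

;; (earley-recognize '(john ate the ice-cream)
;;                   '((john . name) (ate . v) (the . det) (ice-cream . noun)))
;; => T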
Start → S
S → NP VP
NP → Name
NP → Det Noun
NP → Name PP
PP → Prep NP
VP → V NP
VP → V NP PP
V → ate
Noun → ice-cream
Name → John
Name → ice-cream
Noun → table
Det → the
Prep → on
Let’s follow how this parse works using Earley’s algorithm and the parser used in laboratory
2. (The headings and running count of state numbers aren’t supplied by the parser. Also note
that Start is replaced by *DO*. Some additional duplicated states that are printed during
tracing have been removed for clarity, and comments added.)
(in-package 'gpsg)
(remove-rule-set 'testrules)
(remove-rule-set 'testdict)
(add-rule-set 'testrules 'CFG)
(add-rule-list 'testrules
  '((S ==> NP VP)
    (NP ==> name)
    (NP ==> Name PP)
    (VP ==> V NP)
    (NP ==> Det Noun)
    (PP ==> Prep NP)
    (VP ==> V NP PP)))
John [Name]
1 0 NP ==> NAME . (6) (scan over 3)
1 0 NP ==> NAME . PP (7) (scan over 4)
1 0 S ==> NP . VP (8) (complete 6 to 2)
1 1 PP ==> . PREP NP (9) (predict from 7)
1 1 VP ==> . V NP (10) (predict from 8)
1 1 VP ==> . V NP PP (11) (predict from 8)
ate [V]
2 1 VP ==> V . NP (12) (scan over 10)
2 1 VP ==> V . NP PP (13) (scan over 11)
2 2 NP ==> . NAME (14) (predict from 12/13)
2 2 NP ==> . NAME PP (15) (predict from 12/13)
2 2 NP ==> . DET NOUN (16) (predict from 12/13)
on [Prep]
4 3 PP ==> PREP . NP (24) (scan over 21)
4 4 NP ==> . NAME (25) (predict from 24)
4 4 NP ==> . NAME PP (26) (predict from 24)
4 4 NP ==> . DET NOUN (27) (predict from 24)
the [Det]
5 4 NP ==> DET . NOUN (28) (scan over 27)
table [Noun]
6 4 NP ==> DET NOUN . (29) (scan over 28)
6 3 PP ==> PREP NP . (30) (complete 29 to 24)
6 1 VP ==> V NP PP . (31) (complete 24 to 19)
6 2 NP ==> NAME PP . (32) (complete 24 to 18)
6 0 S ==> NP VP . (33) (complete 8 to 1)
6 0 *DO* ==> S . (34) (complete 1) [parse 1]
6 1 VP ==> V NP . (35) (complete 18 to 12)
6 0 S ==> NP VP . (36) (complete 12 to 1) = 33
[Diagram: paths through completed items such as PP ==> PREP NP . (30), NP ==> NAME PP . (32), and *DO* ==> S . (34).]
Figure 3: Distinct parses lead to distinct state triple paths in the Earley algorithm.
((START
(S (NP (NAME JOHN))
(VP (V ATE) (NP (NAME ICE-CREAM))
(PP (PREP ON) (NP (DET THE) (NOUN TABLE))))))
(START
(S (NP (NAME JOHN))
(VP (V ATE)
(NP (NAME ICE-CREAM) (PP (PREP ON) (NP (DET THE) (NOUN TABLE))))))))
set). In the worst case, the maximum number of distinct items is the maximum number of
dotted rules times the maximum number of distinct return values, or |G| · n. The time to
process a single item can be found by considering separately the scan, predict and complete
actions. Scan and predict are effectively constant time (we can build in advance all the
possible single next-state transitions, given a possible category). The complete action could
force the algorithm to advance the dot in all the items in a state set, which from the previous
calculation, could be |G| · n items, hence proportional to that much time. We can combine
the values as shown below to get an upper bound on the execution time, assuming that the
primitive operations of our computer allow us to maintain lists without duplicates without any
additional overhead (say, by using bit-vectors; if this is not done, then searching through or
ordering the list of states could add another |G| factor). Note as before that grammar size
(measured by the total number of symbols in the rule system) is an important component of
this bound, more so than the input sentence length, as you will see in Laboratory 2.
If there is no ambiguity, then this worst case does not arise (why?). The parse then takes linear
time (why?). If there is only finite ambiguity in the grammar (at each step there is only a
finite number, bounded in advance, of ambiguous attachment possibilities), then the worst-case
time is proportional to n^2.
Maximum number of state sets × Maximum time to build ONE state set
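Combining these quantities gives a rough sketch of the bound (using the estimates above):
there are at most n + 1 state sets, each state set holds at most |G| · n items, and completing
one item may touch up to |G| · n other items, so the total is on the order of
(n + 1) · (|G| · n) · (|G| · n), i.e., O(|G|^2 · n^3) time in the worst case.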