Lecture 8

Uploaded by

Beekan Gammadaa
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
4 views44 pages

Lecture 8

Uploaded by

Beekan Gammadaa
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 44

Parsing

Today
• Parsing with CFGs
– Bottom-up, top-down
– Ambiguity
– CKY parsing
Parsing
• Parsing with CFGs refers to the task of
assigning proper trees to input strings
• Proper here means a tree that covers all and
only the elements of the input and has an S
at the top
• It doesn’t actually mean that the system can
select the correct tree from among all the
possible trees
Parsing
• As with everything of interest, parsing involves a search, which in turn involves making choices
• We’ll start with some basic methods before
moving on to the one or two that you need to
know
For Now
• Assume…
– You have all the words already in some buffer
– The input isn’t POS tagged
– We won’t worry about morphological analysis
– All the words are known

– These are all problematic in various ways, and would have to be addressed in real applications.
Top-Down Search
• Since we’re trying to find trees rooted with an S (Sentences), why not start with the rules that give us an S?
• Then we can work our way down from
there to the words.
Top Down Space
Bottom-Up Parsing
• Of course, we also want trees that cover the
input words. So we might also start with
trees that link up with the words in the right
way.
• Then we work our way up from there to larger and larger trees.
Bottom-Up Search
Top-Down and Bottom-Up
• Top-down
– Only searches for trees that can be answers (i.e.
S’s)
– But also suggests trees that are not consistent
with any of the words
• Bottom-up
– Only forms trees consistent with the words
– But suggests trees that make no sense globally
Control
• Of course, in both cases we left out how to
keep track of the search space and how to
make choices
– Which node to try to expand next
– Which grammar rule to use to expand a node
• One approach is called backtracking (sketched below).
– Make a choice; if it works out, fine
– If not, back up and make a different choice
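To make the backtracking idea concrete, here is a minimal sketch of a top-down, depth-first parser with backtracking over a toy grammar, written in Python for this transcript. The grammar, lexicon, and function names (expand, expand_seq, recognize) are illustrative assumptions, not code from the lecture.

# A minimal sketch of top-down parsing with backtracking over a toy CFG.
TOY_GRAMMAR = {
    "S":  [["NP", "VP"], ["VP"]],            # ["VP"] allows imperatives
    "NP": [["Det", "Noun"], ["Noun"]],
    "VP": [["Verb", "NP"], ["Verb"]],
}
LEXICON = {
    "Det":  {"that", "the", "a"},
    "Noun": {"flight", "pilot"},
    "Verb": {"book", "booked"},
}

def expand(symbol, words, pos):
    """Yield every input position reachable after deriving `symbol` from `pos`.
    Each yielded value is one surviving choice; moving on to the next value
    is the 'back up and make a different choice' step."""
    if symbol in LEXICON:                        # preterminal: match one word
        if pos < len(words) and words[pos] in LEXICON[symbol]:
            yield pos + 1
        return
    for rhs in TOY_GRAMMAR.get(symbol, []):      # try each rule in turn
        yield from expand_seq(rhs, words, pos)

def expand_seq(symbols, words, pos):
    """Derive a sequence of symbols, threading the input position through."""
    if not symbols:
        yield pos
        return
    for mid in expand(symbols[0], words, pos):
        yield from expand_seq(symbols[1:], words, mid)

def recognize(words):
    """True if some derivation of S covers exactly the whole input."""
    return any(end == len(words) for end in expand("S", words, 0))

print(recognize("book that flight".split()))     # True
print(recognize("book that the".split()))        # False

Note that this naive strategy loops forever on left-recursive rules such as Nominal -> Nominal PP, which is one more reason the lecture moves on to chart-based methods.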
Problems
• Even with the best filtering, backtracking methods are doomed because of two interrelated problems
– Ambiguity
– Shared subproblems
Ambiguity
Shared Sub-Problems
• No matter what kind of search we choose (top-down, bottom-up, or mixed):
– We don’t want to redo work we’ve already
done.
– Unfortunately, naïve backtracking will lead to
duplicated work.
Shared Sub-Problems
• Consider
– A flight from Indianapolis to Houston on TWA
Shared Sub-Problems
• Assume a top-down parse making choices
among the various Nominal rules.
• In particular, between these two
– Nominal -> Noun
– Nominal -> Nominal PP
• Statically choosing the rules in this order
leads to the following bad results...
Shared Sub-Problems
Dynamic Programming
• DP search methods fill tables with partial results
and thereby
– Avoid doing avoidable repeated work
– Solve exponential problems in polynomial time (well, not really)
– Efficiently store ambiguous structures with shared sub-parts.
• We’ll cover two approaches that roughly
correspond to top-down and bottom-up
approaches.
– CKY
– Earley
Sample L1 Grammar
CNF Conversion
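The sample L1 grammar and its CNF conversion appear as figures in the original slides and are not reproduced here. As a rough illustration of one step of the conversion, binarization, here is a hedged Python sketch; the rule format and the generated names such as X_NP_PP are assumptions, and a full CNF conversion would also remove unit productions and pull terminals out of rules that mix terminals with non-terminals.

# A minimal sketch of the binarization step of CNF conversion.
# Rules are (lhs, rhs_tuple); intermediate names like X_NP_PP are invented here.
def binarize(rules):
    """Replace every rule with more than two right-hand-side symbols
    by a chain of binary rules with fresh intermediate non-terminals."""
    out = []
    for lhs, rhs in rules:
        while len(rhs) > 2:
            first, rest = rhs[0], rhs[1:]
            new_nt = "X_" + "_".join(rest)       # fresh symbol for the tail
            out.append((lhs, (first, new_nt)))
            lhs, rhs = new_nt, rest
        out.append((lhs, tuple(rhs)))
    return out

rules = [("S", ("NP", "VP")),
         ("VP", ("Verb", "NP", "PP"))]           # ternary rule to binarize
for lhs, rhs in binarize(rules):
    print(lhs, "->", " ".join(rhs))
# S -> NP VP
# VP -> Verb X_NP_PP
# X_NP_PP -> NP PP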
CKY
• So let’s build a table so that an A spanning
from i to j in the input is placed in cell [i,j]
in the table.
• So a non-terminal spanning an entire string
will sit in cell [0, n]
– Hopefully an S
• If we build the table bottom-up, we’ll know
that the parts of the A must go from i to k
and from k to j, for some k.
CKY
• Meaning that for a rule like A → B C we
should look for a B in [i,k] and a C in [k,j].
• In other words, if we think there might be
an A spanning i,j in the input… AND
A → B C is a rule in the grammar THEN
• There must be a B in [i,k] and a C in [k,j]
for some i<k<j
CKY
• So to fill the table, loop over the cell [i,j] values in some systematic way
– What constraint should we put on that systematic search?
– For each cell, loop over the appropriate k values to search for things to add.
CKY Algorithm
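The algorithm on this slide is a figure (the textbook pseudocode) that is not reproduced here. Below is a minimal Python sketch of the same table-filling idea for a grammar already in CNF; the toy grammar, lexicon, and names are assumptions rather than the textbook code.

from collections import defaultdict

# A minimal CKY recognizer sketch for a toy grammar already in CNF.
BINARY = {                         # A -> B C rules, indexed by (B, C)
    ("NP", "VP"): {"S"},
    ("Verb", "NP"): {"VP", "S"},   # S here also covers imperative "Book that flight"
    ("Det", "Noun"): {"NP"},
}
LEXICAL = {                        # A -> word rules
    "book": {"Verb", "Noun"},
    "that": {"Det"},
    "flight": {"Noun"},
}

def cky_recognize(words):
    n = len(words)
    table = defaultdict(set)       # table[(i, j)] = non-terminals spanning words[i:j]
    for i, w in enumerate(words):                 # width-1 spans from the lexicon
        table[(i, i + 1)] |= LEXICAL.get(w, set())
    for width in range(2, n + 1):                 # shorter spans before longer ones
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):             # every split point i < k < j
                for B in table[(i, k)]:
                    for C in table[(k, j)]:
                        table[(i, j)] |= BINARY.get((B, C), set())
    return "S" in table[(0, n)]

print(cky_recognize("book that flight".split()))  # True

As written, this only answers yes or no, which is exactly the point of the question on the next slide.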
CKY Parsing
• Is that really a parser?
CKY Notes
• Since it’s bottom up, CKY populates the table
with a lot of phantom constituents.
– Segments that by themselves are constituents but
cannot really occur in the context in which they are
being suggested.
– To avoid this we can switch to a top-down control
strategy
– Or we can add some kind of filtering that blocks
constituents where they cannot happen in a final
analysis.
Core Earley Code
Earley Code
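The Earley code on these two slides is shown as figures that are not reproduced here. The following is a hedged Python sketch of an Earley recognizer over a toy grammar; the grammar, lexicon, the GAMMA dummy start symbol, and the state layout (lhs, rhs, dot, start) are assumptions, though the predictor / scanner / completer split mirrors the standard algorithm.

# A minimal Earley recognizer sketch; the grammar and names are assumptions.
GRAMMAR = {
    "S":       [["VP"], ["NP", "VP"]],
    "NP":      [["Det", "Nominal"]],
    "Nominal": [["Noun"], ["Nominal", "Noun"]],
    "VP":      [["Verb"], ["Verb", "NP"]],
}
LEXICON = {"book": {"Verb", "Noun"}, "that": {"Det"}, "flight": {"Noun"}}

def earley_recognize(words):
    n = len(words)
    # A state is (lhs, rhs, dot, start); chart[k] holds states ending at position k.
    chart = [set() for _ in range(n + 1)]
    chart[0] = {("GAMMA", ("S",), 0, 0)}                     # dummy start state
    for k in range(n + 1):
        agenda = list(chart[k])
        while agenda:
            lhs, rhs, dot, start = agenda.pop()
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in GRAMMAR:                           # PREDICTOR
                    for prod in GRAMMAR[nxt]:
                        s = (nxt, tuple(prod), 0, k)
                        if s not in chart[k]:
                            chart[k].add(s); agenda.append(s)
                elif k < n and nxt in LEXICON.get(words[k], set()):   # SCANNER
                    chart[k + 1].add((lhs, rhs, dot + 1, start))
            else:                                            # COMPLETER
                for (l2, r2, d2, s2) in list(chart[start]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        s = (l2, r2, d2 + 1, s2)
                        if s not in chart[k]:
                            chart[k].add(s); agenda.append(s)
    return ("GAMMA", ("S",), 1, 0) in chart[n]

print(earley_recognize("book that flight".split()))          # True

Unlike the naive top-down backtracker sketched earlier, this handles the left-recursive Nominal -> Nominal Noun rule without looping.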
Example
• Book that flight
• We should find… an S from 0 to 3 that is a
completed state…
Chart[0]
• Note that given a grammar, these entries are the same for all inputs; they can be pre-loaded.
Chart[1]
Charts[2] and [3]
Efficiency
• For such a simple example, there seems to be a
lot of useless stuff in there.
• Why?

• It’s predicting things that aren’t consistent with the input
• That’s the flip side of the CKY problem.
Details
• As with CKY, this isn’t a parser until we add backpointers so that each state knows where it came from (see the sketch below).
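As a rough illustration of what the backpointers buy us, here is a hedged sketch of tree recovery from a CKY-style table; the back dictionary below is hand-built for one parse of "Book that flight", and its shape (a word for width-1 spans, a (k, B, C) triple otherwise) is an assumption about how the backpointers might be stored.

# A minimal sketch of recovering a parse tree from backpointers.
# back[(i, j, A)] holds either the covered word (width-1 spans) or
# (k, B, C), meaning A -> B C was built by splitting span [i, j] at k.
back = {
    (0, 1, "Verb"): "book",
    (1, 2, "Det"): "that",
    (2, 3, "Noun"): "flight",
    (1, 3, "NP"): (2, "Det", "Noun"),
    (0, 3, "S"):  (1, "Verb", "NP"),
}

def build_tree(i, j, label):
    """Follow backpointers down from (i, j, label) to a nested tuple tree."""
    entry = back[(i, j, label)]
    if isinstance(entry, str):                   # leaf: a single word
        return (label, entry)
    k, left, right = entry                       # internal node: two children
    return (label, build_tree(i, k, left), build_tree(k, j, right))

print(build_tree(0, 3, "S"))
# ('S', ('Verb', 'book'), ('NP', ('Det', 'that'), ('Noun', 'flight')))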
Back to Ambiguity
• Did we solve it?
Ambiguity
• No…
– Both CKY and Earley will result in multiple S
structures for the [0,N] table entry.
– They both efficiently store the sub-parts that are
shared between multiple parses.
– And they obviously avoid re-deriving those
sub-parts.
– But neither can tell us which one is right.
Ambiguity

• In most cases, humans don’t notice incidental ambiguity (lexical or syntactic); it is resolved on the fly and never noticed.
• We’ll try to model that with probabilities.
• But note something odd and important about the Groucho Marx example…
