
Statistical Natural Language Parsing

Mausam

(Based on slides of Michael Collins, Dan Jurafsky, Dan Klein, Chris Manning, Ray Mooney, Luke Zettlemoyer)
Two views of linguistic structure:
1. Constituency (phrase structure)

• Phrase structure organizes words into nested constituents.


• How do we know what is a constituent? (Not that linguists don’t
argue about some cases.)
• Distribution: a constituent behaves as a unit that can appear in different
places:
• John talked [to the children] [about drugs].
• John talked [about drugs] [to the children].
• *John talked drugs to the children about
• Substitution/expansion/pro-forms:
• I sat [on the box/right on top of the box/there].
• Coordination, regular internal structure, no intrusion,
fragments, semantics, …
Two views of linguistic structure:
2. Dependency structure

• Dependency structure shows which words depend on (modify or


are arguments of) which other words.

put
├─ boy ─ The
├─ tortoise ─ the
└─ on ─ rug ─ the

The boy put the tortoise on the rug
Why Parse?
• Part of speech information
• Phrase information
• Useful relationships

8
The rise of annotated data:
The Penn Treebank
[Marcus et al. 1993, Computational Linguistics]
( (S
(NP-SBJ (DT The) (NN move))
(VP (VBD followed)
(NP
(NP (DT a) (NN round))
(PP (IN of)
(NP
(NP (JJ similar) (NNS increases))
(PP (IN by)
(NP (JJ other) (NNS lenders)))
(PP (IN against)
(NP (NNP Arizona) (JJ real) (NN estate) (NNS loans))))))
(, ,)
(S-ADV
(NP-SBJ (-NONE- *))
(VP (VBG reflecting)
(NP
(NP (DT a) (VBG continuing) (NN decline))
(PP-LOC (IN in)
(NP (DT that) (NN market)))))))
(. .)))
Penn Treebank Non-terminals
Statistical parsing applications

Statistical parsers are now robust and widely used in larger NLP
applications:
• High precision question answering [Pasca and Harabagiu SIGIR 2001]
• Improving biological named entity finding [Finkel et al. JNLPBA 2004]
• Syntactically based sentence compression [Lin and Wilbur 2007]
• Extracting opinions about products [Bloom et al. NAACL 2007]
• Improved interaction in computer games [Gorniak and Roy 2005]
• Helping linguists find data [Resnik et al. BLS 2005]
• Source sentence analysis for machine translation [Xu et al. 2009]
• Relation extraction systems [Fundel et al. Bioinformatics 2006]
Example Application: Machine Translation

• The boy put the tortoise on the rug

• लड़के ने रखा कछुआ ऊपर कालीन


• SVO vs. SOV; preposition vs. post-position
[Tree diagrams: two identical English parses of “the boy put the tortoise on the rug”: S → NP VP PP, VP → V NP, PP → IN NP]
Example Application: Machine Translation

• The boy put the tortoise on the rug

• लड़के ने रखा कछुआ ऊपर कालीन


• SVO vs. SOV; preposition vs. post-position
[Tree diagrams: in the target tree the PP is reordered before the VP: S → NP PP VP]
Example Application: Machine Translation

• The boy put the tortoise on the rug

• लड़के ने रखा कछुआ ऊपर कालीन


• SVO vs. SOV; preposition vs. post-position
[Tree diagrams: within the target PP, the NP now precedes IN, turning the preposition into a post-position: PP → NP IN]
Example Application: Machine Translation

• The boy put the tortoise on the rug

• लड़के ने रखा कछुआ ऊपर कालीन


• SVO vs. SOV; preposition vs. post-position
[Tree diagrams: the target verb is moved after its object NP, giving SOV order: VP → NP V]
Example Application: Machine Translation

• The boy put the tortoise on the rug

• लड़के ने रखा कछुआ ऊपर कालीन


• SVO vs. SOV; preposition vs. post-position
[Tree diagrams: the reordered target tree with Hindi leaves: लड़के ने, रखा, कछुआ, ऊपर, कालीन]
Pre 1990 (“Classical”) NLP Parsing
• Goes back to Chomsky’s PhD thesis in 1950s
• Wrote symbolic grammar (CFG or often richer) and lexicon
S  NP VP NN  interest
NP  (DT) NN NNS  rates
NP  NN NNS NNS  raises
NP  NNP VBP  interest
VP  V NP VBZ  rates

• Used grammar/proof systems to prove parses from words


• This scaled very badly and didn’t give coverage. For sentence:
Fed raises interest rates 0.5% in effort to control inflation
• Minimal grammar: 36 parses
• Simple 10 rule grammar: 592 parses
• Real-size broad-coverage grammar: millions of parses
Classical NLP Parsing:
The problem and its solution

• Categorical constraints can be added to grammars to limit


unlikely/weird parses for sentences
• But such attempts made the grammars brittle and non-robust
• In traditional systems, commonly 30% of sentences in even an edited
text would have no parse.
• A less constrained grammar can parse more sentences
• But simple sentences end up with ever more parses with no way to
choose between them
• We need mechanisms that allow us to find the most likely
parse(s) for a sentence
• Statistical parsing lets us work with very loose grammars that admit
millions of parses for sentences but still quickly find the best parse(s)
Context Free Grammars and Ambiguities

20
Context-Free Grammars

21
Context-Free Grammars in NLP

• A context free grammar G in NLP = (N, C, Σ, S, L, R)


• Σ is a set of terminal symbols
• C is a set of preterminal symbols
• N is a set of nonterminal symbols
• S is the start symbol (S ∈ N)
• L is the lexicon, a set of items of the form X → x
• X ∈ C and x ∈ Σ
• R is the grammar, a set of items of the form X → γ
• X ∈ N and γ ∈ (N ∪ C)*
• By usual convention, S is the start symbol, but in statistical NLP,
we usually have an extra node at the top (ROOT, TOP)
• We usually write ε for an empty sequence, rather than nothing
22
A Context Free Grammar of English

23
Left-Most Derivations

24
Properties of CFGs

25
A Fragment of a Noun Phrase Grammar

26
Extended Grammar with Prepositional Phrases

27
Verbs, Verb Phrases and Sentences

28
PPs Modifying Verb Phrases

29
Complementizers and SBARs

30
More Verbs

31
Coordination

32
Much more remains…

33
Attachment ambiguities

• A key parsing decision is how we ‘attach’ various constituents


• PPs, adverbial or participial phrases, infinitives, coordinations, etc.
Attachment ambiguities

• A key parsing decision is how we ‘attach’ various constituents


• PPs, adverbial or participial phrases, infinitives, coordinations, etc.

• Catalan numbers: Cn = (2n)!/[(n+1)!n!]


• An exponentially growing series, which arises in many tree-like contexts:
• E.g., the number of possible triangulations of a polygon with n+2 sides
• Turns up in triangulation of probabilistic graphical models….
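The closed form above can be checked directly; a quick sketch:

```python
from math import comb

def catalan(n: int) -> int:
    # C_n = (2n)! / ((n+1)! n!) = comb(2n, n) / (n + 1)
    return comb(2 * n, n) // (n + 1)

# The number of binary attachment structures (e.g. over n ambiguous
# modifiers) grows super-exponentially:
print([catalan(n) for n in range(8)])  # [1, 1, 2, 5, 14, 42, 132, 429]
```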
Attachments

• I cleaned the dishes from dinner

• I cleaned the dishes with detergent

• I cleaned the dishes in my pajamas

• I cleaned the dishes in the sink


Syntactic Ambiguities I
• Prepositional phrases:
They cooked the beans in the pot on the stove with
handles.
• Particle vs. preposition:
The lady dressed up the staircase.
• Complement structures
The tourists objected to the guide that they couldn’t hear.
She knows you like the back of her hand.
• Gerund vs. participial adjective
Visiting relatives can be boring.
Changing schedules frequently confused passengers.
Syntactic Ambiguities II

• Modifier scope within NPs


impractical design requirements
plastic cup holder
• Multiple gap constructions
The chicken is ready to eat.
The contractors are rich enough to sue.
• Coordination scope:
Small rats and mice can squeeze into holes or cracks in
the wall.
Non-Local Phenomena

• Dislocation / gapping
• Which book should Peter buy?
• A debate arose which continued until the election.

• Binding
• Reference
• The IRS audits itself
• Control
• I want to go
• I want you to go
40
Parsing: Two problems to solve:
1. Repeated work…
Parsing: Two problems to solve:
2. Choosing the correct parse

• How do we work out the correct attachment:

• She saw the man with a telescope


• Is the problem ‘AI complete’? Yes, but …
• Words are good predictors of attachment
• Even absent full understanding

• Moscow sent more than 100,000 soldiers into Afghanistan …

• Sydney Water breached an agreement with NSW Health …

• Our statistical parsers will try to exploit such statistics.


Probabilistic Context Free Grammar

46
Probabilistic – or stochastic – context-free
grammars (PCFGs)

• G = (Σ, N, S, R, P)
• Σ is a set of terminal symbols
• N is a set of nonterminal symbols
• S is the start symbol (S ∈ N)
• R is a set of rules/productions of the form X → γ
• P is a probability function
• P: R → [0,1]

• A grammar G generates a language model L:

∑_{γ ∈ Σ*} P(γ) = 1
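A proper PCFG also requires that, for each left-hand-side symbol, its rule probabilities sum to 1. A minimal check; the rule fragment below is illustrative:

```python
from collections import defaultdict

# Illustrative PCFG rules as (lhs, rhs, probability) triples
rules = [
    ("S", ("NP", "VP"), 1.0),
    ("VP", ("Vi",), 0.4), ("VP", ("Vt", "NP"), 0.4), ("VP", ("VP", "PP"), 0.2),
    ("NP", ("DT", "NN"), 0.3), ("NP", ("NP", "PP"), 0.7),
    ("PP", ("P", "NP"), 1.0),
]

def is_proper(rules, tol=1e-9):
    # Sum the rule probabilities per left-hand side; each must be 1.
    totals = defaultdict(float)
    for lhs, _, p in rules:
        totals[lhs] += p
    return all(abs(t - 1.0) < tol for t in totals.values())

print(is_proper(rules))  # True
```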
A Probabilistic Context-Free Grammar (PCFG)
PCFG Example

S ⇒ NP VP 1.0     Vi ⇒ sleeps 1.0
VP ⇒ Vi 0.4       Vt ⇒ saw 1.0
VP ⇒ Vt NP 0.4    NN ⇒ man 0.7
VP ⇒ VP PP 0.2    NN ⇒ woman 0.2
NP ⇒ DT NN 0.3    NN ⇒ telescope 0.1
NP ⇒ NP PP 0.7    DT ⇒ the 1.0
PP ⇒ P NP 1.0     IN ⇒ with 0.5
                  IN ⇒ in 0.5

• The probability of a tree t with rules
  α1 → β1, α2 → β2, . . . , αn → βn
  is p(t) = ∏_{i=1}^{n} q(αi → βi)
  where q(α → β) is the probability for rule α → β.
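The product formula can be sketched directly; the rule probabilities below come from the PCFG example above, while the nested-tuple encoding of trees is an illustrative choice:

```python
# q maps each rule (lhs, rhs) to its probability (values from the example).
q = {
    ("S", ("NP", "VP")): 1.0, ("VP", ("Vi",)): 0.4,
    ("NP", ("DT", "NN")): 0.3, ("DT", ("the",)): 1.0,
    ("NN", ("man",)): 0.7, ("Vi", ("sleeps",)): 1.0,
}

def p_tree(t):
    # A tree is a (label, children...) tuple; the probability of a tree
    # is the product of the probabilities of the rules used to build it.
    label, *children = t
    if all(isinstance(c, str) for c in children):  # preterminal -> word
        return q[(label, tuple(children))]
    prob = q[(label, tuple(c[0] for c in children))]
    for c in children:
        prob *= p_tree(c)
    return prob

t1 = ("S", ("NP", ("DT", "the"), ("NN", "man")), ("VP", ("Vi", "sleeps")))
print(p_tree(t1))  # ≈ 1.0 * 0.3 * 1.0 * 0.7 * 0.4 * 1.0 = 0.084
```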
Example of a PCFG

49
A Probabilistic Context-Free Grammar (PCFG)
Probability of a Parse

t1 = [S [NP [DT The] [NN man]] [VP [Vi sleeps]]]
p(t1) = 1.0 × 0.3 × 1.0 × 0.7 × 0.4 × 1.0

t2 = [S [NP [DT The] [NN man]]
        [VP [VP [Vt saw] [NP [DT the] [NN woman]]]
            [PP [IN with] [NP [DT the] [NN telescope]]]]]
p(t2) = 1.0 × 0.3 × 1.0 × 0.7 × 0.2 × 0.4 × 1.0 × 0.3 × 1.0 × 0.2 × 1.0 × 0.5 × 0.3 × 1.0 × 0.1
PCFGs: Learning and Inference
 Model
 The probability of a tree t with n rules αi → βi, i = 1..n

 Learning
 Read the rules off of labeled sentences, use ML estimates for
probabilities

 and use all of our standard smoothing tricks!

 Inference
 For input sentence s, define T(s) to be the set of trees whose yield is s
(whole leaves, read left to right, match the words in s)
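The learning step, reading rules off labeled trees and taking the maximum-likelihood estimate q(α → β) = count(α → β) / count(α), can be sketched as follows (toy treebank; the nested-tuple tree encoding is an illustrative choice):

```python
from collections import Counter

def rules_of(t):
    # Yield (lhs, rhs) rules from a nested (label, children...) tree.
    label, *children = t
    if all(isinstance(c, str) for c in children):  # preterminal -> word
        yield (label, tuple(children))
        return
    yield (label, tuple(c[0] for c in children))
    for c in children:
        yield from rules_of(c)

def mle_pcfg(treebank):
    # Maximum-likelihood estimates: count each rule, normalize per lhs.
    rule_counts, lhs_counts = Counter(), Counter()
    for t in treebank:
        for lhs, rhs in rules_of(t):
            rule_counts[(lhs, rhs)] += 1
            lhs_counts[lhs] += 1
    return {r: c / lhs_counts[r[0]] for r, c in rule_counts.items()}

treebank = [
    ("S", ("NP", "people"), ("VP", "fish")),
    ("S", ("NP", "people"), ("VP", ("V", "fish"), ("NP", "tanks"))),
]
q = mle_pcfg(treebank)
print(q[("S", ("NP", "VP"))])  # 1.0: every observed S rewrites as NP VP
```

In practice these raw estimates would then be smoothed, as the slide notes.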
Grammar Transforms

52
Chomsky Normal Form

• All rules are of the form X → Y Z or X → w
  • X, Y, Z ∈ N and w ∈ Σ
• A transformation to this form doesn’t change the weak generative capacity of a CFG
  • That is, it recognizes the same language
  • But maybe with different trees
• Empties and unaries are removed recursively
• n-ary rules (n > 2) are divided by introducing new nonterminals
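The n-ary division step can be sketched as follows; the @X_Y naming mirrors the @VP_V intermediate symbols used in the slides below, and the (lhs, rhs) rule encoding is an illustrative choice:

```python
def binarize(rules):
    # Split each rule X -> A B C ... into X -> A @X_A and @X_A -> B C ...,
    # repeating until every rule has at most two right-hand-side symbols.
    out = []
    for lhs, rhs in rules:
        while len(rhs) > 2:
            new = f"@{lhs}_{rhs[0]}"
            out.append((lhs, (rhs[0], new)))
            lhs, rhs = new, rhs[1:]
        out.append((lhs, rhs))
    return out

print(binarize([("VP", ("V", "NP", "PP"))]))
# [('VP', ('V', '@VP_V')), ('@VP_V', ('NP', 'PP'))]
```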
A phrase structure grammar

S  NP VP N  people
VP  V NP N  fish
VP  V NP PP N  tanks
NP  NP NP N  rods
NP  NP PP V  people
NP  N V  fish
NP  e V  tanks
PP  P NP P  with
Chomsky Normal Form steps

S  NP VP N  people
S  VP N  fish
VP  V NP
N  tanks
VP  V
VP  V NP PP N  rods
VP  V PP V  people
NP  NP NP V  fish
NP  NP V  tanks
NP  NP PP
P  with
NP  PP
NP  N
PP  P NP
PP  P
Chomsky Normal Form steps

S  NP VP N  people
VP  V NP
S  V NP N  fish
VP  V N  tanks
SV
N  rods
VP  V NP PP
S  V NP PP V  people
VP  V PP V  fish
S  V PP
NP  NP NP V  tanks
NP  NP P  with
NP  NP PP
NP  PP
NP  N
PP  P NP
PP  P
Chomsky Normal Form steps

S  NP VP N  people
VP  V NP
S  V NP N  fish
VP  V N  tanks
VP  V NP PP
N  rods
S  V NP PP
VP  V PP V  people
S  V PP S  people
NP  NP NP
NP  NP V  fish
NP  NP PP S  fish
NP  PP
NP  N
V  tanks
PP  P NP S  tanks
PP  P
P  with
Chomsky Normal Form steps

S  NP VP N  people
VP  V NP N  fish
S  V NP N  tanks
VP  V NP PP N  rods
S  V NP PP V  people
VP  V PP S  people
S  V PP
VP  people
NP  NP NP
V  fish
NP  NP
S  fish
NP  NP PP
VP  fish
NP  PP
V  tanks
NP  N
PP  P NP S  tanks
PP  P VP  tanks
P  with
Chomsky Normal Form steps

S  NP VP NP  people
VP  V NP NP  fish
S  V NP NP  tanks
VP  V NP PP NP  rods
S  V NP PP V  people
VP  V PP S  people
S  V PP VP  people
NP  NP NP V  fish
NP  NP PP S  fish
NP  P NP VP  fish
PP  P NP V  tanks
S  tanks
VP  tanks
P  with
PP  with
Chomsky Normal Form steps

S  NP VP NP  people
VP  V NP NP  fish
S  V NP NP  tanks
VP  V @VP_V NP  rods
@VP_V  NP PP V  people
S  V @S_V S  people
@S_V  NP PP VP  people
VP  V PP V  fish
S  V PP S  fish
NP  NP NP VP  fish
NP  NP PP V  tanks
NP  P NP S  tanks
PP  P NP VP  tanks
P  with
PP  with
Chomsky Normal Form

• You should think of this as a transformation for efficient parsing


• With some extra book-keeping in symbol names, you can even
reconstruct the same trees with a detransform
• In practice full Chomsky Normal Form is a pain
• Reconstructing n-aries is easy
• Reconstructing unaries/empties is trickier

• Binarization is crucial for cubic time CFG parsing

• The rest isn’t necessary; it just makes the algorithms cleaner and
a bit quicker
An example: before binarization…

[ROOT [NP [N people]] [VP [V fish] [NP [N tanks]] [PP [P with] [NP [N rods]]]]]
After binarization…

[ROOT [NP [N people]] [VP [V fish] [@VP_V [NP [N tanks]] [PP [P with] [NP [N rods]]]]]]


Parsing

67
Constituency Parsing

PCFG rules with probabilities θi:

S → NP VP θ0
NP → NP NP θ1
…
N → fish θ42
N → people θ43
V → fish θ44
…

Input sentence: fish people fish tanks
Cocke-Kasami-Younger (CKY)
Constituency Parsing (Parse Triangle/Chart)

fish people fish tanks


Viterbi (Max) Scores

S → NP VP 0.9     VP → V PP 0.1
S → VP 0.1        @VP_V → NP PP 1.0
VP → V NP 0.5     NP → NP NP 0.1
VP → V 0.1        NP → NP PP 0.2
VP → V @VP_V 0.3  NP → N 0.7
                  PP → P NP 1.0

Chart cells for “people fish”:
people: N 0.5, V 0.1, NP 0.35
fish: N 0.2, V 0.6, NP 0.14, VP 0.06
Extended CKY parsing

• Unaries can be incorporated into the algorithm


• Messy, but doesn’t increase algorithmic complexity
• Empties can be incorporated
• Use fenceposts
• Doesn’t increase complexity; essentially like unaries
Extended CKY parsing

• Unaries can be incorporated into the algorithm


• Messy, but doesn’t increase algorithmic complexity
• Empties can be incorporated
• Use fenceposts
• Doesn’t increase complexity; essentially like unaries

• Binarization is vital
• Without binarization, you don’t get parsing cubic in the length of the
sentence and in the number of nonterminals in the grammar
• Binarization may be an explicit transformation or implicit in how the parser
works (Earley-style dotted rules), but it’s always there.
A Recursive Parser
bestScore(X,i,j,s)
  if (j == i)
    return q(X -> s[i])
  else
    return max over k and X -> Y Z of
      q(X -> Y Z) * bestScore(Y,i,k,s) * bestScore(Z,k+1,j,s)
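The recursion above can be memoized into a small Viterbi parser, which also removes the repeated work mentioned earlier. This sketch uses a fragment of the fish/people toy grammar from the surrounding slides; for brevity, the unary chains (NP → N, VP → V, S → VP) are pre-folded into the lexical scores, which is why e.g. NP/"fish" scores 0.7 × 0.2 = 0.14:

```python
from functools import lru_cache

# Binary rules: (parent, left, right) -> probability (toy-grammar fragment).
binary = {("S", "NP", "VP"): 0.9,
          ("VP", "V", "NP"): 0.5,
          ("NP", "NP", "NP"): 0.1}
# Lexical scores with unary chains pre-folded in (a simplification).
lexicon = {("N", "fish"): 0.2, ("V", "fish"): 0.6,
           ("NP", "fish"): 0.14, ("VP", "fish"): 0.06,
           ("N", "people"): 0.5, ("V", "people"): 0.1,
           ("NP", "people"): 0.35, ("VP", "people"): 0.01,
           ("N", "tanks"): 0.2, ("NP", "tanks"): 0.14,
           ("VP", "tanks"): 0.03}

def parse(words):
    @lru_cache(maxsize=None)
    def best(X, i, j):
        if j == i:  # single word: look it up in the lexicon
            return lexicon.get((X, words[i]), 0.0)
        # otherwise try every rule X -> Y Z and every split point k
        return max((q * best(Y, i, k) * best(Z, k + 1, j)
                    for (A, Y, Z), q in binary.items() if A == X
                    for k in range(i, j)), default=0.0)
    return best("S", 0, len(words) - 1)

print(parse(["people", "fish"]))           # ≈ 0.0189, matching the chart
print(parse(["people", "fish", "tanks"]))  # ≈ 0.01323, matching the chart
```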
The CKY algorithm (1960/1965)
… extended to unaries

function CKY(words, grammar) returns [most_probable_parse,prob]


score = new double[#(words)+1][#(words)+1][#(nonterms)]
back = new Pair[#(words)+1][#(words)+1][#(nonterms)]
//LEXICON
for i=0; i<#(words); i++
for A in nonterms
if A -> words[i] in grammar
score[i][i+1][A] = P(A -> words[i])
//handle unaries
boolean added = true
while added
added = false
for A, B in nonterms
if score[i][i+1][B] > 0 && A->B in grammar
prob = P(A->B)*score[i][i+1][B]
if prob > score[i][i+1][A]
score[i][i+1][A] = prob
back[i][i+1][A] = B
added = true
The CKY algorithm (1960/1965)
… extended to unaries
//build higher order cells
for span = 2 to #(words)
for begin = 0 to #(words)- span
end = begin + span
for split = begin+1 to end-1
for A,B,C in nonterms
prob=score[begin][split][B]*score[split][end][C]*P(A->BC)
if prob > score[begin][end][A]
score[begin][end][A] = prob
back[begin][end][A] = new Triple(split,B,C)
//handle unaries
boolean added = true
while added
added = false
for A, B in nonterms
prob = P(A->B)*score[begin][end][B];
if prob > score[begin][end][A]
score[begin][end][A] = prob
back[begin][end][A] = B
added = true
return buildTree(score, back)
The grammar: binary rules, no epsilons

S → NP VP 0.9      N → people 0.5
S → VP 0.1         N → fish 0.2
VP → V NP 0.5      N → tanks 0.2
VP → V 0.1         N → rods 0.1
VP → V @VP_V 0.3   V → people 0.1
VP → V PP 0.1      V → fish 0.6
@VP_V → NP PP 1.0  V → tanks 0.3
NP → NP NP 0.1     P → with 1.0
NP → NP PP 0.2
NP → N 0.7
PP → P NP 1.0
fish 1 people 2 fish 3 tanks 4
0

score[0][1] score[0][2] score[0][3] score[0][4]

score[1][2] score[1][3] score[1][4]

score[2][3] score[2][4]

score[3][4]

4
Filling the chart for “fish people fish tanks”, using the grammar above.

Step 1, the lexicon: fill each word cell score[i][i+1]:

for i=0; i<#(words); i++
  for A in nonterms
    if A -> words[i] in grammar
      score[i][i+1][A] = P(A -> words[i])

Step 2, unaries over each word cell (repeat until nothing is added):
if score[i][i+1][B] > 0 and A -> B is in the grammar, propose
P(A->B) * score[i][i+1][B] and keep it if it beats score[i][i+1][A].

Word cells after the lexicon and unary steps:
fish: N 0.2, V 0.6; NP → N 0.14, VP → V 0.06, S → VP 0.006
people: N 0.5, V 0.1; NP → N 0.35, VP → V 0.01, S → VP 0.001
tanks: N 0.2, V 0.1; NP → N 0.14, VP → V 0.03, S → VP 0.003

Step 3, binary rules over longer spans (then unaries again on each cell):

for split = begin+1 to end-1
  for A,B,C in nonterms
    prob = score[begin][split][B]*score[split][end][C]*P(A->BC)
    if prob > score[begin][end][A]
      score[begin][end][A] = prob
      back[begin][end][A] = new Triple(split,B,C)

Best scores per span:
[0,2] “fish people”: NP → NP NP 0.0049, VP → V NP 0.105, S → VP 0.0105
[1,3] “people fish”: NP → NP NP 0.0049, VP → V NP 0.007, S → NP VP 0.0189
[2,4] “fish tanks”: NP → NP NP 0.00196, VP → V NP 0.042, S → VP 0.0042
[0,3]: NP → NP NP 0.0000686, VP → V NP 0.00147, S → NP VP 0.000882
[1,4]: NP → NP NP 0.0000686, VP → V NP 0.000098, S → NP VP 0.01323
[0,4]: NP → NP NP 0.0000009604, VP → V NP 0.00002058, S → NP VP 0.00018522

Call buildTree(score, back) to get the best parse
Evaluating constituency parsing

Gold standard brackets:
S-(0:11), NP-(0:2), VP-(2:9), VP-(3:9), NP-(4:6), PP-(6:9), NP-(7:9), NP-(9:10)

Candidate brackets:
S-(0:11), NP-(0:2), VP-(2:10), VP-(3:10), NP-(4:6), PP-(6:10), NP-(7:10)

Labeled Precision 3/7 = 42.9%
Labeled Recall 3/8 = 37.5%
LP/LR F1 40.0%
Tagging Accuracy 11/11 = 100.0%
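The bracket-matching computation can be sketched directly; brackets are encoded as (label, start, end) triples, reproducing the numbers above:

```python
def prf(gold, cand):
    # Labeled precision/recall/F1 over sets of (label, start, end) brackets.
    correct = len(gold & cand)
    p, r = correct / len(cand), correct / len(gold)
    return p, r, 2 * p * r / (p + r)

gold = {("S", 0, 11), ("NP", 0, 2), ("VP", 2, 9), ("VP", 3, 9),
        ("NP", 4, 6), ("PP", 6, 9), ("NP", 7, 9), ("NP", 9, 10)}
cand = {("S", 0, 11), ("NP", 0, 2), ("VP", 2, 10), ("VP", 3, 10),
        ("NP", 4, 6), ("PP", 6, 10), ("NP", 7, 10)}
p, r, f1 = prf(gold, cand)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.429 0.375 0.4
```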
How good are PCFGs?

• Penn WSJ parsing accuracy: about 73.7% LP/LR F1


• Robust
• Usually admit everything, but with low probability
• Partial solution for grammar ambiguity
• A PCFG gives some idea of the plausibility of a parse
• But not so good because the independence assumptions are
too strong
• Give a probabilistic language model
• But in the simple case it performs worse than a trigram model
• The problem seems to be that PCFGs lack the
lexicalization of a trigram model
Weaknesses of PCFGs

89
Weaknesses

• Lack of sensitivity to structural frequencies

• Lack of sensitivity to lexical information

• (A word is independent of the rest of the tree given its POS!)

90
A Case of PP Attachment Ambiguity

91
92
A Case of Coordination Ambiguity

93
94
Structural Preferences: Close Attachment

95
Structural Preferences: Close Attachment

• Example: John was believed to have been shot by Bill

• Low attachment analysis (Bill does the shooting) contains same


rules as high attachment analysis (Bill does the believing)
• Two analyses receive the same probability

96
PCFGs and Independence

• The symbols in a PCFG define independence assumptions:


S → NP VP
NP → DT NN

• At any node, the material inside that node is independent of the


material outside that node, given the label of that node
• Any information that statistically connects behavior inside and
outside a node must flow through that node’s label
Non-Independence I

• The independence assumptions of a PCFG are often too strong

[Bar chart: relative frequencies of NP expansions (NP PP, DT NN, PRP) differ sharply across all NPs, NPs under S, and NPs under VP]

• Example: the expansion of an NP is highly dependent on the


parent of the NP (i.e., subjects vs. objects)
Non-Independence II
• Symptoms of overly strong assumptions:
• Rewrites get used where they don’t belong

In the PTB, this construction is for possessives
Refining the Grammar Symbols

• We can relax independence assumptions by encoding


dependencies into the PCFG symbols, by state splitting:

Parent annotation Marking


[Johnson 98] possessive NPs

• Too much state-splitting  sparseness (no smoothing used!)


• What are the most useful features to encode?
Linguistics in Unlexicalized Parsing

101
Horizontal Markovization

• Horizontal Markovization: Merges States

[Charts: F1 (roughly 70–74%) and grammar size in symbols as a function of horizontal Markov order: 0, 1, 2v, 2, inf]
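One way to see the state merging: the intermediate symbols produced while binarizing a rule can be made to remember only the last h already-generated children. A sketch (the naming scheme is illustrative):

```python
def hmarkov_states(lhs, rhs, h):
    # Intermediate states created while generating rhs left-to-right,
    # remembering only the last h children (order-h horizontal
    # markovization); a large h reproduces plain binarization.
    states = []
    for i in range(1, len(rhs)):
        remembered = rhs[max(0, i - h):i]
        states.append(f"@{lhs}->..._{'_'.join(remembered)}")
    return states

# A high order keeps all states distinct; order 1 merges them:
print(hmarkov_states("NP", ("DT", "JJ", "JJ", "NN"), h=99))
# ['@NP->..._DT', '@NP->..._DT_JJ', '@NP->..._DT_JJ_JJ']
print(hmarkov_states("NP", ("DT", "JJ", "JJ", "NN"), h=1))
# ['@NP->..._DT', '@NP->..._JJ', '@NP->..._JJ']
```

Merged states share counts, which is exactly the symbol-count reduction the charts above show.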
Vertical Markovization

• Vertical Markov order: Order 1 Order 2


rewrites depend on past
k ancestor nodes.
(i.e., parent annotation)

[Charts: F1 (roughly 72–79%) and number of symbols as a function of vertical Markov order: 1, 2v, 2, 3v, 3]

Model F1 Size
v=h=2v 77.8 7.5K
Unary Splits

• Problem: unary
rewrites are used to
transmute
categories so a high-
probability rule can
be used.

 Solution: Mark unary rewrite sites with -U

Annotation F1 Size
Base 77.8 7.5K
UNARY 78.3 8.0K
Tag Splits

• Problem: Treebank tags are


too coarse.

• Example: SBAR sentential


complementizers (that,
whether, if), subordinating
conjunctions (while, after),
and true prepositions (in, of,
to) are all tagged IN.

• Partial Solution:
• Subdivide the IN tag.

Annotation F1 Size
Previous 78.3 8.0K
SPLIT-IN 80.3 8.1K
Other Tag Splits
• UNARY-DT: mark demonstratives as DT^U (“the X” vs. “those”): F1 80.4, size 8.1K
• UNARY-RB: mark phrasal adverbs as RB^U (“quickly” vs. “very”): 80.5, 8.1K
• TAG-PA: mark tags with non-canonical parents (“not” is an RB^VP): 81.2, 8.5K
• SPLIT-AUX: mark auxiliary verbs with -AUX [cf. Charniak 97]: 81.6, 9.0K
• SPLIT-CC: separate “but” and “&” from other conjunctions: 81.7, 9.1K
• SPLIT-%: “%” gets its own tag: 81.8, 9.3K
Yield Splits

• Problem: sometimes the behavior


of a category depends on
something inside its future yield.

• Examples:
• Possessive NPs
• Finite vs. infinite VPs
• Lexical heads!

• Solution: annotate future elements into nodes.

Annotation F1 Size
tag splits 82.3 9.7K
POSS-NP 83.1 9.8K
SPLIT-VP 85.7 10.5K
Distance / Recursion Splits

• Problem: vanilla PCFGs cannot distinguish attachment heights.
• Solution: mark a property of higher or lower sites:
  • Contains a verb.
  • Is (non)-recursive.
  • Base NPs [cf. Collins 99]
  • Right-recursive NPs

Annotation F1 Size
Previous 85.7 10.5K
BASE-NP 86.0 11.7K
DOMINATES-V 86.9 14.1K
RIGHT-REC-NP 87.0 15.2K
A Fully Annotated Tree
Final Test Set Results

Parser LP LR F1
Magerman 95 84.9 84.6 84.7
Collins 96 86.3 85.8 86.0
Klein & Manning 03 86.9 85.7 86.3
Charniak 97 87.4 87.5 87.4
Collins 99 88.7 88.6 88.6

• Beats “first generation” lexicalized parsers


Lexicalised PCFGs

111
Heads in Context Free Rules

112
Heads

113
Rules to Recover Heads: An Example for NPs

114
Rules to Recover Heads: An Example for VPs

115
Adding Headwords to Trees

116
Adding Headwords to Trees

117
Adding Headwords to Trees

118
Lexicalized CFGs in Chomsky Normal Form

119
Example

120
Lexicalized CKY
Rules are lexicalized with head words, e.g. (VP -> VBD...NP)[saw] instantiated as VP[saw] -> VBD[saw] NP[her]: a parent X[h] splits into Y and Z, one of which shares the head h while the other takes its own head h’.

bestScore(X,i,j,h)
  if (j == i)
    return score(X, s[i])
  else
    return the larger of
      max over k, w, X->YZ of score(X[h] -> Y[h] Z[w]) * bestScore(Y,i,k,h) * bestScore(Z,k+1,j,w)
      max over k, w, X->YZ of score(X[h] -> Y[w] Z[h]) * bestScore(Y,i,k,w) * bestScore(Z,k+1,j,h)
Parsing with Lexicalized CFGs

122
Pruning with Beams
• The Collins parser prunes with per-cell beams [Collins 99]
• Essentially, run the O(n5) CKY
  • Remember only a few hypotheses for each span <i,j>.
  • If we keep K hypotheses at each span, then we do at most O(nK2) work per span (why?)
  • Keeps things more or less cubic

• Also: certain spans are forbidden


entirely on the basis of
punctuation (crucial for speed)
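Keeping only the top-K hypotheses per cell can be sketched as follows; the (label, head) hypothesis encoding is an illustrative assumption:

```python
import heapq

def prune_cell(cell, K):
    # cell: dict mapping (label, head) hypotheses to Viterbi scores.
    # Keep only the K highest-scoring hypotheses for this span.
    kept = heapq.nlargest(K, cell.items(), key=lambda kv: kv[1])
    return dict(kept)

cell = {("NP", "man"): 0.02, ("NP", "telescope"): 0.001,
        ("VP", "saw"): 0.015, ("S", "saw"): 0.004}
print(sorted(prune_cell(cell, 2)))  # [('NP', 'man'), ('VP', 'saw')]
```

With K fixed, each span combines at most O(K^2) pairs of child hypotheses, which is what keeps the run time roughly cubic.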
Parameter Estimation

124
A Model from Charniak (1997)

125
A Model from Charniak (1997)

126
Strengths and Weaknesses of PCFG Parsers

131
