19 Parsing
Mausam

[Figure: parse tree for the example sentence "The boy put the tortoise on the rug"]
Why Parse?
• Part of speech information
• Phrase information
• Useful relationships
The rise of annotated data:
The Penn Treebank
[Marcus et al. 1993, Computational Linguistics]
( (S
(NP-SBJ (DT The) (NN move))
(VP (VBD followed)
(NP
(NP (DT a) (NN round))
(PP (IN of)
(NP
(NP (JJ similar) (NNS increases))
(PP (IN by)
(NP (JJ other) (NNS lenders)))
(PP (IN against)
(NP (NNP Arizona) (JJ real) (NN estate) (NNS loans))))))
(, ,)
(S-ADV
(NP-SBJ (-NONE- *))
(VP (VBG reflecting)
(NP
(NP (DT a) (VBG continuing) (NN decline))
(PP-LOC (IN in)
(NP (DT that) (NN market)))))))
(. .)))
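The bracketed Treebank format above is easy to read mechanically. A minimal sketch (not part of the original slides) that parses such S-expressions into nested lists:

```python
# Sketch: a tiny reader for Penn-Treebank-style bracketed trees like the one
# above, turning "(NP (DT The) (NN move))" into nested Python lists.
import re

def read_tree(text):
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0
    def parse():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        node = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.append(parse())    # nested constituent
            else:
                node.append(tokens[pos])
                pos += 1
        pos += 1                        # consume ")"
        return node
    return parse()

t = read_tree("(NP (DT The) (NN move))")
print(t)   # ['NP', ['DT', 'The'], ['NN', 'move']]
```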
Penn Treebank Non-terminals
Statistical parsing applications
Statistical parsers are now robust and widely used in larger NLP
applications:
• High precision question answering [Pasca and Harabagiu SIGIR 2001]
• Improving biological named entity finding [Finkel et al. JNLPBA 2004]
• Syntactically based sentence compression [Lin and Wilbur 2007]
• Extracting opinions about products [Bloom et al. NAACL 2007]
• Improved interaction in computer games [Gorniak and Roy 2005]
• Helping linguists find data [Resnik et al. BLS 2005]
• Source sentence analysis for machine translation [Xu et al. 2009]
• Relation extraction systems [Fundel et al. Bioinformatics 2006]
Example Application: Machine Translation

[Figure: a sequence of parse trees for "The boy put the tortoise on the rug", showing syntax-based reordering step by step (NP VP PP → NP PP VP, with the verb moved to clause-final position) and the leaves finally replaced by Hindi words (लड़के ने "the boy", कछुआ "tortoise", कालीन "rug", ऊपर "on", रखा "put"), illustrating source-tree reordering for English-to-Hindi translation]
Pre 1990 (“Classical”) NLP Parsing
• Goes back to Chomsky’s PhD thesis in the 1950s
• Wrote a symbolic grammar (CFG or often richer) and lexicon:

Grammar:        Lexicon:
S → NP VP       NN → interest
NP → (DT) NN    NNS → rates
NP → NN NNS     NNS → raises
NP → NNP        VBP → interest
VP → V NP       VBZ → rates
Context-Free Grammars
Context-Free Grammars in NLP
Left-Most Derivations
Properties of CFGs
A Fragment of a Noun Phrase Grammar
Extended Grammar with Prepositional Phrases
Verbs, Verb Phrases and Sentences
PPs Modifying Verb Phrases
Complementizers and SBARs
More Verbs
Coordination
Much more remains…
Attachment ambiguities
• Dislocation / gapping
• Which book should Peter buy?
• A debate arose which continued until the election.
• Binding
• Reference
• The IRS audits itself
• Control
• I want to go
• I want you to go
Context-Free Grammars in NLP
Probabilistic – or stochastic – context-free
grammars (PCFGs)
• G = (N, Σ, S, R, P)
• Σ is a set of terminal symbols
• N is a set of nonterminal symbols
• S is the start symbol (S ∈ N)
• R is a set of rules/productions of the form X → γ, with X ∈ N and γ ∈ (N ∪ Σ)*
• P is a probability function
  • P: R → [0,1]
  • for each X ∈ N: ∑_{X→γ ∈ R} P(X → γ) = 1
A Probabilistic Context-Free Grammar (PCFG)
PCFG Example
Grammar:            Lexicon:
S ⇒ NP VP 1.0       Vi ⇒ sleeps 1.0
VP ⇒ Vi 0.4         Vt ⇒ saw 1.0
VP ⇒ Vt NP 0.4      NN ⇒ man 0.7
VP ⇒ VP PP 0.2      NN ⇒ woman 0.2
NP ⇒ DT NN 0.3      NN ⇒ telescope 0.1
NP ⇒ NP PP 0.7      DT ⇒ the 1.0
PP ⇒ P NP 1.0       IN ⇒ with 0.5
                    IN ⇒ in 0.5
• Probability of a tree t with rules
  α₁ → β₁, α₂ → β₂, . . . , αₙ → βₙ
  is
  p(t) = ∏_{i=1}^{n} q(αᵢ → βᵢ)
  where q(α → β) is the probability for rule α → β.
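The product formula above can be sketched directly in code; the tuple-based tree representation here is an assumption for illustration, and the rule probabilities are the ones from the example grammar:

```python
# Minimal sketch: p(t) = prod of q(alpha_i -> beta_i) over all rules used in t.
# Trees are nested tuples (label, child, ...); a leaf is a plain word string.

q = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("Vi",)): 0.4,
    ("VP", ("Vt", "NP")): 0.4,
    ("VP", ("VP", "PP")): 0.2,
    ("NP", ("DT", "NN")): 0.3,
    ("NP", ("NP", "PP")): 0.7,
    ("PP", ("P", "NP")): 1.0,
    ("Vi", ("sleeps",)): 1.0,
    ("Vt", ("saw",)): 1.0,
    ("NN", ("man",)): 0.7,
    ("NN", ("woman",)): 0.2,
    ("NN", ("telescope",)): 0.1,
    ("DT", ("the",)): 1.0,
    ("IN", ("with",)): 0.5,
    ("IN", ("in",)): 0.5,
}

def tree_prob(t):
    """Multiply the probabilities of all rules used in tree t."""
    label, *children = t
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = q[(label, rhs)]
    for c in children:
        if not isinstance(c, str):      # recurse into nonterminal children
            p *= tree_prob(c)
    return p

t1 = ("S", ("NP", ("DT", "the"), ("NN", "man")), ("VP", ("Vi", "sleeps")))
print(tree_prob(t1))   # 1.0 * 0.3 * 1.0 * 0.7 * 0.4 * 1.0 = 0.084 (up to float rounding)
```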
Example of a PCFG
A Probabilistic Context-Free Grammar (PCFG)
Probability of a Parse

[Figure: the example grammar above applied to two parse trees]

t₁ = parse of "The man sleeps":
  S → NP VP (1.0), NP → DT NN (0.3), DT → the (1.0), NN → man (0.7), VP → Vi (0.4), Vi → sleeps (1.0)
  p(t₁) = 1.0 × 0.3 × 1.0 × 0.7 × 0.4 × 1.0

t₂ = parse of "The man saw the woman with the telescope", with the PP attached to the VP:
  p(t₂) = 1.0 × 0.3 × 1.0 × 0.7 × 0.2 × 0.4 × 1.0 × 0.3 × 1.0 × 0.2 × 1.0 × 0.5 × 0.3 × 1.0 × 0.1
PCFGs: Learning and Inference
Model
• The probability of a tree t with n rules αᵢ → βᵢ, i = 1..n, is p(t) = ∏ᵢ q(αᵢ → βᵢ)
Learning
• Read the rules off of labeled sentences; use maximum-likelihood estimates for the
probabilities
Inference
• For input sentence s, define T(s) to be the set of trees whose yield is s
(whose leaves, read left to right, match the words in s)
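The maximum-likelihood estimate mentioned above is just rule counting: q(α → β) = Count(α → β) / Count(α). A sketch, where the toy two-tree "treebank" and the tuple tree representation are illustrative assumptions:

```python
# Sketch: learning PCFG rule probabilities by maximum likelihood from a toy
# treebank of nested-tuple trees: q(alpha -> beta) = Count(alpha -> beta) / Count(alpha).
from collections import Counter

def rules(t):
    """Yield (lhs, rhs) for every rule used in tree t."""
    label, *children = t
    yield label, tuple(c if isinstance(c, str) else c[0] for c in children)
    for c in children:
        if not isinstance(c, str):
            yield from rules(c)

def ml_estimate(treebank):
    rule_count, lhs_count = Counter(), Counter()
    for t in treebank:
        for lhs, rhs in rules(t):
            rule_count[(lhs, rhs)] += 1
            lhs_count[lhs] += 1
    return {r: c / lhs_count[r[0]] for r, c in rule_count.items()}

treebank = [
    ("S", ("NP", ("N", "people")), ("VP", ("V", "fish"))),
    ("S", ("NP", ("N", "people")), ("VP", ("V", "fish"), ("NP", ("N", "tanks")))),
]
q = ml_estimate(treebank)
print(q[("VP", ("V",))])   # 0.5: one of the two observed VP expansions is VP -> V
```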
Grammar Transforms
Chomsky Normal Form

Grammar:          Lexicon:
S → NP VP         N → people
VP → V NP         N → fish
VP → V NP PP      N → tanks
NP → NP NP        N → rods
NP → NP PP        V → people
NP → N            V → fish
NP → ε            V → tanks
PP → P NP         P → with
Chomsky Normal Form steps

Grammar:          Lexicon:
S → NP VP         N → people
S → VP            N → fish
VP → V NP         N → tanks
VP → V            N → rods
VP → V NP PP      V → people
VP → V PP         V → fish
NP → NP NP        V → tanks
NP → NP           P → with
NP → NP PP
NP → PP
NP → N
PP → P NP
PP → P
Chomsky Normal Form steps

Grammar:          Lexicon:
S → NP VP         N → people
VP → V NP         N → fish
S → V NP          N → tanks
VP → V            N → rods
S → V             V → people
VP → V NP PP      V → fish
S → V NP PP       V → tanks
VP → V PP         P → with
S → V PP
NP → NP NP
NP → NP
NP → NP PP
NP → PP
NP → N
PP → P NP
PP → P
Chomsky Normal Form steps

Grammar:          Lexicon:
S → NP VP         N → people
VP → V NP         N → fish
S → V NP          N → tanks
VP → V            N → rods
VP → V NP PP      V → people
S → V NP PP       S → people
VP → V PP         V → fish
S → V PP          S → fish
NP → NP NP        V → tanks
NP → NP           S → tanks
NP → NP PP        P → with
NP → PP
NP → N
PP → P NP
PP → P
Chomsky Normal Form steps

Grammar:          Lexicon:
S → NP VP         N → people
VP → V NP         N → fish
S → V NP          N → tanks
VP → V NP PP      N → rods
S → V NP PP       V → people
VP → V PP         S → people
S → V PP          VP → people
NP → NP NP        V → fish
NP → NP           S → fish
NP → NP PP        VP → fish
NP → PP           V → tanks
NP → N            S → tanks
PP → P NP         VP → tanks
PP → P            P → with
Chomsky Normal Form steps

Grammar:          Lexicon:
S → NP VP         NP → people
VP → V NP         NP → fish
S → V NP          NP → tanks
VP → V NP PP      NP → rods
S → V NP PP       V → people
VP → V PP         S → people
S → V PP          VP → people
NP → NP NP        V → fish
NP → NP PP        S → fish
NP → P NP         VP → fish
PP → P NP         V → tanks
                  S → tanks
                  VP → tanks
                  P → with
                  PP → with
Chomsky Normal Form steps

Grammar:          Lexicon:
S → NP VP         NP → people
VP → V NP         NP → fish
S → V NP          NP → tanks
VP → V @VP_V      NP → rods
@VP_V → NP PP     V → people
S → V @S_V        S → people
@S_V → NP PP      VP → people
VP → V PP         V → fish
S → V PP          S → fish
NP → NP NP        VP → fish
NP → NP PP        V → tanks
NP → P NP         S → tanks
PP → P NP         VP → tanks
                  P → with
                  PP → with
A phrase structure grammar

Grammar:          Lexicon:
S → NP VP         N → people
VP → V NP         N → fish
VP → V NP PP      N → tanks
NP → NP NP        N → rods
NP → NP PP        V → people
NP → N            V → fish
NP → ε            V → tanks
PP → P NP         P → with
Chomsky Normal Form steps

Grammar:          Lexicon:
S → NP VP         NP → people
VP → V NP         NP → fish
S → V NP          NP → tanks
VP → V @VP_V      NP → rods
@VP_V → NP PP     V → people
S → V @S_V        S → people
@S_V → NP PP      VP → people
VP → V PP         V → fish
S → V PP          S → fish
NP → NP NP        VP → fish
NP → NP PP        V → tanks
NP → P NP         S → tanks
PP → P NP         VP → tanks
                  P → with
                  PP → with
Chomsky Normal Form
• The rest isn’t necessary; it just makes the algorithms cleaner and
a bit quicker
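The right-binarization used in the steps above (introducing intermediate @-symbols such as @VP_V) can be sketched as follows; the `(lhs, rhs)` rule representation is an assumption for illustration:

```python
# Sketch: right-binarization with intermediate "@" symbols, as in the steps above
# (e.g. VP -> V NP PP becomes VP -> V @VP_V and @VP_V -> NP PP).

def binarize(rules):
    """rules: list of (lhs, rhs) pairs, rhs a tuple of symbols.
    Returns an equivalent rule list where every rhs has length <= 2."""
    out = []
    for lhs, rhs in rules:
        cur, rest = lhs, list(rhs)
        while len(rest) > 2:
            first, *rest = rest
            mid = f"@{lhs}_{first}"          # intermediate symbol, e.g. @VP_V
            out.append((cur, (first, mid)))
            cur = mid
        out.append((cur, tuple(rest)))
    return out

print(binarize([("VP", ("V", "NP", "PP")), ("S", ("NP", "VP"))]))
# [('VP', ('V', '@VP_V')), ('@VP_V', ('NP', 'PP')), ('S', ('NP', 'VP'))]
```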
An example: before binarization…

[Figure: a tree ROOT → S with S → NP VP and a flat VP → V NP PP (the PP containing P NP); after binarization the VP becomes VP → V @VP_V with @VP_V → NP PP]
Constituency Parsing

PCFG:
Rule          Prob
S → NP VP     θ₀
NP → NP NP    θ₁
…
N → fish      θ₄₂
N → people    θ₄₃
V → fish      θ₄₄
…

[Figure: a parse tree over the sentence "fish people fish tanks"]
Cocke-Kasami-Younger (CKY)
Constituency Parsing (Parse Triangle/Chart)

S → NP VP 0.9
S → VP 0.1
VP → V NP 0.5
VP → V 0.1
VP → V @VP_V 0.3
VP → V PP 0.1
@VP_V → NP PP 1.0
NP → NP NP 0.1
NP → NP PP 0.2
NP → N 0.7
PP → P NP 1.0

[Figure: chart cells over the words "people fish" — e.g. the "people" cell holds N 0.5, NP 0.35, V 0.1 and the "fish" cell holds N 0.2, NP 0.14, V 0.6, VP 0.06]
Extended CKY parsing
• Binarization is vital
• Without binarization, you don’t get parsing cubic in the length of the
sentence and in the number of nonterminals in the grammar
• Binarization may be an explicit transformation or implicit in how the parser
works (Earley-style dotted rules), but it’s always there.
A Recursive Parser

bestScore(X, i, j, s)
  if (j == i)
    return q(X -> s[i])
  else
    return max over k and X -> Y Z of
      q(X -> Y Z) * bestScore(Y, i, k, s) * bestScore(Z, k+1, j, s)
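With memoization over (X, i, j), the recursive parser above becomes polynomial. A Python sketch with a toy grammar; the probabilities are loosely based on the running example, with unary chains folded into the lexicon for simplicity (an illustrative assumption, not the slides' exact grammar):

```python
# Sketch of bestScore with memoization: with the memo this is CKY run top-down.
from functools import lru_cache

# Toy CNF-ish grammar: binary rules and word-level scores (unary chains pre-folded).
q_bin = {("S", "NP", "VP"): 0.9, ("VP", "V", "NP"): 0.5, ("NP", "NP", "NP"): 0.1}
q_lex = {("NP", "people"): 0.35, ("NP", "fish"): 0.14, ("NP", "tanks"): 0.14,
         ("V", "people"): 0.1, ("V", "fish"): 0.6, ("V", "tanks"): 0.3}

def parse(sent):
    @lru_cache(maxsize=None)
    def best_score(X, i, j):
        if i == j:                                 # single word: lexical rule
            return q_lex.get((X, sent[i]), 0.0)
        best = 0.0
        for (A, Y, Z), p in q_bin.items():         # try every rule and split point
            if A != X:
                continue
            for k in range(i, j):
                best = max(best, p * best_score(Y, i, k) * best_score(Z, k + 1, j))
        return best
    return best_score("S", 0, len(sent) - 1)

print(parse(("people", "fish", "tanks")))   # ≈ 0.01323 (S -> NP VP, VP -> V NP)
```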
The CKY algorithm (1960/1965)
… extended to unaries

Grammar used in the running example:
S → NP VP 0.9
S → VP 0.1
VP → V NP 0.5
VP → V 0.1
VP → V @VP_V 0.3
VP → V PP 0.1
@VP_V → NP PP 1.0
NP → NP NP 0.1
NP → NP PP 0.2
NP → N 0.7
PP → P NP 1.0
N → people 0.5
N → fish 0.2
N → tanks 0.2
N → rods 0.1
V → people 0.1
V → fish 0.6
V → tanks 0.3

Sentence: fish people fish tanks

[Figure: an empty parse triangle of cells score[i][j] over the sentence]

Lexical (base-case) step:
for i = 0; i < #(words); i++
  for A in nonterms
    if A -> words[i] in grammar
      score[i][i+1][A] = P(A -> words[i])
Unary step (run to closure after the lexical step, and again after each binary step):

// handle unaries
boolean added = true
while added
  added = false
  for A, B in nonterms
    if score[i][i+1][B] > 0 && A->B in grammar
      prob = P(A->B) * score[i][i+1][B]
      if (prob > score[i][i+1][A])
        score[i][i+1][A] = prob
        back[i][i+1][A] = B
        added = true

[Figure: diagonal cells after the lexical step — "fish": N 0.2, V 0.6; "people": N 0.5, V 0.1; "tanks": N 0.2, V 0.1]
Binary step (combine two smaller spans):

prob = score[begin][split][B] * score[split][end][C] * P(A -> B C)
if (prob > score[begin][end][A])
  score[begin][end][A] = prob
  back[begin][end][A] = new Triple(split, B, C)

[Figure: the diagonal cells after the unary step — "fish": N 0.2, V 0.6, NP 0.14, VP 0.06, S 0.006; "people": N 0.5, V 0.1, NP 0.35, VP 0.01, S 0.001; "tanks": N 0.2, V 0.1, NP 0.14, VP 0.03, S 0.003]
[Figure: filling the length-2 spans — the "fish people" cell gets NP → NP NP 0.0049, VP → V NP 0.105, S → NP VP 0.00126, and the "people fish" cell gets NP → NP NP 0.0049, VP → V NP 0.007, S → NP VP 0.0189; the process continues up the chart]
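Putting the three steps together (lexical, binary, unary closure), a compact sketch of the extended CKY recognizer over the running example's grammar; dict-of-dict chart cells are an implementation assumption:

```python
# Sketch of the CKY recognizer from the pseudocode above: lexical step,
# binary step over increasing span lengths, and unary closure after each.

lexicon = {("N", "fish"): 0.2, ("N", "people"): 0.5, ("N", "tanks"): 0.2,
           ("N", "rods"): 0.1, ("V", "people"): 0.1, ("V", "fish"): 0.6,
           ("V", "tanks"): 0.3}
unary   = {("S", "VP"): 0.1, ("VP", "V"): 0.1, ("NP", "N"): 0.7}
binary  = {("S", "NP", "VP"): 0.9, ("VP", "V", "NP"): 0.5,
           ("VP", "V", "PP"): 0.1, ("VP", "V", "@VP_V"): 0.3,
           ("@VP_V", "NP", "PP"): 1.0, ("NP", "NP", "NP"): 0.1,
           ("NP", "NP", "PP"): 0.2, ("PP", "P", "NP"): 1.0}

def close_unaries(cell):
    added = True
    while added:                           # run unary rules to closure
        added = False
        for (A, B), p in unary.items():
            prob = p * cell.get(B, 0.0)
            if prob > cell.get(A, 0.0):
                cell[A] = prob
                added = True

def cky(words):
    n = len(words)
    score = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):          # lexical step
        for (A, word), p in lexicon.items():
            if word == w:
                score[i][i + 1][A] = p
        close_unaries(score[i][i + 1])
    for span in range(2, n + 1):           # binary step, shortest spans first
        for begin in range(0, n - span + 1):
            end = begin + span
            cell = score[begin][end]
            for split in range(begin + 1, end):
                for (A, B, C), p in binary.items():
                    prob = (score[begin][split].get(B, 0.0)
                            * score[split][end].get(C, 0.0) * p)
                    if prob > cell.get(A, 0.0):
                        cell[A] = prob
            close_unaries(cell)
    return score

chart = cky(["fish", "people"])
print(chart[0][2]["S"])   # ≈ 0.0105, via the unary S -> VP over VP -> V NP
```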
Weaknesses
A Case of PP Attachment Ambiguity
A Case of Coordination Ambiguity
Structural Preferences: Close Attachment
PCFGs and Independence

[Figure: bar chart of expansion frequencies — 11%, 9%, 9%, 9%, 6%, 7%, 4%]
Horizontal Markovization

[Figure: two plots against horizontal Markov order (0, 1, 2v, 2, inf) — parsing accuracy ranges roughly 70%–74%, and the number of grammar symbols ranges roughly 0–12000]
Vertical Markovization

[Figure: two plots against vertical Markov order (1, 2v, 2, 3v, 3) — parsing accuracy ranges roughly 72%–79%, and the number of symbols roughly 5000–25000]

Model   F1    Size
v=h=2v  77.8  7.5K
Unary Splits

• Problem: unary rewrites are used to transmute categories so a high-probability rule can be used.
• Solution: mark unary rewrite sites with -U

Annotation  F1    Size
Base        77.8  7.5K
UNARY       78.3  8.0K
Tag Splits

• Partial solution: subdivide the IN tag.

Annotation  F1    Size
Previous    78.3  8.0K
SPLIT-IN    80.3  8.1K
Other Tag Splits

                                                                    F1    Size
• UNARY-DT: mark demonstratives as DT^U (“the X” vs. “those”)       80.4  8.1K
• UNARY-RB: mark phrasal adverbs as RB^U (“quickly” vs. “very”)     80.5  8.1K
• TAG-PA: mark tags with non-canonical parents (“not” is an RB^VP)  81.2  8.5K
• SPLIT-AUX: mark auxiliary verbs with –AUX [cf. Charniak 97]       81.6  9.0K
• SPLIT-CC: separate “but” and “&” from other conjunctions          81.7  9.1K
• SPLIT-%: “%” gets its own tag.                                    81.8  9.3K
Yield Splits
• Examples:
• Possessive NPs
• Finite vs. infinite VPs
• Lexical heads!
Parser LP LR F1
Magerman 95 84.9 84.6 84.7
Collins 96 86.3 85.8 86.0
Klein & Manning 03 86.9 85.7 86.3
Charniak 97 87.4 87.5 87.4
Collins 99 88.7 88.6 88.6
Heads in Context Free Rules
Heads
Rules to Recover Heads: An Example for NPs
Rules to Recover Heads: An Example for VPs
Adding Headwords to Trees
Lexicalized CFGs in Chomsky Normal Form
Example
Lexicalized CKY

[Figure: combining Y[h] and Z[h'] into X[h] over span (i, j) with split point k — e.g. (VP -> VBD...NP)[saw] built from VBD[saw] and NP[her]]

bestScore(X, i, j, h)
  if (j == i)
    return score(X, s[i])
  else
    return max of
      max over k, w, X -> Y Z of
        score(X[h] -> Y[h] Z[w]) * bestScore(Y, i, k, h) * bestScore(Z, k+1, j, w)
      max over k, w, X -> Y Z of
        score(X[h] -> Y[w] Z[h]) * bestScore(Y, i, k, w) * bestScore(Z, k+1, j, h)
Parsing with Lexicalized CFGs
Pruning with Beams

[Figure: X[h] → Y[h] Z[h'] over span (i, j) with split point k]

• The Collins parser prunes with per-cell beams [Collins 99]
• Essentially, run the O(n⁵) CKY
• Remember only a few hypotheses for each span ⟨i, j⟩
• If we keep K hypotheses at each span, then we do at most O(nK²) work per span (why?)
• Keeps things more or less cubic
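A per-cell beam can be sketched as simple top-K pruning of a chart cell (the dict-based cell representation is an assumption for illustration, not the Collins parser itself):

```python
# Sketch: per-cell beam pruning keeps only the K best hypotheses in each chart
# cell, so later spans combine at most K x K child pairs per split point.
import heapq

def prune_cell(cell, K):
    """cell: dict mapping a hypothesis (e.g. (label, head)) -> score.
    Return a new dict holding only the K highest-scoring entries."""
    best = heapq.nlargest(K, cell.items(), key=lambda kv: kv[1])
    return dict(best)

cell = {("NP", "man"): 0.21, ("VP", "saw"): 0.05, ("S", "saw"): 0.002}
print(prune_cell(cell, 2))   # {('NP', 'man'): 0.21, ('VP', 'saw'): 0.05}
```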
A Model from Charniak (1997)
Final Test Set Results
Parser LP LR F1
Magerman 95 84.9 84.6 84.7
Collins 96 86.3 85.8 86.0
Klein & Manning 03 86.9 85.7 86.3
Charniak 97 87.4 87.5 87.4
Collins 99 88.7 88.6 88.6
Strengths and Weaknesses of PCFG Parsers