Dependency Parsing
Richard Socher
Lecture 7: Dependency Parsing
Organization
Reminders/comments:
• Final project discussion – come meet with us
• Extra credit for the most prolific Piazza student answerers
• Midterm in two weeks
• Practice exams are on the website
Lecture Plan
1. Syntactic Structure: Constituency and Dependency
2. Dependency Grammar
3. Transition-based dependency parsing
4. Neural dependency parsing
Two views of linguistic structure:
Constituency = phrase structure grammar = context-free grammars (CFGs)
Phrase structure organizes words into nested constituents.
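For example, one possible bracketing of a sentence used on a later slide (an illustrative analysis, not taken from this slide):

    [S [NP Scientists] [VP study [NP whales] [PP from [NP space]]]]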
Two views of linguistic structure:
Dependency structure
• Dependency structure shows which words depend on (modify or are arguments of) which other words.
PP attachment ambiguities in dependency structure
Scientists study whales from space
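The two readings differ only in the head chosen for the preposition "from". A small sketch of the two analyses (plain Python; a traditional PP-headed-by-preposition convention is assumed purely for illustration):

    # Dependency analyses of "Scientists study whales from space",
    # written as dependent -> head index maps (index 0 is ROOT).
    words = ["ROOT", "Scientists", "study", "whales", "from", "space"]

    verb_attach = {1: 2, 2: 0, 3: 2, 4: 2, 5: 4}   # "from" modifies "study": the studying happens from space
    noun_attach = {1: 2, 2: 0, 3: 2, 4: 3, 5: 4}   # "from" modifies "whales": the whales are from space

    for name, heads in [("PP attaches to 'study'", verb_attach),
                        ("PP attaches to 'whales'", noun_attach)]:
        arcs = ", ".join(f"{words[h]} -> {words[d]}" for d, h in heads.items())
        print(f"{name}: {arcs}")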
Attachment ambiguities
• A key parsing decision is how we ‘attach’ various constituents
• PPs, adverbial or participial phrases, infinitives, coordinations, etc.
The rise of annotated data
Starting off, building a treebank seems a lot slower and less useful than building a grammar.
Dependency Grammar and Dependency Structure
[Figure: example dependency tree ("… on ports and immigration … by Republican Senator … of Kansas")]
Dependency Grammar and Dependency Structure
Selected dependency relations from the Universal Dependencies set (de Marneffe et al., 2014)
https://web.stanford.edu/~jurafsky/slp3/14.pdf
Pāṇini’s grammar
(c. 5th century BCE)
Gallery: http://wellcomeimages.org/indexplus/image/L0032691.html
CC BY 4.0 File:Birch bark MS from Kashmir of the Rupavatra Wellcome L0032691.jpg
Dependency Grammar/Parsing History
• Some people draw the arrows one way; some the other way!
• Tesnière had them point from head to dependent…
• Ours will point from head to dependent
• Usually add a fake ROOT so every word is a dependent of precisely 1 other node
Methods of Dependency Parsing
1. Dynamic programming
2. Graph algorithms
   You create a Minimum Spanning Tree for a sentence.
   McDonald et al.'s (2005) MSTParser scores dependencies independently using an ML classifier (it uses MIRA, an online learning algorithm, but it could be something else).
3. Constraint Satisfaction
   Edges that don't satisfy hard constraints are eliminated. Karlsson (1990), etc.
4. "Transition-based parsing" or "deterministic dependency parsing"
   Greedy choice of attachments guided by good machine learning classifiers.
   MaltParser (Nivre et al. 2008). Has proven highly effective.
4. Greedy transition-based parsing
[Nivre 2003]
Arc-standard transition-based parser
(there are other transition schemes …)
Analysis of “I ate fish”
Start: σ = [ROOT], β = w1, …, wn, A = ∅
1. Shift:        σ, wi|β, A  ⇒  σ|wi, β, A
2. Left-Arc_r:   σ|wi|wj, β, A  ⇒  σ|wj, β, A ∪ {r(wj, wi)}
3. Right-Arc_r:  σ|wi|wj, β, A  ⇒  σ|wi, β, A ∪ {r(wi, wj)}
Finish: β = ∅

Stack           Buffer        Action
[ROOT]          I ate fish    Shift
[ROOT] I        ate fish      Shift
Arc-standard transition-based parser
Analysis of “I ate fish” (continued)
Stack            Buffer    Action
[ROOT] I ate     fish      Left-Arc     A += nsubj(ate → I)
[ROOT] ate       fish      Shift
[ROOT] ate fish  ∅         Right-Arc    A += obj(ate → fish)
[ROOT] ate       ∅         Right-Arc    A += root([ROOT] → ate)
[ROOT]           ∅         Finish
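A minimal sketch of this transition system in Python (illustrative only; the transition sequence for “I ate fish” is given by hand, and the function and label names are assumptions, not code from the lecture):

    # Apply a hand-written arc-standard transition sequence to "I ate fish"
    # and collect the dependency arcs in A.
    def arc_standard(words, transitions):
        stack, buffer, arcs = ["ROOT"], list(words), []
        for t in transitions:
            if t == "SHIFT":
                stack.append(buffer.pop(0))          # move the next buffer word onto the stack
            elif t.startswith("LEFT-ARC"):
                rel = t.split(":")[1]
                dep = stack.pop(-2)                  # second-from-top becomes the dependent
                arcs.append((rel, stack[-1], dep))   # add r(wj, wi): head is the new stack top
            elif t.startswith("RIGHT-ARC"):
                rel = t.split(":")[1]
                dep = stack.pop()                    # top of stack becomes the dependent
                arcs.append((rel, stack[-1], dep))   # add r(wi, wj): head is the item below it
        return arcs

    seq = ["SHIFT", "SHIFT", "LEFT-ARC:nsubj",       # [ROOT] I ate    -> nsubj(ate -> I)
           "SHIFT", "RIGHT-ARC:obj",                 # [ROOT] ate fish -> obj(ate -> fish)
           "RIGHT-ARC:root"]                         # [ROOT] ate      -> root(ROOT -> ate)
    print(arc_standard(["I", "ate", "fish"], seq))
    # [('nsubj', 'ate', 'I'), ('obj', 'ate', 'fish'), ('root', 'ROOT', 'ate')]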
MaltParser
[Nivre and Hall 2005]
Feature Representation
binary, sparse:  0 0 0 1 0 0 1 0 … 0 0 1 0
dim ≈ 10^6 – 10^7
Feature templates: usually a combination of 1–3 elements from the configuration.
Indicator features
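As a rough illustration of such templates (hypothetical feature names, not MaltParser’s actual feature set):

    # Each template conjoins 1-3 configuration elements (top of stack s1,
    # first buffer word b1, their POS tags) into one string; each distinct
    # string corresponds to one dimension of the huge sparse binary vector.
    def indicator_features(cfg):
        return [
            f"s1.w={cfg['s1_word']}",                                            # 1 element
            f"s1.w={cfg['s1_word']}&s1.t={cfg['s1_pos']}",                       # 2 elements
            f"s1.t={cfg['s1_pos']}&b1.w={cfg['b1_word']}&b1.t={cfg['b1_pos']}",  # 3 elements
        ]

    print(indicator_features(
        {"s1_word": "has", "s1_pos": "VBZ", "b1_word": "good", "b1_pos": "JJ"}))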
Evaluation of Dependency Parsing: (labeled) dependency accuracy
Acc = (# correct deps) / (# of deps)

Index:     0     1    2    3    4      5
Sentence:  ROOT  She  saw  the  video  lecture

Gold                        Parsed
1  2  She      nsubj        1  2  She      nsubj
2  0  saw      root         2  0  saw      root
3  5  the      det          3  4  the      det
4  5  video    nn           4  5  video    nsubj
5  2  lecture  obj          5  2  lecture  ccomp

UAS = 4 / 5 = 80%
LAS = 2 / 5 = 40%
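A minimal computation of these two scores from the table above (plain Python sketch):

    # Rows: (index, head, word, label). UAS = fraction of words with the
    # correct head; LAS = fraction with the correct head AND label.
    gold   = [(1, 2, "She", "nsubj"), (2, 0, "saw", "root"), (3, 5, "the", "det"),
              (4, 5, "video", "nn"), (5, 2, "lecture", "obj")]
    parsed = [(1, 2, "She", "nsubj"), (2, 0, "saw", "root"), (3, 4, "the", "det"),
              (4, 5, "video", "nsubj"), (5, 2, "lecture", "ccomp")]

    uas = sum(g[1] == p[1] for g, p in zip(gold, parsed)) / len(gold)
    las = sum(g[1] == p[1] and g[3] == p[3] for g, p in zip(gold, parsed)) / len(gold)
    print(f"UAS = {uas:.0%}, LAS = {las:.0%}")   # UAS = 80%, LAS = 40%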
Dependency paths identify semantic relations – e.g., for protein interaction
[Erkan et al. EMNLP 07, Fundel et al. 2007, etc.]
[Figure: dependency path through “demonstrated”, with nsubj and ccomp arcs]
Projectivity
• A parse is projective when there are no crossing dependency arcs if the words are laid out in their linear order, with all arcs drawn above the words.
Why train a neural dependency parser?
Indicator Features Revisited
Our approach: learn a dense and compact feature representation.
A neural dependency parser
[Chen and Manning 2014]
Extracting tokens and then vector representations from the configuration
• We extract a set of tokens based on the stack / buffer positions:
          word      POS    dep. label
s1        good      JJ     ∅
s2        has       VBZ    ∅
b1        control   NN     ∅
lc(s1)    ∅         ∅      ∅
rc(s1)    ∅         ∅      ∅
lc(s2)    He        PRP    nsubj
rc(s2)    ∅         ∅      ∅
• We convert them to vector embeddings and concatenate them.
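A rough sketch of that lookup-and-concatenate step (illustrative Python with random stand-in embeddings; the 50-dimensional size and the helper names are assumptions, not the Chen and Manning code):

    import numpy as np

    d = 50                                    # assumed embedding dimension
    positions = ["s1", "s2", "b1", "lc(s1)", "rc(s1)", "lc(s2)", "rc(s2)"]
    tokens = {                                # (word, POS, dep label), as in the table above
        "s1": ("good", "JJ", None),   "s2": ("has", "VBZ", None),
        "b1": ("control", "NN", None),
        "lc(s1)": (None, None, None), "rc(s1)": (None, None, None),
        "lc(s2)": ("He", "PRP", "nsubj"), "rc(s2)": (None, None, None),
    }

    def embed(symbol):
        # Stand-in lookup: a real parser indexes learned word / POS / label
        # embedding matrices; the NULL token (None) gets its own vector too.
        rng = np.random.default_rng(abs(hash(str(symbol))) % (2**32))
        return rng.standard_normal(d)

    x = np.concatenate([embed(sym) for pos in positions for sym in tokens[pos]])
    print(x.shape)                            # 7 positions x 3 embeddings x 50 = (1050,)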
Model Architecture
Softmax probabilities at the output; the cross-entropy error will be back-propagated to the embeddings.
Output layer:  y = softmax(Uh + b2)
Hidden layer:  h = ReLU(Wx + b1)
Input layer:   x = lookup + concat
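A minimal NumPy sketch of that forward pass (the layer sizes and the three unlabeled transition classes are assumptions for illustration, not the paper’s configuration):

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    n_in, n_hidden, n_out = 1050, 200, 3          # 3 = Shift / Left-Arc / Right-Arc (unlabeled)
    W, b1 = 0.01 * np.random.randn(n_hidden, n_in), np.zeros(n_hidden)
    U, b2 = 0.01 * np.random.randn(n_out, n_hidden), np.zeros(n_out)

    x = np.random.randn(n_in)                     # input layer: lookup + concat of embeddings
    h = np.maximum(0, W @ x + b1)                 # hidden layer: h = ReLU(Wx + b1)
    y = softmax(U @ h + b2)                       # output layer: y = softmax(Uh + b2)
    print(y, y.sum())                             # transition probabilities (sum to 1)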
Non-linearities between layers: Why they’re needed
• Without a non-linearity, stacked linear layers collapse into a single linear layer: W2(W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2), so extra layers add no expressive power.
Non-linearities: sigmoid and tanh
• sigmoid(z) = 1 / (1 + e^(−z)), with outputs in (0, 1); tanh(z) = 2·sigmoid(2z) − 1 is a rescaled, shifted sigmoid with outputs in (−1, 1).
Non-linearities: ReLU
• ReLU(z) = max(z, 0): zero for negative inputs, identity for positive inputs.
Dependency parsing for sentence structure