18-Graph Based Dependency Parsing-19-09-2024
Dependency parsing is different from constituent parsing
There are many ways to parse dependencies
Goal: find the highest scoring dependency tree in the space of all
possible trees for a sentence.
Intuition: since each word has exactly one parent, this is like a
tagging problem, where the possible tags are the other words in the
sentence (or a dummy node called root). If we edge-factorize the
score of a tree so that it is simply the sum of its edge scores,
then we can simply select the best incoming edge for each word,
subject to the constraint that the result must be a tree.
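A sketch of that intuition in plain Python (the edge scores are loosely reconstructed from the lecture's John/saw/Mary example; NEG marks forbidden edges such as self-loops and edges into root):

```python
# Hypothetical edge scores: s[i][j] = score of attaching dependent j
# to head i. Indices: 0 = root, 1 = John, 2 = saw, 3 = Mary.
NEG = float("-inf")
s = [
    [NEG, 9.0, 10.0, 9.0],    # root ->
    [NEG, NEG, 20.0, 3.0],    # John ->
    [NEG, 30.0, NEG, 30.0],   # saw ->
    [NEG, 11.0, 0.0, NEG],    # Mary ->
]

# Each word greedily picks its highest-scoring incoming edge:
head = {j: max(range(4), key=lambda i: s[i][j]) for j in range(1, 4)}
# head == {1: 2, 2: 1, 3: 2}: John and saw choose each other, so the
# result contains a cycle rather than a tree -- the "subject to the
# constraint" part is what the CLE algorithm below enforces.
```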
Formalizing graph-based dependency parsing
The best dependency parse is the maximum spanning tree
Chu-Liu-Edmonds (CLE) Algorithm
Example: x = John saw Mary, with graph Gx. Start with the fully
connected graph, with scores:
[Figure: fully connected graph Gx over root, John, saw, and Mary, with a score on every edge (e.g. root→saw 10, saw→John 30, John→saw 20, saw→Mary 30, Mary→John 11).]
Chu-Liu-Edmonds (CLE) Algorithm
Each node j in the graph greedily selects the incoming edge with
the highest score s(i, j):
[Figure: the selected edges saw→John 30, John→saw 20, and saw→Mary 30; John and saw form a cycle.]
Identify the cycle and contract it into a single node and recalculate
scores of incoming and outgoing edges.
Intuition: the score of an edge into the cycle is the weight of the
cycle with only the incoming edge of the target word changed.
[Figure: contracted graph with the John–saw cycle collapsed into node wjs: root→wjs 40, Mary→wjs 31, wjs→Mary 30, root→Mary 9; the best incoming edges are now root→wjs and wjs→Mary.]
CLE Algorithm: Reconstruction
[Figure: expanding wjs back into John and saw yields the final tree: root→saw 10, saw→John 30, saw→Mary 30.]
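The whole procedure can be sketched compactly in Python (a toy implementation, not an optimized parser; the edge scores are loosely reconstructed from the running example, and the contracted-edge scores drop the constant cycle weight, which does not change any argmax):

```python
def find_cycle(head):
    """Return a list of nodes forming a cycle in the head map, or None."""
    for start in head:
        node, seen = start, set()
        while node in head and node not in seen:
            seen.add(node)
            node = head[node]
        if node in head:                 # stopped on a revisited node: a cycle
            cycle, cur = [node], head[node]
            while cur != node:
                cycle.append(cur)
                cur = head[cur]
            return cycle
    return None

def chu_liu_edmonds(scores, root=0):
    """scores: {(i, j): w} for candidate edges i -> j (no edges into root).
    Returns {dependent: head} for the maximum spanning arborescence."""
    nodes = {i for i, _ in scores} | {j for _, j in scores}
    nodes.discard(root)
    # 1. Every node greedily takes its best incoming edge.
    head = {j: max((i for i, jj in scores if jj == j),
                   key=lambda i: scores[(i, j)]) for j in nodes}
    cycle = find_cycle(head)
    if cycle is None:
        return head
    # 2. Contract the cycle into a fresh node c and rescore its edges.
    cyc, c = set(cycle), max(nodes) + 1
    new_scores, enter, leave = {}, {}, {}
    for (i, j), w in scores.items():
        if i in cyc and j in cyc:
            continue
        if j in cyc:                     # edge into the cycle: swap j's head
            adj = w - scores[(head[j], j)]
            if new_scores.get((i, c), float("-inf")) < adj:
                new_scores[(i, c)], enter[i] = adj, j
        elif i in cyc:                   # edge out of the cycle
            if new_scores.get((c, j), float("-inf")) < w:
                new_scores[(c, j)], leave[j] = w, i
        else:
            new_scores[(i, j)] = w
    # 3. Recurse on the contracted graph, then expand c again.
    sub = chu_liu_edmonds(new_scores, root)
    result = {j: head[j] for j in cyc}   # keep the cycle's internal edges...
    for j, i in sub.items():
        if j == c:
            result[enter[i]] = i         # ...except the one the entry breaks
        elif i == c:
            result[j] = leave[j]
        else:
            result[j] = i
    return result

# Edge scores for the running example (0 = root, 1 = John, 2 = saw, 3 = Mary):
scores = {(0, 1): 9, (0, 2): 10, (0, 3): 9, (1, 2): 20, (1, 3): 3,
          (2, 1): 30, (2, 3): 30, (3, 1): 11, (3, 2): 0}
tree = chu_liu_edmonds(scores)
# tree == {1: 2, 2: 0, 3: 2}: root -> saw, saw -> John, saw -> Mary
```

On these scores the John/saw cycle is contracted, the recursive call attaches the contracted node to root, and expansion reproduces the tree from the reconstruction step.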
Where do we get edge scores s(i, j) from?
s(x, y) = Σ_{(i,j)∈y} s(i, j)
For the decade after 2005: linear models trained with clever variants
of SVMs, MIRA, etc.
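In code, the edge-factored score of a candidate tree y is just a sum over the edges it uses (hypothetical scores for the running example):

```python
# Hypothetical edge scores; a tree is a {dependent: head} map.
s = {(0, 2): 10.0, (2, 1): 30.0, (2, 3): 30.0}  # root->saw, saw->John, saw->Mary
tree = {1: 2, 2: 0, 3: 2}
score = sum(s[(head, dep)] for dep, head in tree.items())
# score == 70.0
```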
Scoring edges with a neural network
s(i, j) = P_head(w_j | w_i, x) = exp(g(a_j, a_i)) / Σ_{k=0}^{|x|} exp(g(a_k, a_i))
We get a_i by concatenating the hidden states of a forward and a
backward RNN at position i.
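A toy sketch of this softmax scoring (the state vectors and the dot-product g below are made up for illustration; real models use a learned MLP or biaffine function over the BiRNN states):

```python
import math

def g(a_j, a_i):
    # Stand-in scoring function: dot product of two BiRNN state vectors.
    return sum(x * y for x, y in zip(a_j, a_i))

def edge_distribution(states, i):
    """Softmax of g(a_k, a_i) over all positions k (index 0 plays root)."""
    logits = [g(a_k, states[i]) for a_k in states]
    z = sum(math.exp(l) for l in logits)
    return [math.exp(l) / z for l in logits]

# Toy BiRNN states for root, w1, w2 (made-up numbers).
states = [[0.1, 0.2], [0.4, -0.3], [0.9, 0.5]]
probs = edge_distribution(states, 1)
# probs is a proper distribution: the entries sum to 1.
```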
Transition-based Dependency Parsing
[Figure: example dependency tree with a root arc and arcs labeled nsubj, dobj, amod, punct.]
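For contrast with the graph-based approach, a minimal arc-standard sketch (the transition names and the hand-written action sequence for "John saw Mary" are illustrative; a real parser predicts the actions with a classifier):

```python
def arc_standard(n_words, actions):
    """Apply a transition sequence; returns arcs as (head, dependent) pairs.
    The stack starts with root (0); the buffer holds words 1..n_words."""
    stack, buffer, arcs = [0], list(range(1, n_words + 1)), []
    for act in actions:
        if act == "SHIFT":
            stack.append(buffer.pop(0))
        elif act == "LEFT_ARC":        # second item's head is the top item
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        elif act == "RIGHT_ARC":       # top item's head is the second item
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs

# Hand-written oracle sequence for "John saw Mary" (1=John, 2=saw, 3=Mary):
arcs = arc_standard(3, ["SHIFT", "SHIFT", "LEFT_ARC",
                        "SHIFT", "RIGHT_ARC", "RIGHT_ARC"])
# arcs == [(2, 1), (2, 3), (0, 2)]: saw heads John and Mary, root heads saw
```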
Summary
• the MST parser selects the globally optimal tree, given a set
of edges with scores;
• it can naturally handle projective and non-projective trees;
• a transition-based parser makes a sequence of local decisions
about the best parse action;
• it can be extended to non-projective dependency trees by
changing the transition set;
• accuracies are similar, but transition-based is faster;
• both require dynamic classifiers, and these can be
implemented using neural networks, conditioned on
bidirectional RNN encodings of the sentence.