CLEdependencyparsing

The document describes the Chu-Liu-Edmonds algorithm for graph-based dependency parsing. The Chu-Liu-Edmonds algorithm finds the highest scoring arborescence (directed rooted tree) in a graph. It does this through two stages - a contracting stage where connected components are contracted into single nodes to remove cycles, and an expanding stage where the contractions are undone to build the full parse tree. The algorithm greedily assigns incoming edges to nodes, contracting components if they form cycles, and finding the optimal incoming edge for the contracted component to determine which edge to remove from the cycle.

Uploaded by

Yao Grace

Graph-based Dependency Parsing

(Chu-Liu-Edmonds algorithm)

Sam Thomson (with thanks to Swabha Swayamdipta)

University of Washington, CSE 490u

February 22, 2017


Outline

- Dependency trees
- Three main approaches to parsing
- Chu-Liu-Edmonds algorithm
- Arc scoring / Learning
Dependency Parsing - Output
Dependency Parsing

TurboParser output from
http://demo.ark.cs.cmu.edu/parse?sentence=I%20ate%20the%20fish%20with%20a%20fork.
Dependency Parsing - Output Structure

A parse is an arborescence (aka directed rooted tree):

- Directed [Labeled] Graph
- Acyclic
- Single Root
- Connected and Spanning: ∃ directed path from root to every other word
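
The four conditions above can be checked mechanically. The following is a minimal sketch, assuming a parse is stored as a list `heads` where `heads[v]` is the parent of node `v` and the root's entry is `None`; the function name and this encoding are illustrative choices, not something the slides prescribe.

```python
def is_arborescence(heads, root=0):
    """Return True if `heads` encodes a directed tree rooted at `root`.

    heads[v] is the parent of node v; heads[root] must be None.
    Storing one parent per node already gives "one incoming edge each",
    so it remains to check that every node reaches the root without
    running into a cycle or a second parentless node.
    """
    if heads[root] is not None:
        return False
    for v in range(len(heads)):
        seen = set()
        u = v
        while u != root:
            if u in seen or heads[u] is None:
                return False  # found a cycle, or a node with no parent
            seen.add(u)
            u = heads[u]
    return True


tree = [None, 2, 0, 2]    # 0 -> 2, 2 -> 1, 2 -> 3
cyclic = [None, 2, 1, 0]  # 1 and 2 point at each other
```

Here `is_arborescence(tree)` holds, while `cyclic` is rejected because nodes 1 and 2 never reach the root.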
Projective / Non-projective

- Some parses are projective: edges don’t cross
- Most English sentences are projective, but non-projectivity is common in other languages (e.g. Czech, Hindi)
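
"Edges don't cross" can be tested directly on a head list. This is a rough sketch, assuming words are numbered left to right with ROOT at position 0 and that `heads[i]` is the position of word `i`'s head; the example head lists are made up for illustration and are not taken from the slides' figures.

```python
def is_projective(heads):
    """Return True if no two arcs cross when drawn above the sentence.

    heads[i] is the position of word i's head; position 0 is ROOT and
    heads[0] is None.
    """
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads) if h is not None]
    for l1, r1 in arcs:
        for l2, r2 in arcs:
            if l1 < l2 < r1 < r2:  # the two spans interleave, so the arcs cross
                return False
    return True


# One plausible analysis of "I ate the fish with a fork" (positions 1..7).
projective_heads = [None, 2, 0, 4, 2, 2, 7, 5]
# A toy tree 0 -> 2 -> 3 -> 1 whose arcs (0, 2) and (1, 3) cross.
crossing_heads = [None, 3, 0, 2]
```

The quadratic loop is fine for sentence-length inputs; chart parsers enforce the same constraint structurally rather than by checking it afterwards.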
[Figures: a non-projective sentence in English, and one in Czech]

Examples from Non-projective Dependency Parsing using Spanning Tree Algorithms, McDonald et al., EMNLP ’05
Dependency Parsing - Approaches
Dependency Parsing Approaches

- Chart (Eisner, CKY)
  - O(n^3)
  - Only produces projective parses
- Shift-reduce
  - O(n) (fast!), but inexact
  - “Pseudo-projective” trick can capture some non-projectivity
- Graph-based (MST)
  - O(n^2) for arc-factored
  - Can produce projective and non-projective parses
Graph-based Dependency Parsing
Arc-Factored Model

Every possible labeled directed edge e between every pair of nodes gets a score, score(e).

G = ⟨V, E⟩ = [figure not reproduced] (O(n^2) edges)

Example from Non-projective Dependency Parsing using Spanning Tree Algorithms, McDonald et al., EMNLP ’05
Arc-Factored Model

Best parse is:

\[
A^{*} = \operatorname*{arg\,max}_{\substack{A \subseteq G \\ \text{s.t. } A \text{ an arborescence}}} \; \sum_{e \in A} \mathrm{score}(e)
\]

The Chu-Liu-Edmonds algorithm finds this argmax.

Example from Non-projective Dependency Parsing using Spanning Tree Algorithms, McDonald et al., EMNLP ’05
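
To make the argmax concrete, here is a brute-force sketch that enumerates every choice of one head per word and keeps the best-scoring choice that is an arborescence. It is exponential and only meant to show what Chu-Liu-Edmonds computes; the matrix layout `score[h][d]`, the function names, and the small score table (modeled on the deck's worked example) are assumptions made for illustration.

```python
from itertools import product

NEG = float("-inf")  # marks an arc that does not exist


def reaches_root(heads):
    """True if every non-ROOT node reaches node 0 by following heads."""
    for v in range(1, len(heads)):
        seen, u = set(), v
        while u != 0:
            if u in seen:
                return False
            seen.add(u)
            u = heads[u]
    return True


def brute_force_best_parse(score):
    """score[h][d] is the score of arc h -> d; node 0 is ROOT."""
    n = len(score)
    best_total, best_heads = NEG, None
    for choice in product(range(n), repeat=n - 1):
        heads = (None,) + choice
        if not reaches_root(heads):
            continue  # contains a cycle, so it is not an arborescence
        total = sum(score[heads[d]][d] for d in range(1, n))
        if total > best_total:
            best_total, best_heads = total, heads
    return best_total, best_heads


score = [
    [NEG, 5, 1, 1],     # ROOT -> V1, V2, V3
    [NEG, NEG, 11, 4],  # V1 -> V2, V3
    [NEG, 10, NEG, 5],  # V2 -> V1, V3
    [NEG, 9, 8, NEG],   # V3 -> V1, V2
]
best_total, best_heads = brute_force_best_parse(score)
```

For these scores the best total is 21, which is reached by more than one tree, so only the total is a safe thing to compare against another decoder.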
Chu-Liu-Edmonds

Chu and Liu ’65, On the Shortest Arborescence of a Directed Graph, Science Sinica

Edmonds ’67, Optimum Branchings, JRNBS


Chu-Liu-Edmonds - Intuition

Every non-ROOT node needs exactly 1 incoming edge

In fact, every connected component that doesn’t contain ROOT needs exactly 1 incoming edge

- Greedily pick the highest-scoring incoming edge for each non-ROOT node.
- If this forms an arborescence, great!
- Otherwise, it will contain a cycle C.
- Arborescences can’t have cycles, so we can’t keep every edge in C. One edge in C must get kicked out.
- C also needs an incoming edge.
- Choosing an incoming edge for C determines which edge to kick out: the cycle edge that enters the same node.
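
The greedy step and the cycle test from the list above can be sketched as follows. This assumes edges are plain `(src, dst, score)` tuples and that nodes are strings; the function names are illustrative, and the edge list is modeled on the deck's worked example.

```python
def greedy_in_edges(edges, root):
    """Pick the highest-scoring incoming edge for every non-root node."""
    best = {}
    for src, dst, score in edges:
        if dst == root:
            continue
        if dst not in best or score > best[dst][2]:
            best[dst] = (src, dst, score)
    return best


def find_cycle(best, root):
    """Follow parent edges from each node; return one cycle's nodes or None."""
    for start in best:
        path, u = [], start
        while u != root and u in best and u not in path:
            path.append(u)
            u = best[u][0]  # step to the chosen parent
        if u in path:
            return path[path.index(u):]
    return None


edges = [
    ("ROOT", "V1", 5), ("ROOT", "V2", 1), ("ROOT", "V3", 1),
    ("V1", "V2", 11), ("V1", "V3", 4), ("V2", "V3", 5),
    ("V2", "V1", 10), ("V3", "V1", 9), ("V3", "V2", 8),
]
best = greedy_in_edges(edges, "ROOT")
cycle = find_cycle(best, "ROOT")
```

On this graph the greedy choice gives V1 the edge from V2 and V2 the edge from V1, so `find_cycle` reports a cycle over V1 and V2.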
Chu-Liu-Edmonds - Recursive (Inefficient) Definition

def maxArborescence(V, E, ROOT):
    """returns best arborescence as a map from each node to its parent edge"""
    for v in V \ {ROOT}:
        bestInEdge[v] ← arg max_{e ∈ inEdges[v]} e.score
    if bestInEdge contains a cycle C:
        # C denotes the cycle's nodes or its edges, depending on context
        # build a new graph where C is contracted into a single node
        vC ← new Node()
        V′ ← (V ∪ {vC}) \ C
        # edges with both endpoints in C disappear inside vC
        E′ ← {adjust(e) for e ∈ E if not (e.src ∈ C and e.dest ∈ C)}
        A ← maxArborescence(V′, E′, ROOT)
        return ({e.original for e ∈ A} ∪ C) \ {A[vC].kicksOut}
    # each node got a parent without creating any cycles
    return bestInEdge

def adjust(e):
    e′ ← copy(e)
    e′.original ← e
    if e.dest ∈ C:
        e′.dest ← vC
        e′.kicksOut ← bestInEdge[e.dest]
        e′.score ← e.score − e′.kicksOut.score
    elif e.src ∈ C:
        e′.src ← vC
    return e′
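
A runnable Python rendering of the recursive definition above might look like the sketch below. It is an illustration under stated assumptions rather than the course's reference code: edges are `(src, dst, score)` tuples, the node set is implied by the edges, the contracted node's id is a made-up tuple, and an `origin` dictionary plays the role of `e.original`.

```python
def find_cycle(best, root):
    """Follow parent edges from each node; return one cycle's nodes or None."""
    for start in best:
        path, u = [], start
        while u != root and u in best and u not in path:
            path.append(u)
            u = best[u][0]
        if u in path:
            return path[path.index(u):]
    return None


def max_arborescence(edges, root):
    """Return {node: (src, dst, score)} giving each non-root node's parent edge."""
    best = {}
    for e in edges:
        src, dst, score = e
        if dst == root or src == dst:
            continue
        if dst not in best or score > best[dst][2]:
            best[dst] = e
    cycle = find_cycle(best, root)
    if cycle is None:
        return best  # every node got a parent without creating a cycle
    in_cycle = set(cycle)
    vc = ("contracted", tuple(cycle))  # hypothetical id for the merged node
    new_edges, origin = [], {}
    for e in edges:
        src, dst, score = e
        if dst == root or (src in in_cycle and dst in in_cycle):
            continue  # edges inside the cycle vanish into vc
        if dst in in_cycle:
            # entering the cycle at dst would kick out best[dst]
            new_e = (src, vc, score - best[dst][2])
        elif src in in_cycle:
            new_e = (vc, dst, score)
        else:
            new_e = e
        new_edges.append(new_e)
        origin.setdefault(new_e, e)  # remember which edge this one came from
    sub = max_arborescence(new_edges, root)
    result = {}
    for e in sub.values():
        orig = origin[e]
        result[orig[1]] = orig  # orig[1] is the real destination node
    for v in cycle:
        result.setdefault(v, best[v])  # cycle edges survive unless kicked out
    return result


edges = [
    ("ROOT", "V1", 5), ("ROOT", "V2", 1), ("ROOT", "V3", 1),
    ("V1", "V2", 11), ("V1", "V3", 4), ("V2", "V3", 5),
    ("V2", "V1", 10), ("V3", "V1", 9), ("V3", "V2", 8),
]
tree = max_arborescence(edges, "ROOT")
total = sum(e[2] for e in tree.values())
```

On the example graph this returns a tree with total score 21. Two trees tie at that score, so which one comes back depends on tie-breaking.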
Chu-Liu-Edmonds

Consists of two stages:

- Contracting (everything before the recursive call)
- Expanding (everything after the recursive call)
Chu-Liu-Edmonds - Preprocessing

- Remove every edge incoming to ROOT
  - This ensures that ROOT is in fact the root of any solution
- For every ordered pair of nodes, vi, vj, remove all but the highest-scoring edge from vi to vj
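
Both preprocessing steps fit in one pass over the edge list. The sketch below assumes labeled edges stored as `(src, dst, label, score)` tuples; the tuple layout, the function name, and the sample labels are illustrative assumptions.

```python
def preprocess(edges, root):
    """Drop edges into root; keep the best-scoring edge per ordered (src, dst) pair."""
    best = {}
    for src, dst, label, score in edges:
        if dst == root:
            continue  # ROOT must not get a parent
        key = (src, dst)
        if key not in best or score > best[key][3]:
            best[key] = (src, dst, label, score)
    return list(best.values())


labeled_edges = [
    ("ROOT", "ate", "root", 3.0),
    ("ate", "fish", "nsubj", 1.0),
    ("ate", "fish", "dobj", 4.0),   # better label for the same pair
    ("fish", "ROOT", "dep", 9.0),   # points into ROOT, so it is dropped
]
pruned = preprocess(labeled_edges, "ROOT")
```

After this step the decoder can treat the graph as unlabeled, since each ordered pair of nodes carries only its best label.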
Chu-Liu-Edmonds - Contracting Stage

- For each non-ROOT node v, set bestInEdge[v] to be its highest scoring incoming edge.
- If a cycle C is formed:
  - contract the nodes in C into a new node vC
  - edges outgoing from any node in C now get source vC
  - edges incoming to any node in C now get destination vC
  - For each node u in C, and for each edge e incoming to u from outside of C:
    - set e.kicksOut to bestInEdge[u], and
    - set e.score to be e.score − e.kicksOut.score.
- Repeat until every non-ROOT node has an incoming edge and no cycles are formed
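
A single contraction, including the score adjustment and the kicksOut bookkeeping, can be sketched as below. The dictionary layout (edge name mapped to `(src, dst, score)`), the function name, and the decision to drop edges that lie entirely inside the cycle are assumptions of this sketch; the data is modeled on the first contraction of the deck's worked example.

```python
def contract(edges, best_in_edge, cycle, new_node):
    """Merge the nodes in `cycle` into `new_node`.

    edges: {name: (src, dst, score)}; best_in_edge: {node: edge name}.
    Returns (new_edges, kicks_out), where kicks_out maps each edge entering
    the cycle to the cycle edge it would displace.
    """
    new_edges, kicks_out = {}, {}
    for name, (src, dst, score) in edges.items():
        if src in cycle and dst in cycle:
            continue  # the edge lies inside the contracted node
        if dst in cycle:
            kicked = best_in_edge[dst]
            kicks_out[name] = kicked
            new_edges[name] = (src, new_node, score - edges[kicked][2])
        elif src in cycle:
            new_edges[name] = (new_node, dst, score)
        else:
            new_edges[name] = (src, dst, score)
    return new_edges, kicks_out


edges = {
    "a": ("ROOT", "V1", 5), "b": ("ROOT", "V2", 1), "c": ("ROOT", "V3", 1),
    "d": ("V1", "V2", 11), "e": ("V1", "V3", 4), "f": ("V2", "V3", 5),
    "g": ("V2", "V1", 10), "h": ("V3", "V1", 9), "i": ("V3", "V2", 8),
}
new_edges, kicks_out = contract(edges, {"V1": "g", "V2": "d"}, {"V1", "V2"}, "V4")
```

The adjusted score measures how much is gained or lost by entering the cycle through that edge instead of keeping the cycle edge it replaces.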
An Example - Contracting Stage

Graph (edge: source → destination, score):

  a: ROOT → V1, 5      d: V1 → V2, 11     g: V2 → V1, 10
  b: ROOT → V2, 1      e: V1 → V3, 4      h: V3 → V1, 9
  c: ROOT → V3, 1      f: V2 → V3, 5      i: V3 → V2, 8

- bestInEdge[V1] = g, then bestInEdge[V2] = d. Edges d and g form a cycle over V1 and V2.
- Contract V1 and V2 into V4 and adjust the edges entering the cycle:

    edge   kicksOut   new score
    a      g          5 − 10 = −5
    b      d          1 − 11 = −10
    h      g          9 − 10 = −1
    i      d          8 − 11 = −3

  Remaining edges: a: −5, b: −10, c: 1, e: 4, f: 5, h: −1, i: −3.
- bestInEdge[V3] = f, then bestInEdge[V4] = h. Edges f and h form a cycle over V3 and V4.
- Contract V3 and V4 into V5 and adjust again:

    edge   kicksOut   new score
    a      g, h       −5 − (−1) = −4
    b      d, h       −10 − (−1) = −9
    c      f          1 − 5 = −4

  Remaining edges: a: −4, b: −9, c: −4.
- bestInEdge[V5] = a (a and c tie at −4; the example picks a). No cycle is formed, so the contracting stage ends with:

    bestInEdge: V1 g, V2 d, V3 f, V4 h, V5 a
    kicksOut:   a {g, h}, b {d, h}, c {f}, h {g}, i {d}
Chu-Liu-Edmonds - Expanding Stage

After the contracting stage, every contracted node will have exactly one bestInEdge. This edge will kick out one edge inside the contracted node, breaking the cycle.

- Go through each bestInEdge e in the reverse order that we added them
  - lock down e, and remove every edge in kicksOut(e) from bestInEdge.
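
The expanding stage reduces to one reverse pass over the recorded choices. This sketch assumes the contracting stage left behind a list of `(node, edge)` pairs in the order they were chosen plus a `kicks_out` table; both structures and the function name are illustrative, and the data mirrors the tables at the end of the deck's contracting example.

```python
def expand(best_in_edge, kicks_out):
    """Return the edges that survive, visiting choices newest first.

    best_in_edge: list of (node, edge) in the order the edges were chosen.
    kicks_out: {edge: set of edges it displaces}.
    """
    removed, kept = set(), []
    for _node, edge in reversed(best_in_edge):
        if edge in removed:
            continue  # a later choice already kicked this edge out
        kept.append(edge)                      # lock down this edge
        removed |= kicks_out.get(edge, set())  # discard what it displaces
    return kept


best_in_edge = [("V1", "g"), ("V2", "d"), ("V3", "f"), ("V4", "h"), ("V5", "a")]
kicks_out = {"a": {"g", "h"}, "b": {"d", "h"}, "c": {"f"}, "h": {"g"}, "i": {"d"}}
final_edges = expand(best_in_edge, kicks_out)
```

Processing newest first matters: the edge chosen for the outermost contracted node decides which inner cycle edges are discarded before those inner edges are considered.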
An Example - Expanding Stage

Starting point: bestInEdge = V1 g, V2 d, V3 f, V4 h, V5 a; kicksOut = a {g, h}, b {d, h}, c {f}, h {g}, i {d}.

- Lock down a (bestInEdge[V5]). kicksOut(a) = {g, h}, so g is removed from bestInEdge[V1] and h from bestInEdge[V4]; a becomes the incoming edge of both.
- Expand V5 back into V3 and V4: f stays as bestInEdge[V3], and a enters V4.
- Expand V4 back into V1 and V2: d stays as bestInEdge[V2], and a enters V1 with its original score 5.
- Result: a (ROOT → V1), d (V1 → V2), f (V2 → V3), with total score 5 + 11 + 5 = 21.
Chu-Liu-Edmonds - Notes

- This is a greedy algorithm with a clever form of delayed back-tracking to recover from inconsistent decisions (cycles).
- CLE is exact: it always recovers the optimal arborescence.
Chu-Liu-Edmonds - Notes

- Efficient (wrong) implementation:
  Tarjan ’77, Finding Optimum Branchings*, Networks
  *corrected in Camerini et al. ’79, A note on finding optimum branchings, Networks
  Not recursive. Uses a union-find (a.k.a. disjoint-set) data structure to keep track of collapsed nodes.
- Even more efficient:
  Gabow et al. ’86, Efficient Algorithms for Finding Minimum Spanning Trees in Undirected and Directed Graphs, Combinatorica
  Uses a Fibonacci heap to keep incoming edges sorted.
  Finds cycles by following bestInEdge instead of randomly visiting nodes.
  Describes how to constrain ROOT to have only one outgoing edge.
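
For readers who have not met union-find, the following is a minimal sketch of the data structure mentioned above, used here only to answer "which contracted node does this original node belong to now?". It is a generic textbook version with path halving and is not taken from Tarjan's or Gabow et al.'s papers; the class and method names are illustrative.

```python
class UnionFind:
    """Disjoint sets over a fixed collection of hashable items."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        """Return the representative of x's set."""
        while self.parent[x] != x:
            # Path halving: shortcut to the grandparent while walking up.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the sets containing a and b; return the new representative."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra
        return ra


uf = UnionFind(["ROOT", "V1", "V2", "V3"])
uf.union("V1", "V2")                               # first contraction
same_after_first = uf.find("V1") == uf.find("V2")
v3_separate = uf.find("V3") != uf.find("V1")
uf.union("V1", "V3")                               # second contraction
same_after_second = uf.find("V3") == uf.find("V2")
```

Contracting a cycle then becomes a sequence of `union` calls, and an edge is internal to a contracted node exactly when both endpoints have the same representative.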
Arc Scoring / Learning
Arc Scoring

Features can look at source (head), destination (child), and arc label.
For example:
- number of words between head and child,
- sequence of POS tags between head and child,
- is head to the left or right of child?
- vector state of a recurrent neural net at head and child,
- vector embedding of label,
- etc.
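
A few of the hand-built features listed above can be sketched as a sparse feature map with a linear score on top. The feature names, the `weights` dictionary, and the sample tags are assumptions for illustration; the neural features from the list are left out of this sketch.

```python
def arc_features(tags, head, child, label):
    """Sparse features for the arc head -> child, given POS tags by position."""
    lo, hi = sorted((head, child))
    return {
        f"distance={hi - lo - 1}": 1.0,                    # words in between
        f"tags_between={'_'.join(tags[lo + 1:hi])}": 1.0,  # POS tags in between
        f"direction={'left' if head < child else 'right'}": 1.0,  # side of head
        f"label={label}": 1.0,
    }


def arc_score(weights, features):
    """Linear model: dot product of a weight dict with a sparse feature dict."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


tags = ["ROOT", "PRP", "VBD", "DT", "NN"]       # ROOT I ate the fish
features = arc_features(tags, 2, 4, "dobj")      # ate -> fish
weights = {"distance=1": 0.5, "label=dobj": 2.0, "direction=right": -1.0}
score = arc_score(weights, features)
```

Because each arc is scored independently of the others, these scores can be precomputed into the edge weights that the decoder consumes.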
Learning

Recall that when we have a parameterized model, and we have a decoder that can make predictions given that model...
we can use structured perceptron, or structured hinge loss:

\[
L_\theta(x_i, y_i) = \max_{y \in \mathcal{Y}} \bigl\{ \mathrm{score}_\theta(y) + \mathrm{cost}(y, y_i) \bigr\} - \mathrm{score}_\theta(y_i)
\]
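
The hinge loss can be evaluated directly when the candidate set is small enough to list. The sketch below assumes trees are head tuples, uses a Hamming cost (number of words with the wrong head), and scores trees with a fixed arc-score matrix `S`; all three are illustrative choices. In a real parser the max would come from running the decoder on cost-augmented arc scores rather than from an explicit list.

```python
NEG = float("-inf")
S = [                      # S[h][d]: score of arc h -> d, node 0 is ROOT
    [NEG, 5, 1, 1],
    [NEG, NEG, 11, 4],
    [NEG, 10, NEG, 5],
    [NEG, 9, 8, NEG],
]


def tree_score(heads):
    """Arc-factored score: sum of the scores of the tree's arcs."""
    return sum(S[h][d] for d, h in enumerate(heads) if h is not None)


def hamming_cost(y, gold):
    """Number of words whose head differs from the gold head."""
    return sum(1 for h, g in zip(y, gold) if h != g)


def structured_hinge_loss(candidates, gold):
    """max over y of [score(y) + cost(y, gold)], minus score(gold)."""
    augmented = max(tree_score(y) + hamming_cost(y, gold) for y in candidates)
    return augmented - tree_score(gold)


gold = (None, 0, 1, 2)
candidates = [gold, (None, 3, 1, 0), (None, 0, 1, 1)]
loss = structured_hinge_loss(candidates, gold)
```

Here the second candidate scores as well as the gold tree but gets two heads wrong, so the loss is 2. The loss is never negative as long as the gold tree is among the candidates.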
