
Lecture08 Dependency Parsing

This document discusses dependency parsing in natural language processing, detailing its historical background, structure, and various approaches. It explains the concepts of dependency relations, projectivity, and different parsing strategies, including transition-based parsing. The lecture emphasizes the importance of grammatical relationships and the search problem nature of dependency parsing.


Natural Language Processing
Lecture 8: Dependency Parsing

9/27/24

COMS W4705
Daniel Bauer
Dependency Structure

[diagram: labeled dependency arcs (det, nsubj, dobj, det, advmod, amod) over the sentence:]

the girl likes a few very friendly boys

• The edges are labeled with grammatical relations between words (typed dependencies):
• Arguments (Subject, Object, Indirect Object, Prepositional Object)
• Adjunct (Temporal, Locative, Causal, Manner, ...) / Modifier
• Function words
Dependency Structure
• Long history in linguistics (starting with Panini's grammar of Sanskrit, 4th century BCE).

• Modern dependency grammar originates with Lucien Tesnière (1893-1954) and Igor Mel'čuk (*1932).

• Different from phrase structure (but related via the concepts of constituency and heads).

• Focus is on grammatical relationships between words (Subject, Object, ...).

• Tighter connection to natural language semantics.
Dependency Relations
• Each dependency relation consists of a head and a dependent.

likes --subj--> girl (likes is the head, girl is the dependent)

• Represent individual edges as subj(likes-02, girl-01)

• or as a triple (likes, nsubj, girl)

• And the entire sentence structure as a set of edges:

root(likes-2), subj(likes-2, girl-1), det(girl-1, the-0), obj(likes-2, boys-7),
det(boys-7, few-4), det(few-4, a-3), amod(boys-7, friendly-6), advmod(friendly-6, very-5)
Another Example

[diagram: arcs PRED, SBJ, OBJ, PU, ATT, ATT, ATT, PC over:]

root Economic news had little effect on financial markets .
 0      1      2    3    4      5    6      7        8   9

Dependency structure G = (Vs, A)

set of nodes Vs = {root, Economic, news, had, little, effect, on, financial, markets, .}

set of edges/arcs A = {(root, PRED, had), (had, SBJ, news), (had, OBJ, effect), (had, PU, .),
(news, ATT, Economic), (effect, ATT, little), (effect, ATT, on), (on, PC, markets),
(markets, ATT, financial)}
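The set-of-triples representation above can be manipulated directly in code. A minimal sketch (the helper function names are illustrative, not from the lecture):

```python
# The example parse as a set of (head, relation, dependent) triples.
A = {
    ("root", "PRED", "had"), ("had", "SBJ", "news"), ("had", "OBJ", "effect"),
    ("had", "PU", "."), ("news", "ATT", "Economic"), ("effect", "ATT", "little"),
    ("effect", "ATT", "on"), ("on", "PC", "markets"), ("markets", "ATT", "financial"),
}

def head_of(word, arcs):
    """Return (head, relation) for a dependent, or None if it has no head."""
    for h, r, d in arcs:
        if d == word:
            return (h, r)
    return None

def dependents_of(word, arcs):
    """Return all (relation, dependent) pairs governed by a head."""
    return sorted((r, d) for h, r, d in arcs if h == word)

print(head_of("news", A))       # ('had', 'SBJ')
print(dependents_of("had", A))  # [('OBJ', 'effect'), ('PU', '.'), ('SBJ', 'news')]
```

Because every non-root word has exactly one head, `head_of` is well defined on a valid dependency tree.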
Another Example

[the same structure drawn as a tree (children indented under their head):]

had
  news
    Economic
  effect
    little
    on
      markets
        financial
  .

Economic news had little effect on financial markets .
   1      2    3    4      5    6      7        8   9

G = (V, A)

V = {root, Economic, news, had, little, effect, on, financial, markets, .}

A = {(root, PRED, had), (had, SBJ, news), (had, OBJ, effect), (had, PU, .),
(news, ATT, Economic), (effect, ATT, little), (effect, ATT, on), (on, PC, markets),
(markets, ATT, financial)}
Different Dependency Representations
• How to deal with prepositions?

[two analyses of "effect on financial markets": one attaches the preposition as an nmod of the noun with markets as its pobj; the other attaches markets directly as nmod of effect, with on as a case marker of markets (amod on financial in both)]

• How to deal with conjunctions?

[three analyses of "likes cats and dogs" using dobj, cc, and conj edges: attaching the conjunction and second conjunct under the first conjunct, attaching both conjuncts to the verb, or chaining conj edges]
Inventory of Relations
"Universal Dependencies" (Marneffe et al. 2014)

Source: http://universaldependencies.org/u/dep/
Dependency Trees
• Dependency structure is typically assumed to be a tree.

• Root node 0 must not have a parent.

• All other nodes must have exactly one parent.

• The graph needs to be connected.

• Nodes must not form a cycle.
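The four constraints above can be checked mechanically. A minimal sketch, assuming nodes are numbered 0..n-1 with 0 as root (the function name is illustrative):

```python
def is_dependency_tree(n, arcs):
    """Check the tree constraints for nodes 0..n-1 (0 = root),
    with arcs given as (head, dependent) index pairs."""
    parents = {}
    for h, d in arcs:
        if d == 0:            # root must not have a parent
            return False
        if d in parents:      # every node has at most one parent
            return False
        parents[d] = h
    if len(parents) != n - 1: # every non-root node needs a parent
        return False
    for d in range(1, n):     # following heads must reach root: no cycles,
        seen, node = set(), d # and the graph is connected
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = parents[node]
    return True

# The 9-word example, arcs as (head, dependent) index pairs:
arcs = [(0, 3), (3, 2), (3, 5), (3, 9), (2, 1), (5, 4), (5, 6), (6, 8), (8, 7)]
print(is_dependency_tree(10, arcs))  # True
```

Single-headedness plus reachability from the root together imply connectedness and acyclicity, so the one traversal covers both conditions.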


Projectivity
• Words in a sentence stand in a linear order.

• If dependency edges cross, the dependency structure is non-projective.

A hearing is scheduled on the issue today

• Non-projective structures appear more frequently in some languages than in others (Hungarian, German, ...).

• Some approaches to dependency parsing cannot handle non-projectivity.
Projectivity
• Words in a sentence stand in a linear order.

• If dependency edges cross, the dependency structure is non-projective.

[as a tree:]
is
  hearing
    A
    on
      issue
        the
  scheduled
    today

A hearing is scheduled on the issue today

• Non-projective structures appear more frequently in some languages than in others (Hungarian, German, ...).

• Some approaches to dependency parsing cannot handle non-projectivity.
Projectivity
is
  hearing
    A
    on
      issue
        the
  scheduled
    today

A hearing on the issue is scheduled today

An edge (i, r, j) in a dependency tree is projective if there is a directed path from i to k for all i < k < j (if i < j) or j < k < i (if j < i).
Projectivity
is
  hearing
    A
    on
      issue
        the
  scheduled
    today

A hearing is scheduled on the issue today

An edge (i, r, j) in a dependency tree is projective if there is a directed path from i to k for all i < k < j (if i < j) or j < k < i (if j < i).

A dependency structure is non-projective if it contains at least one non-projective edge.
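The definition translates directly into a check. A sketch over unlabeled (head, dependent) index pairs, applied to the two example sentences above (names are illustrative):

```python
def is_projective(arcs):
    """An edge (i, j) is projective iff every k strictly between i and j
    is reachable from i by a directed path. arcs: (head, dependent) pairs."""
    children = {}
    for h, d in arcs:
        children.setdefault(h, set()).add(d)

    def reachable(i, k):
        stack, seen = [i], set()
        while stack:
            node = stack.pop()
            if node == k:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(children.get(node, ()))
        return False

    return all(reachable(i, k)
               for i, j in arcs
               for k in range(min(i, j) + 1, max(i, j)))

# "A(1) hearing(2) is(3) scheduled(4) on(5) the(6) issue(7) today(8)", root = 0:
nonproj = {(0, 3), (3, 2), (3, 4), (2, 1), (2, 5), (5, 7), (7, 6), (4, 8)}
# "A(1) hearing(2) on(3) the(4) issue(5) is(6) scheduled(7) today(8)":
proj = {(0, 6), (6, 2), (6, 7), (2, 1), (2, 3), (3, 5), (5, 4), (7, 8)}
print(is_projective(nonproj), is_projective(proj))  # False True
```

In the first tree the edge hearing(2) -> on(5) fails the test: is(3) and scheduled(4) lie between them but are not reachable from hearing, which is exactly the crossing-edges picture on the slide.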
Dependency Parsing
• Input:

• a set of nodes Vs = {w0, w1, ..., wm} corresponding to the input sentence s = w1, ..., wm (0 is the special root node)

• an inventory of labels R = {PRED, SBJ, OBJ, ATT, ...}

• Goal: Find a set of labeled, directed edges between the nodes, such that the resulting graph forms a dependency tree over Vs.
Dependency Parsing
• What information could we use?

• bi-lexical affinities

• financial markets, meeting ... scheduled

• dependency distance (prefer close words?)

• intervening words

• had little effect, little gave effect

• subcategorization/valency of heads
Subcategorization/Valency
• Verbs may take a different number of arguments of different syntactic types in different positions (applies to other types of words as well).

• The baby slept. *The baby slept the house.

• He pretended to sleep. *He pretended the cat.

• Godzilla destroyed the city. *Godzilla destroyed.

• Jenny gave the book to Carl. *Jenny gave the book.

• ...
Dependency Parsing
• As with other NLP problems, we can think of dependency parsing as a kind of search problem:

• Step 1: Define the space of possible analyses for a sentence.

• Step 2: Select the best analysis from this search space.

• Need to define the search space, search algorithm, and a way to determine the "best" parse.
Dependency Parsing Approaches
• Grammar-based

• Data-based

• Dynamic Programming (e.g. Eisner 1996)

• Graph Algorithms (e.g. McDonald 2005, MST Parser; Dozat et al. 2017)

• Transition-based (e.g. Nivre 2003, MaltParser; Chen & Manning 2014)

• Transformer-based approaches (e.g. Kondratyuk & Straka, 2019)

• Constraint satisfaction (Karlsson 1990)
Transition-Based Dependency Parsing
• Defines the search space using parser states (configurations) and operations on these states (transitions).

• Start with an initial configuration and find a sequence of transitions to the terminal state.

• Uses a greedy approach to find the best sequence of transitions.

• Uses a discriminative model (classifier) to select the next transition.
Transition-Based Parsing - States
• A parser state (configuration) is a triple c = (σ, β, A)

• σ is a stack of words wi ∈ Vs

• β is a buffer of words wi ∈ Vs

• A is a set of dependency arcs (wi, r, wj)

Example:
A: {(news, ATT, Economic)}
σ: [root, news]   β: [had, little, effect, on, financial, markets, .]

i.e. ([root, news]σ, [had, little, effect, on, financial, markets, .]β, {(news, ATT, Economic)}A)
Transition-Based Parsing - initial and terminal state

c0: ([w0]σ, [w1, w2, ..., wm]β, {}A)  --t0--> c1 --t1--> c2 --t2--> ... --tn-1--> cT: (σ, []β, A)

initial state          transitions          terminal state (for any σ and A)

• Start with initial state c0.

• Apply a sequence of transitions t0, ..., tn-1.

• Once a terminal state cT is reached, return the final parse A from state cT.
Transition-Based Parsing - Transitions ("Arc-Standard")
• Shift:
Move the next word from the buffer to the stack.
(σ, wi | β, A) => (σ | wi, β, A)

• Left-Arc (for relation r):
Build an edge from the next word on the buffer to the top word on the stack.
(σ | wi, wj | β, A) => (σ, wj | β, A ∪ {(wj, r, wi)})

• Right-Arc (for relation r):
Build an edge from the top word on the stack to the next word on the buffer.
(σ | wi, wj | β, A) => (σ, wi | β, A ∪ {(wi, r, wj)})
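The three transitions can be sketched as functions on (stack, buffer, arcs) states, exactly as defined above. A minimal illustration with integer word indices (0 = root); this is a sketch, not the lecture's code:

```python
def shift(state):
    """Move the next word from the buffer to the stack."""
    stack, buffer, arcs = state
    return (stack + [buffer[0]], buffer[1:], arcs)

def left_arc(state, r):
    """Edge from the front of the buffer (wj) to the top of the stack (wi)."""
    stack, buffer, arcs = state
    wi, wj = stack[-1], buffer[0]
    assert wi != 0, "root may not get a parent"
    return (stack[:-1], buffer, arcs | {(wj, r, wi)})

def right_arc(state, r):
    """Edge from the top of the stack (wi) to the front of the buffer (wj);
    wj is removed and wi moves from the stack back to the buffer."""
    stack, buffer, arcs = state
    wi, wj = stack[-1], buffer[0]
    return (stack[:-1], [wi] + buffer[1:], arcs | {(wi, r, wj)})

# First steps of the oracle example (0=root, 1=Economic, 2=news, 3=had):
state = ([0], [1, 2, 3], frozenset())
state = shift(state)                      # stack [0, 1]
state = left_arc(state, "ATT")            # adds (news, ATT, Economic)
print(state[0], state[1], set(state[2]))  # [0] [2, 3] {(2, 'ATT', 1)}
```

Each transition returns a fresh state, which makes it easy to explore alternative transition sequences (e.g. for beam search) without mutation.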
Transition-Based Parsing - Transitions
• Shift
Move the next word from the buffer to the stack.
(σ, wi | β, A) => (σ | wi, β, A)

A: {(news, ATT, Economic)}
σ: [root, news]   β: [had, little, effect, on, financial, markets, .]
after Shift: σ: [root, news, had]   β: [little, effect, on, financial, markets, .]
Transition-Based Parsing - Transitions
• Left-Arcr
Build an edge from the next word on the buffer to the top word on the stack.
(σ | wi, wj | β, A) => (σ, wj | β, A ∪ {(wj, r, wi)})
Not allowed if i = 0 (root may not have a parent).

σ: [root, Economic]   β: [news, little, effect, on, financial, markets, .]
applying Left-ArcATT adds (news, ATT, Economic) to A and pops Economic.

note: wj remains in the buffer
Transition-Based Parsing - Transitions
• Right-Arcr
Build an edge from the top word on the stack to the next word on the buffer.
(σ | wi, wj | β, A) => (σ, wi | β, A ∪ {(wi, r, wj)})

A: {(news, ATT, Economic), (had, SBJ, news), (effect, ATT, little)}
σ: [root, had]   β: [effect, on, financial, markets, .]
applying Right-ArcOBJ adds (had, OBJ, effect):
σ: [root]   β: [had, on, financial, markets, .]

note: wi is moved from the top of the stack back to the buffer!
Transition-Based Parsing - Some Observations
• Does the transition system contain dead ends (states from which a terminal state cannot be reached)? No!

• What is the role of the buffer?

• Contains words that can become dependents of a right-arc. Keeps unseen words.

• What is the role of the stack?

• Keeps track of nodes that can become dependents of a left-arc.

• Once a word disappears from the buffer and the stack, it cannot be part of any further edge!
Another Example

[diagram: arcs PRED, SBJ, OBJ, PU, ATT, ATT, ATT, PC over:]

root Economic news had little effect on financial markets .
 0      1      2    3    4      5    6      7        8   9

G = (Vs, A)

Vs = {root, Economic, news, had, little, effect, on, financial, markets, .}

A = {(root, PRED, had), (had, SBJ, news), (had, OBJ, effect), (had, PU, .),
(news, ATT, Economic), (effect, ATT, little), (effect, ATT, on), (on, PC, markets),
(markets, ATT, financial)}
Transition-Based Parsing - Oracle Example

initial state; next transition: Shift
A: {}
σ: [root]   β: [Economic, news, had, little, effect, on, financial, markets, .]

next transition: Left-ArcATT
A: {}
σ: [root, Economic]   β: [news, had, little, effect, on, financial, markets, .]

next transition: Shift
A: {(news, ATT, Economic)}
σ: [root]   β: [news, had, little, effect, on, financial, markets, .]

next transition: Left-ArcSBJ
A: {(news, ATT, Economic)}
σ: [root, news]   β: [had, little, effect, on, financial, markets, .]

next transition: Shift
A: {(news, ATT, Economic), (had, SBJ, news)}
σ: [root]   β: [had, little, effect, on, financial, markets, .]

next transition: Shift
A: {(news, ATT, Economic), (had, SBJ, news)}
σ: [root, had]   β: [little, effect, on, financial, markets, .]

next transition: Left-ArcATT
A: {(news, ATT, Economic), (had, SBJ, news)}
σ: [root, had, little]   β: [effect, on, financial, markets, .]

next transition: Shift
A: {(news, ATT, Economic), (had, SBJ, news), (effect, ATT, little)}
σ: [root, had]   β: [effect, on, financial, markets, .]

next transition: Shift
A: {(news, ATT, Economic), (had, SBJ, news), (effect, ATT, little)}
σ: [root, had, effect]   β: [on, financial, markets, .]

next transition: Shift
A: {(news, ATT, Economic), (had, SBJ, news), (effect, ATT, little)}
σ: [root, had, effect, on]   β: [financial, markets, .]

next transition: Left-ArcATT
A: {(news, ATT, Economic), (had, SBJ, news), (effect, ATT, little)}
σ: [root, had, effect, on, financial]   β: [markets, .]

next transition: Right-ArcPC
A: {(news, ATT, Economic), (had, SBJ, news), (effect, ATT, little), (markets, ATT, financial)}
σ: [root, had, effect, on]   β: [markets, .]

next transition: Right-ArcATT
A: {(news, ATT, Economic), (had, SBJ, news), (effect, ATT, little), (markets, ATT, financial),
(on, PC, markets)}
σ: [root, had, effect]   β: [on, .]

next transition: Right-ArcOBJ
A: {(news, ATT, Economic), (had, SBJ, news), (effect, ATT, little), (markets, ATT, financial),
(on, PC, markets), (effect, ATT, on)}
σ: [root, had]   β: [effect, .]

next transition: Shift
A: {(news, ATT, Economic), (had, SBJ, news), (effect, ATT, little), (markets, ATT, financial),
(on, PC, markets), (effect, ATT, on), (had, OBJ, effect)}
σ: [root]   β: [had, .]

next transition: Right-ArcPU
A: {(news, ATT, Economic), (had, SBJ, news), (effect, ATT, little), (markets, ATT, financial),
(on, PC, markets), (effect, ATT, on), (had, OBJ, effect)}
σ: [root, had]   β: [.]

next transition: Right-ArcPRED
A: {(news, ATT, Economic), (had, SBJ, news), (effect, ATT, little), (markets, ATT, financial),
(on, PC, markets), (effect, ATT, on), (had, OBJ, effect), (had, PU, .)}
σ: [root]   β: [had]

next transition: Shift
A: {(news, ATT, Economic), (had, SBJ, news), (effect, ATT, little), (markets, ATT, financial),
(on, PC, markets), (effect, ATT, on), (had, OBJ, effect), (had, PU, .), (root, PRED, had)}
σ: []   β: [root]

terminal state
A: {(news, ATT, Economic), (had, SBJ, news), (effect, ATT, little), (markets, ATT, financial),
(on, PC, markets), (effect, ATT, on), (had, OBJ, effect), (had, PU, .), (root, PRED, had)}
σ: [root]   β: []
Properties of the Transition System
• The time required to parse w1, ..., wm with an oracle is O(m). Why?

• Bottom-up approach: A node must collect all its children before its parent. Why?

• Can only produce projective trees. Why?

• This algorithm is complete (all projective trees over w1, ..., wm can be produced by some sequence of transitions).

• Soundness: All terminal structures are projective forests (but not necessarily trees).
Deciding the Next Transition
• Instead of the unrealistic oracle, predict the next transition (and relation label) using a discriminative classifier.

• Could use a perceptron, log-linear model, SVM, neural network, ...

• This is a greedy approach (could use beam search too).

• If the classifier takes O(1), the runtime for parsing is still O(m) for m words.

• Questions:

• What features should the classifier use? Local features from each state.

• How to train the model?
Extracting Features
• Need to define a feature function that maps states to feature vectors.

• Each feature consists of:

1. an address in the state description (identifies a specific word in the configuration, for example "top of stack").

2. an attribute of the word at that address (for example POS, word form, lemma, word embedding, ...).
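A feature function in this address-plus-attribute style might look as follows; the feature names, addresses, and attributes here are illustrative only, not the lecture's feature set:

```python
def features(state, pos_tags, words):
    """Map a (stack, buffer, arcs) state to a dict of (address, attribute)
    features: attributes of the stack top and the first two buffer words."""
    stack, buffer, arcs = state
    feats = {}
    if stack:
        feats["stk0.form"] = words[stack[-1]]   # word form at top of stack
        feats["stk0.pos"] = pos_tags[stack[-1]] # POS tag at top of stack
    if buffer:
        feats["buf0.form"] = words[buffer[0]]   # word form at front of buffer
        feats["buf0.pos"] = pos_tags[buffer[0]]
    if len(buffer) > 1:
        feats["buf1.pos"] = pos_tags[buffer[1]]
    return feats

words = ["root", "Economic", "news", "had"]
pos   = ["ROOT", "JJ", "NN", "VBD"]
print(features(([0, 2], [3], set()), pos, words))
```

In practice such dicts are one-hot encoded (or replaced by embeddings of the addressed words, as in Chen & Manning 2014) before being fed to the classifier.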
Example Features

Source: S. Kübler, R. McDonald, J. Nivre (2009):"Dependency Parsing", Morgan & Claypool


Training the Model
• Training data: Manually annotated (dependency) treebank

• Prague Dependency Treebank


English/Czech parallel data, dependencies for full PTB WSJ.

• Universal Dependencies Treebank


Treebanks for more than 80 languages (varying in size)
(http://universaldependencies.org/)

• Problem: We have not actually seen the transition sequence, only the
dependency trees!

• Idea: Construct oracle transition sequences from the dependency tree.


Train the model on these transitions.
Constructing Oracle
Transitions
• Start with initial state ([w0]σ, [w1, w2, ... ,wm]β, {}A).
• Then predict the next transition using the annotated
dependency tree Ad
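One standard way to predict the next arc-standard transition from the gold arcs is sketched below (variable names are illustrative): build a left-arc if the stack top is a dependent of the buffer front, a right-arc if the buffer front is a dependent of the stack top and has already collected all of its own dependents, and otherwise shift.

```python
def oracle(state, gold):
    """Predict the next arc-standard transition given gold arcs
    as (head, rel, dep) triples over word indices (0 = root)."""
    stack, buffer, arcs = state
    if stack and buffer:
        s, b = stack[-1], buffer[0]
        for (h, r, d) in gold:
            if (h, d) == (b, s):            # stack top is a dependent of buffer front
                return ("left_arc", r)
        for (h, r, d) in gold:
            if (h, d) == (s, b):            # buffer front is a dependent of stack top,
                # but only attach it once it has collected all its own dependents
                if all(a in arcs for a in gold if a[0] == b):
                    return ("right_arc", r)
    return ("shift", None)

# gold tree for "root(0) Economic(1) news(2) had(3)":
gold = {(2, "ATT", 1), (3, "SBJ", 2), (0, "PRED", 3)}
print(oracle(([0, 1], [2, 3], set()), gold))  # ('left_arc', 'ATT')
```

Running this oracle from the initial state to the terminal state yields exactly the (state, transition) pairs used as training examples for the classifier.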
"Arc-Standard" Transitions
• Shift:
Move next word from the bu er to the stack
(σ, wi | β, A) => (σ | wi, β, A)

• Left-Arc (for relation r):


Build an edge from the next word on the bu er to the top
word on the stack.
(σ | wi, wj | β, A) => (σ, wj | β, A ∪ {wj, r, wi})

• Right-Arc (for relation r)


Build an edge from the top word on the stack to the next word on the
bu er.

(σ | wi, wj | β, A) => (σ, wi | β, A ∪ {wi, r, wj})


ff
ff
ff
"Arc-Eager" Transitions
• Shift:
Move next word from the bu er to the stack
(σ, wi | β, A) => (σ | wi, β, A)

• Left-Arc (for relation r):


Build an edge from the next word on the bu er to the top
word on the stack.
(σ | wi, wj | β, A) => (σ, wj | β, A ∪ {(wj, r, wi)})
Precondition: (wi, *, wj) is not yet in A.
• Right-Arc (for relation r)
Build an edge from the top word on the stack to the next word on the bu er.
(σ | wi, wj | β, A) => (σ | wi | wj, β , A ∪ {wi, r, wj})

• Reduce
Remove a completed node from the stack.
(σ | wi , β, A) => (σ, β , A)
Precondition: there is some (*, *, wi) in A.
ff
ff
ff
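The arc-eager transitions, with their preconditions, can be sketched the same way as the arc-standard ones. A minimal illustration over integer word indices (a sketch, not the lecture's code):

```python
def has_head(w, arcs):
    """True if w already has an incoming arc in A."""
    return any(d == w for (_, _, d) in arcs)

def shift(state):
    stack, buffer, arcs = state
    return (stack + [buffer[0]], buffer[1:], arcs)

def left_arc(state, r):
    """Edge from buffer front wj to stack top wi; wi must not have a head yet."""
    stack, buffer, arcs = state
    wi, wj = stack[-1], buffer[0]
    assert not has_head(wi, arcs)
    return (stack[:-1], buffer, arcs | {(wj, r, wi)})

def right_arc(state, r):
    """Edge from stack top wi to buffer front wj; wj is pushed onto the stack."""
    stack, buffer, arcs = state
    wi, wj = stack[-1], buffer[0]
    return (stack + [wj], buffer[1:], arcs | {(wi, r, wj)})

def reduce_(state):
    """Pop a completed node; it must already have a head."""
    stack, buffer, arcs = state
    assert has_head(stack[-1], arcs)
    return (stack[:-1], buffer, arcs)

# "root(0) Economic(1) news(2) had(3)":
state = ([0], [1, 2, 3], frozenset())
state = shift(state)
state = left_arc(state, "ATT")   # (news, ATT, Economic)
state = shift(state)
state = left_arc(state, "SBJ")   # (had, SBJ, news)
state = right_arc(state, "PRED") # (root, PRED, had); had stays on the stack
print(state[0], sorted(state[2]))
```

Note the contrast with arc-standard: right-arc pushes the dependent onto the stack instead of discarding it, so arcs can be built earlier, and Reduce later removes nodes that have found their head.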
Arc-Eager Example

next transition: Right-ArcPRED (can immediately attach had to root)
A: {(news, ATT, Economic), (had, SBJ, news)}
σ: [root]   β: [had, little, effect, on, financial, markets, .]

next transition: Shift, then Left-ArcATT
A: {(news, ATT, Economic), (had, SBJ, news), (root, PRED, had)}
σ: [root, had]   β: [little, effect, on, financial, markets, .]

next transition: Right-ArcOBJ
A: {(news, ATT, Economic), (had, SBJ, news), (root, PRED, had), (effect, ATT, little)}
σ: [root, had]   β: [effect, on, financial, markets, .]

next transition: Right-ArcATT
A: {(news, ATT, Economic), (had, SBJ, news), (root, PRED, had), (effect, ATT, little),
(had, OBJ, effect)}
σ: [root, had, effect]   β: [on, financial, markets, .]

next transition: Shift
A: {(news, ATT, Economic), (had, SBJ, news), (root, PRED, had), (effect, ATT, little),
(had, OBJ, effect), (effect, ATT, on)}
σ: [root, had, effect, on]   β: [financial, markets, .]

next transition: Left-ArcATT
A: {(news, ATT, Economic), (had, SBJ, news), (root, PRED, had), (effect, ATT, little),
(had, OBJ, effect), (effect, ATT, on)}
σ: [root, had, effect, on, financial]   β: [markets, .]

next transition: Right-ArcPC
A: {(news, ATT, Economic), (had, SBJ, news), (root, PRED, had), (effect, ATT, little),
(had, OBJ, effect), (effect, ATT, on), (markets, ATT, financial)}
σ: [root, had, effect, on]   β: [markets, .]

next transition: Reduce
A: {(news, ATT, Economic), (had, SBJ, news), (root, PRED, had), (effect, ATT, little),
(had, OBJ, effect), (effect, ATT, on), (markets, ATT, financial), (on, PC, markets)}
σ: [root, had, effect, on, markets]   β: [.]

next transition: Reduce
σ: [root, had, effect, on]   β: [.]

next transition: Reduce
σ: [root, had, effect]   β: [.]

next transition: Right-ArcPU
σ: [root, had]   β: [.]
(attaching "." to had completes the parse)
Graph-Based Approach
• Transition-based parsing with arc-standard or arc-eager can only produce projective dependency structures. Why?

• Graph-based approaches do not have this restriction.

• Basic idea:

• Each word is a vertex. Start with a completely connected graph.

• Use standard graph algorithms to compute a Maximum Spanning Tree.

• Need a model that assigns a score to each edge ("edge-factored model").

R. McDonald, K. Crammer, and F. Pereira (2005)
MST Example

[diagram: a fully connected, weighted directed graph over {root, John, saw, Mary} (left) and the maximum spanning tree selected from it (right): root → saw (10), saw → John (30), saw → Mary (30)]

total score: 70
Computing the MST
• For undirected graphs, there are two common algorithms:

• Kruskal's and Prim's, both run in O(E log V).

• For dependency parsing we deal with directed graphs, so these algorithms are not guaranteed to find a tree.

• Instead use the Chu-Liu-Edmonds algorithm, which runs in O(EV) (naive implementation) or O(E log V) (with optimizations).
Chu-Liu-Edmonds Algorithm (Sketch)
• Choose the highest-scoring incoming edge for each node to obtain G'.

• If the result is a tree, return it (this is the MST).

• Otherwise G' contains at least one cycle.

• Collapse any cycle into a single node, adjust the scores of edges entering it, and recurse on the contracted graph.
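The first two steps (greedy head selection and cycle detection) can be sketched as follows; the score table is a toy example in the spirit of the John/saw/Mary slide, with illustrative numbers:

```python
def greedy_heads(n, score):
    """Step 1 of Chu-Liu-Edmonds: pick the highest-scoring incoming edge
    for every node except root (node 0). score maps (head, dep) -> float."""
    head = {}
    for d in range(1, n):
        head[d] = max((h for h in range(n) if h != d),
                      key=lambda h: score.get((h, d), float("-inf")))
    return head

def find_cycle(head):
    """Return a cycle in the chosen-heads graph, or None if it is a tree."""
    for start in head:
        seen, node = set(), start
        while node in head:          # follow heads until root (0) or a repeat
            if node in seen:         # repeat: recover the cycle
                cycle, cur = [node], head[node]
                while cur != node:
                    cycle.append(cur)
                    cur = head[cur]
                return cycle
            seen.add(node)
            node = head[node]
    return None

# nodes: 0=root, 1=John, 2=saw, 3=Mary
score = {(0, 2): 10, (2, 1): 30, (2, 3): 30, (1, 2): 20, (3, 2): 0,
         (0, 1): 9, (0, 3): 9, (1, 3): 3, (3, 1): 11}
head = greedy_heads(4, score)
print(head, find_cycle(head))  # {1: 2, 2: 1, 3: 2} [1, 2]
```

With these scores the greedy choice picks John -> saw and saw -> John, producing exactly the kind of cycle the algorithm must collapse before recursing; the contraction and score-adjustment steps are omitted here.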
