
Natural Language

Processing
Lecture 8: Dependency Parsing

11/13/2020

COMS W4705
Yassine Benajiba
Dependency Structure
[Arc diagram over "the girl likes a few very friendly boys":
nsubj(likes, girl), dobj(likes, boys), det(girl, the),
det(boys, few), det(few, a), amod(boys, friendly), advmod(friendly, very)]

• The edges can be labeled with grammatical relations between words (typed dependencies):

• Arguments (Subject, Object, Indirect Object, Prepositional Object)

• Adjuncts (Temporal, Locative, Causal, Manner, ...) / Modifiers

• Function words
Dependency Structure
• Long history in linguistics (starting with Panini's grammar of Sanskrit, 4th century BCE).

• Modern dependency grammar originates with Tesnière and Mel'čuk.

• Different from phrase structure (but related via the concepts of constituency and heads).

• Focus is on grammatical relationships between words (Subject, Object, ...).

• Tighter connection to natural language semantics.


Dependency Relations
• Each dependency relation consists of a head and a dependent:
in subj(likes, girl), likes is the head and girl the dependent.

• Represent individual edges as subj(likes-2, girl-1)

• or as a triple (likes, nsubj, girl)

• And the entire sentence structure as a set of edges:

root(likes-2), subj(likes-2, girl-1), det(girl-1, the-0), obj(likes-2, boys-7),
det(boys-7, few-4), det(few-4, a-3), amod(boys-7, friendly-6), advmod(friendly-6, very-5)
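The set-of-triples view maps directly onto a small data structure. A minimal sketch in Python (the `head_of` helper is hypothetical, not from the lecture):

```python
# Each edge is a (head_index, relation, dependent_index) triple.
# Word indices follow the slide: the-0 girl-1 likes-2 a-3 few-4 very-5 friendly-6 boys-7;
# -1 stands for the artificial root.
edges = {
    (-1, "root", 2),
    (2, "subj", 1),      # subj(likes-2, girl-1)
    (1, "det", 0),       # det(girl-1, the-0)
    (2, "obj", 7),       # obj(likes-2, boys-7)
    (7, "det", 4),
    (4, "det", 3),
    (7, "amod", 6),
    (6, "advmod", 5),
}

def head_of(edges):
    """Map each dependent to its (head, relation) pair (hypothetical helper)."""
    return {d: (h, r) for (h, r, d) in edges}

print(head_of(edges)[1])   # girl's head is likes: (2, 'subj')
```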
Heads and Dependents
• How do we identify the grammatical relation between head H and dependent D (in a particular constituent C)?

• H determines the syntactic category of C and can often replace C.

• H determines the semantic category of C; D gives semantic specification.

• H is obligatory; D may be optional.

• H selects D and determines whether D is obligatory or optional.

• The form of D depends on H (agreement or government).

• The linear position of D is specified with reference to H.



Another Example
[Arc diagram over "root Economic news had little effect on financial markets ." (indices 0-9), with arcs labeled PRED, SBJ, OBJ, PC, PU, and ATT as listed in A below]

Dependency structure G = (Vs, A)

set of nodes:
Vs = {root, Economic, news, had, little, effect, on, financial, markets, .}

set of edges/arcs:
A = {(root, PRED, had), (had, SBJ, news), (had, OBJ, effect), (had, PU, .),
(news, ATT, Economic), (effect, ATT, little), (effect, ATT, on), (on, PC, markets),
(markets, ATT, financial)}
Another Example
had
├── news
│   └── Economic
├── effect
│   ├── little
│   └── on
│       └── markets
│           └── financial
└── .

Economic news had little effect on financial markets .
1 2 3 4 5 6 7 8 9

G = (V, A)

V = {root, Economic, news, had, little, effect, on, financial, markets, . }

A = {(root, PRED, had), (had, SBJ, news), (had, OBJ, effect),(had, PU, .),
(news,ATT,Economic),(effect,ATT,little),(effect,ATT,on), (on,PC,markets),
(markets, ATT, financial)}
Different Dependency
Representations
• How to deal with prepositions?

[Two analyses of "effect on financial markets": the preposition as head, with nmod(effect, on) and pobj(on, markets); or the content word as head, with nmod(effect, markets) and case(markets, on); amod(markets, financial) in both]

• How to deal with conjunctions?

[Three analyses of "likes cats and dogs": the first conjunct as head, with dobj(likes, cats), cc(cats, and), conj(cats, dogs); a chain through the conjunction, with cc and conj arcs chaining cats, and, dogs; or the conjunction itself as head, with dobj(likes, and), conj(and, cats), conj(and, dogs)]


Inventory of Relations
"Universal Dependencies" (Marneffe et al. 2014)

Source: http://universaldependencies.org/u/dep/
Dependency Trees
• Dependency structure is typically assumed to be a tree.

• Root node 0 must not have a parent.

• All other nodes must have exactly one parent.

• The graph needs to be connected.

• Nodes must not form a cycle.
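These constraints are easy to verify programmatically. A sketch, assuming arcs are (head, relation, dependent) triples over the indexed example sentence (the function name is hypothetical):

```python
from collections import defaultdict

def is_dependency_tree(n_words, arcs):
    """Check the tree constraints above for arcs (head, rel, dep)
    over nodes 0..n_words, where 0 is the root."""
    parents = defaultdict(list)
    for h, _, d in arcs:
        parents[d].append(h)
    if parents[0]:                       # root must not have a parent
        return False
    for d in range(1, n_words + 1):      # every other node: exactly one parent
        if len(parents[d]) != 1:
            return False
    for d in range(1, n_words + 1):      # connected and acyclic:
        seen, node = set(), d            # following heads must reach the root
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = parents[node][0]
    return True

# "Economic news had little effect on financial markets ." (words 1..9)
A = {(0, "PRED", 3), (3, "SBJ", 2), (3, "OBJ", 5), (3, "PU", 9),
     (2, "ATT", 1), (5, "ATT", 4), (5, "ATT", 6), (6, "PC", 8), (8, "ATT", 7)}
print(is_dependency_tree(9, A))   # True
```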


Projectivity
• Words in a sentence appear in a linear order.

• If dependency edges cross, the dependency structure is non-projective.

root A hearing is scheduled on the issue today
[diagram with crossing edges]

• Non-projective structures appear more frequently in some languages than in others (Hungarian, German, ...).

• Some approaches to dependency parsing cannot handle non-projectivity.



Projectivity
• Words in a sentence stand in a linear order.

• If dependency edges cross, the dependency structure is non-projective.

is
├── hearing
│   ├── A
│   └── on
│       └── issue
│           └── the
└── scheduled
    └── today

A hearing is scheduled on the issue today

• Non-projective structures appear more frequently in some languages than in others (Hungarian, German, ...).

• Some approaches to dependency parsing cannot handle non-projectivity.



Projectivity

root A hearing is scheduled on the issue today

An edge (i, r, j) in a dependency tree is projective if there is a directed path from i to k for all i < k < j (if i < j), or for all j < k < i (if j < i).
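The definition can be checked directly by computing reachability from each head. A sketch (the relation labels in the non-projective example are made up; only the tree shape matters):

```python
def is_projective(arcs):
    """Edge (i, r, j) is projective iff every k strictly between i and j
    is reachable from the head i by a directed path (the definition above)."""
    children = {}
    for h, _, d in arcs:
        children.setdefault(h, []).append(d)

    def descendants(i):
        out, stack = set(), [i]
        while stack:
            for c in children.get(stack.pop(), []):
                if c not in out:
                    out.add(c)
                    stack.append(c)
        return out

    for h, _, d in arcs:
        lo, hi = min(h, d), max(h, d)
        reach = descendants(h)
        if any(k not in reach for k in range(lo + 1, hi)):
            return False
    return True

# projective: "Economic news had little effect on financial markets ." (1..9)
proj = {(0, "PRED", 3), (3, "SBJ", 2), (3, "OBJ", 5), (3, "PU", 9),
        (2, "ATT", 1), (5, "ATT", 4), (5, "ATT", 6), (6, "PC", 8), (8, "ATT", 7)}
# non-projective: "A hearing is scheduled on the issue today" (1..8);
# labels here are hypothetical placeholders
nonproj = {(0, "ROOT", 3), (3, "SBJ", 2), (2, "ATT", 1), (3, "VC", 4),
           (2, "ATT", 5), (5, "PC", 7), (7, "ATT", 6), (4, "TMP", 8)}
print(is_projective(proj), is_projective(nonproj))   # True False
```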
Dependency Parsing
• Input:

• a set of nodes Vs = {w0, w1, ..., wm} corresponding to the input sentence s = w1, ..., wm (w0 is the special root node)

• an inventory of labels R = {PRED, SBJ, OBJ, ATT, ...}

• Goal: Find a set of labeled, directed edges between the nodes, such that the resulting graph forms a correct dependency tree over Vs (satisfying the structural constraints above).
Dependency Parsing
• What information could we use?

• bi-lexical affinities

• financial markets, meeting ... scheduled

• dependency distance (prefer close words?)

• intervening words

• had little effect vs. little gave effect

• subcategorization/valency of heads
Subcategorization/Valency
• Verbs may take a different number of arguments, of different syntactic types, in different positions:

• The baby slept. *The baby slept the house.

• He pretended to sleep. *He pretended the cat.

• Godzilla destroyed the city. *Godzilla destroyed.

• Jenny gave the book to Carl. *Jenny gave the book.

• ... similar examples for ask, promise, bet, load, ...



Dependency Parsing
• As with other NLP problems, we can think of dependency parsing as a kind of search problem:

• Step 1: Define the space of possible analyses for a sentence.

• Step 2: Select the best analysis from this search space.

• Need to define the search space, a search algorithm, and a way to determine the "best" parse.
Dependency Parsing
• Approaches to Dependency Parsing:

• Grammar-based

• Data-driven

• Dynamic Programming (e.g. Eisner 1996)

• Graph Algorithms (e.g. McDonald 2005, MST Parser)

• Transition-based (e.g. Nivre 2003, MaltParser)

• Constraint satisfaction (Karlsson 1990)


Transition-Based
Dependency Parsing
• Defines the search space using parser states (configurations) and operations on these states (transitions).

• Start with an initial configuration and find a sequence of transitions to the terminal state.

• Uses a greedy approach to find the best sequence of transitions.

• Uses a discriminative model (classifier) to select the next transition.
Transition-Based Parsing -
States
• A parser state (configuration) is a triple c = (σ,β,A)
• σ - is a stack of words wi ∈ Vs
• β - is a buffer of words wi ∈ Vs
• A - is a set of dependency arcs (wi, r, wj)

A: { (news, ATT, Economic) }

σ: [root, news]    β: [had, little, effect, on, financial, markets, .]

([root, news]σ, [had, little, effect, on, financial, markets, .]β, { (news, ATT, Economic) }A)
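The state triple can be sketched as a small Python structure (a minimal illustration; the field names are ours, not from any particular parser):

```python
from dataclasses import dataclass, field

@dataclass
class Configuration:
    """The parser state c = (sigma, beta, A) from this slide."""
    stack: list          # sigma: top of the stack is the last element
    buffer: list         # beta: the next word is the first element
    arcs: set = field(default_factory=set)   # A: (head, relation, dependent)

c = Configuration(
    stack=["root", "news"],
    buffer=["had", "little", "effect", "on", "financial", "markets", "."],
    arcs={("news", "ATT", "Economic")},
)
print(c.stack[-1], c.buffer[0])   # news had
```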


Transition-Based Parsing -
initial and terminal state

c0: ([w0]σ, [w1, w2, ..., wm]β, {}A) --t0--> c1 --t1--> c2 --> ... --tn-1--> cT: (σ, [ ]β, A)
initial state, transitions, terminal state (for any σ and A)

• Start with the initial state c0.

• Apply a sequence of transitions t0, ..., tn-1.

• Once a terminal state cT is reached, return the final parse A from state cT.
Transition-Based Parsing -
Transitions ("Arc-Standard")
• Shift:
Move the next word from the buffer to the stack.
(σ, wi | β, A) => (σ | wi, β, A)

• Left-Arc (for relation r):
Build an edge from the next word on the buffer to the top word on the stack.
(σ | wi, wj | β, A) => (σ, wj | β, A ∪ {(wj, r, wi)})

• Right-Arc (for relation r):
Build an edge from the top word on the stack to the next word on the buffer.
(σ | wi, wj | β, A) => (σ, wi | β, A ∪ {(wi, r, wj)})

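The three transitions can be sketched directly from their definitions (a minimal illustration of the arc-standard system above, not a full parser):

```python
from collections import namedtuple

# state c = (sigma, beta, A), following the slide's definitions
State = namedtuple("State", "stack buffer arcs")

def shift(c):
    # (sigma, wi | beta, A) => (sigma | wi, beta, A)
    return State(c.stack + [c.buffer[0]], c.buffer[1:], c.arcs)

def left_arc(c, r):
    # (sigma | wi, wj | beta, A) => (sigma, wj | beta, A + {(wj, r, wi)})
    wi, wj = c.stack[-1], c.buffer[0]
    return State(c.stack[:-1], c.buffer, c.arcs | {(wj, r, wi)})

def right_arc(c, r):
    # (sigma | wi, wj | beta, A) => (sigma, wi | beta, A + {(wi, r, wj)});
    # note that wi moves from the stack back to the buffer
    wi, wj = c.stack[-1], c.buffer[0]
    return State(c.stack[:-1], [wi] + c.buffer[1:], c.arcs | {(wi, r, wj)})

c = State(["root", "Economic"], ["news", "had"], frozenset())
c = left_arc(c, "ATT")
print(c.stack, c.buffer)                      # ['root'] ['news', 'had']
print(("news", "ATT", "Economic") in c.arcs)  # True
```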


Transition-Based Parsing -
Transitions
• Shift
Move the next word from the buffer to the stack.
(σ, wi | β, A) => (σ | wi, β, A)

A: { (news, ATT, Economic) }

σ: [root, news]    β: [had, little, effect, on, financial, markets, .]
=> σ: [root, news, had]    β: [little, effect, on, financial, markets, .]
Transition-Based Parsing -
Transitions
• Left-Arcr
Build an edge from the next word on the buffer to the top word on the stack.
(σ | wi, wj | β, A) => (σ, wj | β, A ∪ {(wj, r, wi)})
Not allowed if i = 0 (root may not have a parent).

A: {} => A: { (news, ATT, Economic) }

σ: [root, Economic]    β: [news, little, effect, on, financial, markets, .]
=> σ: [root]    β: [news, little, effect, on, financial, markets, .]

note: wj remains in the buffer


Transition-Based Parsing -
Transitions
• Right-Arcr
Build an edge from the top word on the stack to the next word on the buffer.
(σ | wi, wj | β, A) => (σ, wi | β, A ∪ {(wi, r, wj)})

A: { (news, ATT, Economic), (had, SBJ, news), (effect, ATT, little) }
=> A ∪ { (had, OBJ, effect) }

σ: [root, had]    β: [effect, on, financial, markets, .]
=> σ: [root]    β: [had, on, financial, markets, .]

note: wi is moved from the top of the stack back to the buffer!
Transition-Based Parsing -
Some Observations
• Does the transition system contain dead ends (states from which a terminal state cannot be reached)? No!

• What is the role of the buffer?

• Contains words that can become dependents of a right-arc, and keeps the unseen words.

• What is the role of the stack?

• Keeps track of nodes that can become dependents of a left-arc.

• Once a word disappears from the buffer and the stack, it cannot be part of any further edge!
Another Example
[Arc diagram over "root Economic news had little effect on financial markets ." (indices 0-9), with arcs labeled PRED, SBJ, OBJ, PC, PU, and ATT as listed in A below]

G = (Vs, A)

Vs= {root, Economic, news, had, little, effect, on, financial, markets, . }

A = {(root, PRED, had), (had, SBJ, news), (had, OBJ, effect),(had, PU, .),
(news,ATT,Economic),(effect,ATT,little),(effect,ATT,on), (on,PC,markets),
(markets, ATT, financial)}
Transition-Based Parsing -
Complete Example
initial state
next transition: shift (these are all predicted by a discriminative ML classifier)

A: {}

σ: [root]    β: [Economic, news, had, little, effect, on, financial, markets, .]


Transition-Based Parsing -
Oracle Example
next transition: Left-ArcATT

A: {}

σ: [root, Economic]    β: [news, had, little, effect, on, financial, markets, .]
Transition-Based Parsing -
Oracle Example
next transition: shift

A: { (news, ATT, Economic) }

σ: [root]    β: [news, had, little, effect, on, financial, markets, .]


Transition-Based Parsing -
Oracle Example
next transition: Left-ArcSBJ

A: { (news, ATT, Economic) }

σ: [root, news]    β: [had, little, effect, on, financial, markets, .]
Transition-Based Parsing -
Oracle Example
next transition: shift

A: { (news, ATT, Economic), (had, SBJ, news) }

σ: [root]    β: [had, little, effect, on, financial, markets, .]


Transition-Based Parsing -
Oracle Example
next transition: shift

A: { (news, ATT, Economic), (had, SBJ, news) }

σ: [root, had]    β: [little, effect, on, financial, markets, .]
Transition-Based Parsing -
Oracle Example
next transition: Left-ArcATT

A: { (news, ATT, Economic), (had, SBJ, news) }

σ: [root, had, little]    β: [effect, on, financial, markets, .]
Transition-Based Parsing -
Oracle Example
next transition: shift

A: { (news, ATT, Economic), (had, SBJ, news), (effect, ATT, little) }

σ: [root, had]    β: [effect, on, financial, markets, .]
Transition-Based Parsing -
Oracle Example
next transition: shift

A: { (news, ATT, Economic), (had, SBJ, news), (effect, ATT, little) }

σ: [root, had, effect]    β: [on, financial, markets, .]
Transition-Based Parsing -
Oracle Example
next transition: shift

A: { (news, ATT, Economic), (had, SBJ, news), (effect, ATT, little) }

σ: [root, had, effect, on]    β: [financial, markets, .]
Transition-Based Parsing -
Oracle Example
next transition: Left-ArcATT

A: { (news, ATT, Economic), (had, SBJ, news), (effect, ATT, little) }

σ: [root, had, effect, on, financial]    β: [markets, .]
Transition-Based Parsing -
Oracle Example
next transition: Right-ArcPC

A: { (news, ATT, Economic), (had, SBJ, news), (effect, ATT, little), (markets, ATT, financial) }

σ: [root, had, effect, on]    β: [markets, .]
Transition-Based Parsing -
Oracle Example
next transition: Right-ArcATT

A: { (news, ATT, Economic), (had, SBJ, news), (effect, ATT, little), (markets, ATT, financial), (on, PC, markets) }

σ: [root, had, effect]    β: [on, .]
Transition-Based Parsing -
Oracle Example
next transition: Right-ArcOBJ

A: { (news, ATT, Economic), (had, SBJ, news), (effect, ATT, little), (markets, ATT, financial), (on, PC, markets), (effect, ATT, on) }

σ: [root, had]    β: [effect, .]
Transition-Based Parsing -
Oracle Example
next transition: shift

A: { (news, ATT, Economic), (had, SBJ, news), (effect, ATT, little), (markets, ATT, financial), (on, PC, markets), (effect, ATT, on), (had, OBJ, effect) }

σ: [root]    β: [had, .]
Transition-Based Parsing -
Oracle Example
next transition: Right-ArcPU

A: { (news, ATT, Economic), (had, SBJ, news), (effect, ATT, little), (markets, ATT, financial), (on, PC, markets), (effect, ATT, on), (had, OBJ, effect) }

σ: [root, had]    β: [.]
Transition-Based Parsing -
Oracle Example
next transition: Right-ArcPRED

A: { (news, ATT, Economic), (had, SBJ, news), (effect, ATT, little), (markets, ATT, financial), (on, PC, markets), (effect, ATT, on), (had, OBJ, effect), (had, PU, .) }

σ: [root]    β: [had]
Transition-Based Parsing -
Oracle Example
next transition: shift

A: { (news, ATT, Economic), (had, SBJ, news), (effect, ATT, little), (markets, ATT, financial), (on, PC, markets), (effect, ATT, on), (had, OBJ, effect), (had, PU, .), (root, PRED, had) }

σ: []    β: [root]
Transition-Based Parsing -
Oracle Example
terminal state

A: { (news, ATT, Economic), (had, SBJ, news), (effect, ATT, little), (markets, ATT, financial), (on, PC, markets), (effect, ATT, on), (had, OBJ, effect), (had, PU, .), (root, PRED, had) }

σ: [root]    β: []
Properties of the Transition
System
• The time required to parse w1, ..., wm with an oracle is O(m). Why?

• Bottom-up approach: A node must collect all its children before it is attached to its parent. Why?

• Can only produce projective trees. Why?

• This algorithm is complete (all projective trees over w1, ..., wm can be produced by some sequence of transitions).

• Soundness: All terminal structures are projective forests (but not necessarily trees).
Deciding the Next Transition
• Instead of the unrealistic oracle, predict the next transition (and relation label) using a discriminative classifier.

• Could use a perceptron, log-linear model, SVM, neural network, ...

• This is a greedy approach (could use beam search too).

• If the classifier takes O(1), the runtime for parsing is still O(m) for m words.

• Questions:

• What features should the classifier use?
Local features from each state (buffer, stack, partial dependency structure) ... but ideally we want to model the entire history of transitions leading to the state.

• How do we train the model?
Extracting Features
• Need to define a feature function that maps states to feature vectors.

• Each feature consists of:

1. an address in the state description: identifies a specific word in the configuration, for example "top of stack".

2. an attribute of the word at that address: for example POS, word form, lemma, word embedding, ...
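The address plus attribute scheme can be sketched as follows (a minimal illustration; the feature names and the `pos` lookup table are hypothetical):

```python
from collections import namedtuple

State = namedtuple("State", "stack buffer arcs")

def extract_features(c, pos):
    """Each feature name is an address (which word in the configuration)
    plus an attribute (here just word form and POS tag).
    `pos` is a hypothetical word -> POS-tag lookup."""
    feats = {}
    if c.stack:
        feats["stack[0].form"] = c.stack[-1]
        feats["stack[0].pos"] = pos.get(c.stack[-1], "NONE")
    if c.buffer:
        feats["buffer[0].form"] = c.buffer[0]
        feats["buffer[0].pos"] = pos.get(c.buffer[0], "NONE")
    if len(c.buffer) > 1:
        feats["buffer[1].pos"] = pos.get(c.buffer[1], "NONE")
    if c.stack and c.buffer:
        # conjoined features combine two addresses
        feats["stack[0].pos+buffer[0].pos"] = (
            feats["stack[0].pos"] + "_" + feats["buffer[0].pos"])
    return feats

pos = {"root": "ROOT", "news": "NOUN", "had": "VERB", "little": "ADJ"}
c = State(["root", "news"], ["had", "little"], set())
print(extract_features(c, pos)["stack[0].pos+buffer[0].pos"])   # NOUN_VERB
```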
Example Features

Source: S. Kübler, R. McDonald, J. Nivre (2009):"Dependency Parsing", Morgan & Claypool


Training the Model
• Training data: a manually annotated (dependency) treebank

• Prague Dependency Treebank
English/Czech parallel data; dependencies for the full PTB WSJ.

• Universal Dependencies treebanks
Treebanks for more than 80 languages (varying in size)
(http://universaldependencies.org/)

• Problem: We have not actually seen the transition sequences, only the dependency trees!

• Idea: Construct oracle transition sequences from the dependency tree. Train the model on these transitions.
Constructing Oracle
Transitions
• Start with the initial state ([w0]σ, [w1, w2, ..., wm]β, {}A).

• Then predict the next transition using the annotated dependency tree Ad.
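The oracle can be sketched by combining the arc-standard transitions with a simple completeness check: build an arc as soon as the would-be dependent has already collected all of its own gold children, otherwise shift. A sketch (function names are ours), run on the example sentence:

```python
from collections import namedtuple

State = namedtuple("State", "stack buffer arcs")

def step(c, t, r=None):
    """Apply one arc-standard transition, as defined on the earlier slides."""
    if t == "shift":
        return State(c.stack + [c.buffer[0]], c.buffer[1:], c.arcs)
    wi, wj = c.stack[-1], c.buffer[0]
    if t == "left_arc":
        return State(c.stack[:-1], c.buffer, c.arcs | {(wj, r, wi)})
    # right_arc: wi moves from the stack back to the buffer
    return State(c.stack[:-1], [wi] + c.buffer[1:], c.arcs | {(wi, r, wj)})

def oracle(c, gold):
    """Predict the next transition from the gold (head, rel, dep) arcs."""
    if c.stack:
        wi, wj = c.stack[-1], c.buffer[0]
        def complete(w):  # w has already collected all its gold dependents
            return all(a in c.arcs for a in gold if a[0] == w)
        for (h, r, d) in gold:
            if h == wj and d == wi and complete(wi):
                return ("left_arc", r)
            if h == wi and d == wj and complete(wj):
                return ("right_arc", r)
    return ("shift", None)

gold = {("root", "PRED", "had"), ("had", "SBJ", "news"), ("had", "OBJ", "effect"),
        ("had", "PU", "."), ("news", "ATT", "Economic"), ("effect", "ATT", "little"),
        ("effect", "ATT", "on"), ("on", "PC", "markets"), ("markets", "ATT", "financial")}

c = State(["root"], ["Economic", "news", "had", "little", "effect",
                     "on", "financial", "markets", "."], frozenset())
transitions = []
while c.buffer:
    t, r = oracle(c, gold)
    transitions.append(t)
    c = step(c, t, r)

print(c.arcs == frozenset(gold))   # True: the oracle sequence rebuilds the gold tree
```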
"Arc-Standard" Transitions
• Shift:
Move the next word from the buffer to the stack.
(σ, wi | β, A) => (σ | wi, β, A)

• Left-Arc (for relation r):
Build an edge from the next word on the buffer to the top word on the stack.
(σ | wi, wj | β, A) => (σ, wj | β, A ∪ {(wj, r, wi)})

• Right-Arc (for relation r):
Build an edge from the top word on the stack to the next word on the buffer.
(σ | wi, wj | β, A) => (σ, wi | β, A ∪ {(wi, r, wj)})



"Arc-Eager" Transitions
• Shift:
Move the next word from the buffer to the stack.
(σ, wi | β, A) => (σ | wi, β, A)

• Left-Arc (for relation r):
Build an edge from the next word on the buffer to the top word on the stack.
(σ | wi, wj | β, A) => (σ, wj | β, A ∪ {(wj, r, wi)})
Precondition: wi does not already have a head, i.e. there is no (*, *, wi) in A.

• Right-Arc (for relation r):
Build an edge from the top word on the stack to the next word on the buffer.
(σ | wi, wj | β, A) => (σ | wi | wj, β, A ∪ {(wi, r, wj)})

• Reduce:
Remove a completed node from the stack.
(σ | wi, β, A) => (σ, β, A)
Precondition: there is some (*, *, wi) in A.

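The four arc-eager transitions can be sketched the same way as the arc-standard ones (a minimal illustration, with the two preconditions expressed as assertions):

```python
from collections import namedtuple

State = namedtuple("State", "stack buffer arcs")

def has_head(w, arcs):
    return any(d == w for (_, _, d) in arcs)

def shift(c):
    return State(c.stack + [c.buffer[0]], c.buffer[1:], c.arcs)

def left_arc(c, r):
    # pops wi; only allowed if wi does not already have a head
    wi, wj = c.stack[-1], c.buffer[0]
    assert not has_head(wi, c.arcs)
    return State(c.stack[:-1], c.buffer, c.arcs | {(wj, r, wi)})

def right_arc(c, r):
    # unlike arc-standard, wj is pushed onto the stack and can still
    # collect dependents of its own later (this is the "eager" part)
    wi, wj = c.stack[-1], c.buffer[0]
    return State(c.stack + [wj], c.buffer[1:], c.arcs | {(wi, r, wj)})

def reduce_(c):
    # pops a word that already has its head
    assert has_head(c.stack[-1], c.arcs)
    return State(c.stack[:-1], c.buffer, c.arcs)

# "had" can be attached to root as soon as its subject is in place:
c = State(["root"], ["had", "little"], frozenset({("had", "SBJ", "news")}))
c = right_arc(c, "PRED")
print(c.stack, c.buffer[0])   # ['root', 'had'] little
```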


Arc-Eager Example
next transition: Right-ArcPRED (can immediately attach had to root)

A: { (news, ATT, Economic), (had, SBJ, news) }

σ: [root]    β: [had, little, effect, on, financial, markets, .]


Arc-Eager Example
next transition: shift

A: { (news, ATT, Economic), (had, SBJ, news), (root, PRED, had) }

σ: [root, had]    β: [little, effect, on, financial, markets, .]
Arc-Eager Example
next transition: Left-ArcATT

A: { (news, ATT, Economic), (had, SBJ, news), (root, PRED, had) }

σ: [root, had, little]    β: [effect, on, financial, markets, .]
Arc-Eager Example
next transition: Right-ArcOBJ

A: { (news, ATT, Economic), (had, SBJ, news), (root, PRED, had), (effect, ATT, little) }

σ: [root, had, little, effect]    β: [on, financial, markets, .]
Arc-Eager Example
next transition: shift

A: { (news, ATT, Economic), (had, SBJ, news), (root, PRED, had), (effect, ATT, little), (had, OBJ, effect), (effect, ATT, on) }

σ: [root, had, little, effect, on]    β: [financial, markets, .]
Arc-Eager Example
next transition: Left-ArcATT

A: { (news, ATT, Economic), (had, SBJ, news), (root, PRED, had), (effect, ATT, little), (had, OBJ, effect), (effect, ATT, on) }

σ: [root, had, little, effect, on, financial]    β: [markets, .]
Arc-Eager Example
next transition: Right-ArcPC

A: { (news, ATT, Economic), (had, SBJ, news), (root, PRED, had), (effect, ATT, little), (had, OBJ, effect), (effect, ATT, on), (markets, ATT, financial) }

σ: [root, had, little, effect, on]    β: [markets, .]
Arc-Eager Example
next transition: Reduce

A: { (news, ATT, Economic), (had, SBJ, news), (root, PRED, had), (effect, ATT, little), (had, OBJ, effect), (effect, ATT, on), (markets, ATT, financial), (on, PC, markets) }

σ: [root, had, little, effect, on, markets]    β: [.]
Arc-Eager Example
next transition: Reduce

A: { (news, ATT, Economic), (had, SBJ, news), (root, PRED, had), (effect, ATT, little), (had, OBJ, effect), (effect, ATT, on), (markets, ATT, financial), (on, PC, markets) }

σ: [root, had, little, effect, on]    β: [.]
Arc-Eager Example
next transition: Reduce

A: { (news, ATT, Economic), (had, SBJ, news), (root, PRED, had), (effect, ATT, little), (had, OBJ, effect), (effect, ATT, on), (markets, ATT, financial), (on, PC, markets) }

σ: [root, had, little, effect]    β: [.]
Arc-Eager Example
next transition: Reduce

A: { (news, ATT, Economic), (had, SBJ, news), (root, PRED, had), (effect, ATT, little), (had, OBJ, effect), (effect, ATT, on), (markets, ATT, financial), (on, PC, markets) }

σ: [root, had, little]    β: [.]
Arc-Eager Example
next transition: Reduce

A: { (news, ATT, Economic), (had, SBJ, news), (root, PRED, had), (effect, ATT, little), (had, OBJ, effect), (effect, ATT, on), (markets, ATT, financial), (on, PC, markets) }

σ: [root, had]    β: [.]
Graph-Based Approach
• Transition-based parsing can only produce projective dependency structures. Why?

• Graph-based approaches do not have this restriction.

• Basic idea:

• Each word is a vertex. Start with a completely connected graph.

• Use standard graph algorithms to compute a Maximum Spanning Tree.

• Need a model that assigns a score to each edge ("edge-factored model").

R. McDonald, K. Crammer, and F. Pereira (2005)



MST Example
[Graph over root, John, saw, Mary with edge scores: root->saw 10, root->John 9, root->Mary 9, saw->John 30, saw->Mary 30, John->saw 20, Mary->saw 0, Mary->John 11, John->Mary 3. The second copy highlights the maximum spanning tree root->saw, saw->John, saw->Mary.]

total score: 70
Computing the MST
• For undirected graphs, there are two common algorithms:

• Kruskal's and Prim's, both run in O(E log V).

• For dependency parsing we deal with directed graphs, so these algorithms are not guaranteed to find a tree.

• Instead, use the Chu-Liu-Edmonds algorithm, which runs in O(EV) (naive implementation) or O(E log V) (with optimizations).
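For a graph as small as the John/saw/Mary example, the maximum spanning tree can even be found by brute force, which makes the edge-factored idea concrete (real parsers use Chu-Liu-Edmonds instead; the score-to-edge mapping below is read off the slide and partly inferred):

```python
from itertools import product

# edge scores for "John saw Mary"; node 0 = root, 1 = John, 2 = saw, 3 = Mary
score = {(0, 1): 9, (0, 2): 10, (0, 3): 9,   # root -> John / saw / Mary
         (2, 1): 30, (2, 3): 30,             # saw -> John, saw -> Mary
         (1, 2): 20, (3, 2): 0,              # John -> saw, Mary -> saw
         (1, 3): 3, (3, 1): 11}              # John -> Mary, Mary -> John

def best_tree(n, score):
    """Brute-force maximum spanning arborescence over words 1..n (fine for
    tiny n; this is what Chu-Liu-Edmonds computes efficiently)."""
    best, best_heads = float("-inf"), None
    for heads in product(range(n + 1), repeat=n):   # heads[i-1] = head of word i
        if any(h == i + 1 for i, h in enumerate(heads)):
            continue                                 # no self-loops
        if any((h, i + 1) not in score for i, h in enumerate(heads)):
            continue                                 # edge must exist
        def reaches_root(i):                         # tree check: follow heads
            seen = set()
            while i != 0:
                if i in seen:
                    return False                     # cycle
                seen.add(i)
                i = heads[i - 1]
            return True
        if not all(reaches_root(i) for i in range(1, n + 1)):
            continue
        total = sum(score[(h, i + 1)] for i, h in enumerate(heads))
        if total > best:
            best, best_heads = total, heads
    return best, best_heads

print(best_tree(3, score))   # (70, (2, 0, 2)): root -> saw, saw -> John, saw -> Mary
```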
