
Natural Language Processing with Deep Learning
CS224N/Ling284
Christopher Manning
Lecture 5: Dependency Parsing
Lecture Plan
Linguistic Structure: Dependency parsing
1. Syntactic Structure: Constituency and Dependency (25 mins)
2. Dependency Grammar and Treebanks (15 mins)
3. Transition-based dependency parsing (15 mins)
4. Neural dependency parsing (15 mins)

Reminders/comments:
Assignment 2 was due just before class :)
Assignment 3 (dependency parsing) is out today :(
Start installing and learning PyTorch (Assignment 3 has scaffolding)
Final project discussions – come meet with us; focus of week 5
1. Two views of linguistic structure:
Constituency = phrase structure grammar = context-free grammars (CFGs)
Phrase structure organizes words into nested constituents

• Starting unit: words
  the, cat, cuddly, by, door
• Words combine into phrases
  the cuddly cat, by the door
• Phrases can combine into bigger phrases
  the cuddly cat by the door
1. Two views of linguistic structure:
Constituency = phrase structure grammar = context-free grammars (CFGs)
Phrase structure organizes words into nested constituents
Can represent the grammar with CFG rules

• Starting unit: words are given a category (part of speech = POS)
  the, cat, cuddly, by, door
  Det, N, Adj, P, N
• Words combine into phrases with categories
  the cuddly cat, by the door
  NP → Det Adj N    PP → P NP
• Phrases can combine into bigger phrases recursively
  the cuddly cat by the door
  NP → NP PP
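As a small aside (not from the lecture), these rules can be written down directly and used to parse the running example. The sketch below uses NLTK and adds an assumed NP → Det N rule so that "the door", which has no adjective, also forms an NP:

```python
import nltk

# CFG rules from the slide, plus an assumed NP -> Det N rule so that
# "the door" (no adjective) can also form an NP.
grammar = nltk.CFG.fromstring("""
    NP -> Det Adj N | Det N | NP PP
    PP -> P NP
    Det -> 'the'
    Adj -> 'cuddly'
    N -> 'cat' | 'door'
    P -> 'by'
""")
parser = nltk.ChartParser(grammar)
for tree in parser.parse("the cuddly cat by the door".split()):
    print(tree)   # the single parse: [the cuddly cat] modified by the PP [by the door]
```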
Two views of linguistic structure:
Constituency = phrase structure grammar = context-free grammars (CFGs)
Phrase structure organizes words into nested constituents.

[Slide: further examples built with the grammar: noun phrases ("the cat", "a dog"), modifiers ("large", "barking", "cuddly", "large barking"), prepositional phrases ("in a crate", "on the table", "by the door"), and verbs that take these phrases as arguments ("talk to", "walked behind")]
Two views of linguistic structure:
Dependency structure
• Dependency structure shows which words depend on (modify or
are arguments of) which other words.

Look in the large crate in the kitchen by the door


Why do we need sentence structure?

• We need to understand sentence structure in order to be able to interpret language correctly
• Humans communicate complex ideas by composing words together into bigger units to convey complex meanings
• We need to know what is connected to what


Prepositional phrase attachment ambiguity

Scientists count whales from space

(Two readings: the PP "from space" can attach to the verb "count", so the counting is done from space, or to the noun "whales", which would mean whales that are from space.)


PP attachment ambiguities multiply
• A key parsing decision is how we 'attach' various constituents
  • PPs, adverbial or participial phrases, infinitives, coordinations, etc.
• The number of possible attachment structures is given by the Catalan numbers: Cn = (2n)! / ((n+1)! n!) (computed in the snippet below)
• An exponentially growing series, which arises in many tree-like contexts:
  • E.g., the number of possible triangulations of a polygon with n+2 sides
  • Turns up in triangulation of probabilistic graphical models (CS228)…
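As a quick sanity check on the formula (not part of the slides), a few lines of Python compute the first few Catalan numbers:

```python
from math import factorial

def catalan(n: int) -> int:
    """Closed-form Catalan number Cn = (2n)! / ((n+1)! n!)."""
    return factorial(2 * n) // (factorial(n + 1) * factorial(n))

# The number of tree structures grows rapidly with n:
print([catalan(n) for n in range(1, 7)])   # [1, 2, 5, 14, 42, 132]
```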
Coordination scope ambiguity

Shuttle veteran and longtime NASA executive Fred Gregory appointed to board

(Two readings: either one person, "[Shuttle veteran and longtime NASA executive] Fred Gregory", was appointed, or two people, "[Shuttle veteran] and [longtime NASA executive Fred Gregory]", were appointed.)

Coordination scope ambiguity (second example)
Adjectival Modifier Ambiguity
Verb Phrase (VP) attachment ambiguity
Dependency paths identify semantic relations – e.g., for protein interaction
[Erkan et al. EMNLP 07, Fundel et al. 2007, etc.]

[Slide: dependency parse of "The results demonstrated that KaiC interacts rhythmically with SasA, KaiA and KaiB", with arcs such as nsubj, ccomp, mark, advmod, nmod:with, conj:and, cc]

Paths extracted from the parse:
KaiC ←nsubj interacts nmod:with→ SasA
KaiC ←nsubj interacts nmod:with→ SasA conj:and→ KaiA
KaiC ←nsubj interacts nmod:with→ SasA conj:and→ KaiB

2. Dependency Grammar and Dependency Structure

Dependency syntax postulates that syntactic structure consists of relations between lexical items, normally binary asymmetric relations ("arrows") called dependencies

[Slide: dependency tree for "Bills on ports and immigration were submitted by Senator Brownback, Republican of Kansas", rooted at "submitted"]
Dependency Grammar and Dependency Structure

Dependency syntax postulates that syntactic structure consists of relations between lexical items, normally binary asymmetric relations ("arrows") called dependencies

The arrows are commonly typed with the name of grammatical relations (subject, prepositional object, apposition, etc.)

[Slide: the same tree with typed arcs: nsubj:pass, aux, obl, nmod, case, cc, conj, appos, flat]
Dependency Grammar and Dependency Structure

Dependency syntax postulates that syntactic structure consists of relations between lexical items, normally binary asymmetric relations ("arrows") called dependencies

The arrow connects a head (governor, superior, regent) with a dependent (modifier, inferior, subordinate)

Usually, dependencies form a tree (connected, acyclic, single-head)

[Slide: the same typed dependency tree]
Pāṇini's grammar (c. 5th century BCE)

Gallery: http://wellcomeimages.org/indexplus/image/L0032691.html
CC BY 4.0 File: Birch bark MS from Kashmir of the Rupavatra, Wellcome L0032691.jpg
Dependency Grammar/Parsing History

• The idea of dependency structure goes back a long way
  • To Pāṇini's grammar (c. 5th century BCE)
  • Basic approach of 1st millennium Arabic grammarians
• Constituency/context-free grammar is a new-fangled invention
  • 20th century invention (R.S. Wells, 1947; then Chomsky)
• Modern dependency work is often sourced to L. Tesnière (1959)
  • Was the dominant approach in the "East" in the 20th century (Russia, China, …)
  • Good for freer word order languages
• Among the earliest kinds of parsers in NLP, even in the US:
  • David Hays, one of the founders of U.S. computational linguistics, built an early (first?) dependency parser (Hays 1962)
Dependency Grammar and Dependency Structure

ROOT Discussion of the outstanding issues was completed .

• Some people draw the arrows one way; some the other way!
  • Tesnière had them point from head to dependent…
• Usually add a fake ROOT so every word is a dependent of precisely 1 other node
The rise of annotated data: Universal Dependencies treebanks
[Universal Dependencies: http://universaldependencies.org/ ;
Earlier: Marcus et al. 1993, The Penn Treebank, Computational Linguistics]
The rise of annotated data

Starting off, building a treebank seems a lot slower and less useful than building a grammar

But a treebank gives us many things:
• Reusability of the labor
  • Many parsers, part-of-speech taggers, etc. can be built on it
  • Valuable resource for linguistics
• Broad coverage, not just a few intuitions
• Frequencies and distributional information
• A way to evaluate systems
Dependency Conditioning Preferences

What are the sources of information for dependency parsing?
1. Bilexical affinities: the dependency [discussion → issues] is plausible
2. Dependency distance: most dependencies are between nearby words
3. Intervening material: dependencies rarely span intervening verbs or punctuation
4. Valency of heads: how many dependents on which side are usual for a head?

ROOT Discussion of the outstanding issues was completed .
Dependency Parsing

• A sentence is parsed by choosing for each word what other word (including ROOT) it is a dependent of
• Usually some constraints:
  • Only one word is a dependent of ROOT
  • Don't want cycles A → B, B → A
• This makes the dependencies a tree
• Final issue is whether arrows can cross (be non-projective) or not

ROOT I 'll give a talk tomorrow on bootstrapping
Projectivity

• Defn: There are no crossing dependency arcs when the words are laid out in their linear order, with all arcs above the words (a code check of this condition is sketched below)
• Dependencies parallel to a CFG tree must be projective
  • Forming dependencies by taking 1 child of each category as head
• But dependency theory normally does allow non-projective structures to account for displaced constituents
  • You can't easily get the semantics of certain constructions right without these non-projective dependencies

Who did Bill buy the coffee from yesterday ?
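A minimal sketch (not from the lecture) of checking the definition above: a dependency tree, given as a list of head indices, is projective iff no two arcs cross.

```python
def is_projective(heads):
    """heads[i] is the head of word i+1 (words are 1-indexed, 0 = ROOT)."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for i, (l1, r1) in enumerate(arcs):
        for l2, r2 in arcs[i + 1:]:
            # Two arcs cross iff exactly one endpoint of one lies strictly inside the other
            if l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1:
                return False
    return True

print(is_projective([2, 0, 2]))      # "I ate fish" (I <- ate, fish <- ate): True
print(is_projective([3, 4, 0, 3]))   # word 2 attaches to word 4 across the 1-3 arc: False
```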


Methods of Dependency Parsing

1. Dynamic programming
   Eisner (1996) gives a clever algorithm with complexity O(n³), by producing parse items with heads at the ends rather than in the middle
2. Graph algorithms
   You create a Minimum Spanning Tree for a sentence
   McDonald et al.'s (2005) MSTParser scores dependencies independently using an ML classifier (he uses MIRA, for online learning, but it can be something else)
   Neural graph-based parser: Dozat and Manning (2017)
3. Constraint Satisfaction
   Edges are eliminated that don't satisfy hard constraints. Karlsson (1990), etc.
4. "Transition-based parsing" or "deterministic dependency parsing"
   Greedy choice of attachments guided by good machine learning classifiers
   E.g., MaltParser (Nivre et al. 2008). Has proven highly effective.
3. Greedy transition-based parsing [Nivre 2003]

• A simple form of greedy discriminative dependency parser
• The parser does a sequence of bottom-up actions
  • Roughly like "shift" or "reduce" in a shift-reduce parser, but the "reduce" actions are specialized to create dependencies with head on left or right
• The parser has:
  • a stack σ, written with top to the right
    • which starts with the ROOT symbol
  • a buffer β, written with top to the left
    • which starts with the input sentence
  • a set of dependency arcs A
    • which starts off empty
  • a set of actions
Basic transition-based dependency parser

Start: σ = [ROOT], β = w1, …, wn, A = ∅
1. Shift:       σ, wi|β, A  ⇒  σ|wi, β, A
2. Left-Arcᵣ:   σ|wi|wj, β, A  ⇒  σ|wj, β, A ∪ {r(wj, wi)}
3. Right-Arcᵣ:  σ|wi|wj, β, A  ⇒  σ|wi, β, A ∪ {r(wi, wj)}
Finish: σ = [w], β = ∅
Arc-standard transition-based parser
(there are other transition schemes …)

Analysis of "I ate fish":

Start:      σ = [ROOT],            β = [I, ate, fish],  A = ∅
Shift:      σ = [ROOT, I],         β = [ate, fish]
Shift:      σ = [ROOT, I, ate],    β = [fish]
Left-Arc:   σ = [ROOT, ate],       β = [fish],          A += nsubj(ate → I)
Shift:      σ = [ROOT, ate, fish], β = []
Right-Arc:  σ = [ROOT, ate],       β = [],              A += obj(ate → fish)
Right-Arc:  σ = [ROOT],            β = [],              A += root([ROOT] → ate)
Finish
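A self-contained sketch of this transition sequence in Python (a toy illustration, not the assignment's code); the action list is the oracle sequence from the walkthrough above:

```python
def arc_standard_parse(words, actions):
    stack, buffer, arcs = ["ROOT"], list(words), []
    for act in actions:
        if act == "SHIFT":
            stack.append(buffer.pop(0))
        elif act.startswith("LEFT-ARC"):           # head = top of stack, dependent = second
            label = act.split(":")[1]
            dep = stack.pop(-2)
            arcs.append((label, stack[-1], dep))
        elif act.startswith("RIGHT-ARC"):          # head = second on stack, dependent = top
            label = act.split(":")[1]
            dep = stack.pop()
            arcs.append((label, stack[-1], dep))
    assert stack == ["ROOT"] and not buffer        # Finish condition
    return arcs

actions = ["SHIFT", "SHIFT", "LEFT-ARC:nsubj", "SHIFT",
           "RIGHT-ARC:obj", "RIGHT-ARC:root"]
print(arc_standard_parse(["I", "ate", "fish"], actions))
# [('nsubj', 'ate', 'I'), ('obj', 'ate', 'fish'), ('root', 'ROOT', 'ate')]
```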
MaltParser [Nivre and Hall 2005]

• We have yet to explain how we choose the next action
  • Answer: Stand back, I know machine learning!
• Each action is predicted by a discriminative classifier (e.g., softmax classifier) over each legal move
  • Max of 3 untyped choices; max of |R| × 2 + 1 when typed
  • Features: top of stack word, POS; first in buffer word, POS; etc.
• There is NO search (in the simplest form)
  • But you can profitably do a beam search if you wish (slower but better): you keep k good parse prefixes at each time step
• The model's accuracy is fractionally below the state of the art in dependency parsing, but
• It provides very fast linear time parsing, with great performance
Conventional Feature Representation

binary, sparse vector: 0 0 0 1 0 0 1 0 … 0 0 1 0
dim = 10⁶ ~ 10⁷

Feature templates: usually a combination of 1 ~ 3 elements from the configuration.

Indicator features
Evaluation of Dependency Parsing: (labeled) dependency accuracy

Acc = (# correct deps) / (# of deps)

ROOT She saw the video lecture
 0    1   2   3    4     5

UAS = 4 / 5 = 80%
LAS = 2 / 5 = 40%

Gold:                        Parsed:
1  2  She      nsubj         1  2  She      nsubj
2  0  saw      root          2  0  saw      root
3  5  the      det           3  4  the      det
4  5  video    nn            4  5  video    nsubj
5  2  lecture  obj           5  2  lecture  ccomp
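A few lines of Python (a sketch, not the standard CoNLL evaluator) reproduce the UAS/LAS numbers for this example; each entry is a (head, label) pair:

```python
gold   = [(2, "nsubj"), (0, "root"), (5, "det"), (5, "nn"),    (2, "obj")]
parsed = [(2, "nsubj"), (0, "root"), (4, "det"), (5, "nsubj"), (2, "ccomp")]

uas = sum(g[0] == p[0] for g, p in zip(gold, parsed)) / len(gold)   # head correct
las = sum(g == p for g, p in zip(gold, parsed)) / len(gold)         # head and label correct
print(f"UAS = {uas:.0%}, LAS = {las:.0%}")   # UAS = 80%, LAS = 40%
```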
Handling non-projectivity

• The arc-standard algorithm we presented only builds projective dependency trees
• Possible directions to head:
  1. Just declare defeat on non-projective arcs
  2. Use a dependency formalism which only has projective representations
     • A CFG only allows projective structures; you promote the head of violations
  3. Use a postprocessor to a projective dependency parsing algorithm to identify and resolve non-projective links
  4. Add extra transitions that can model at least most non-projective structures (e.g., add an extra SWAP transition, cf. bubble sort)
  5. Move to a parsing mechanism that does not use or require any constraints on projectivity (e.g., the graph-based MSTParser)
4. Why train a neural dependency parser? Indicator Features Revisited

• Problem #1: sparse
• Problem #2: incomplete
• Problem #3: expensive computation
  • More than 95% of parsing time is consumed by feature computation.

Our Approach: learn a dense and compact feature representation
dense vector: 0.1 0.9 −0.2 0.3 … −0.1 −0.5
dim = ~1000
A neural dependency parser [Chen and Manning 2014]

• English parsing to Stanford Dependencies:
  • Unlabeled attachment score (UAS) = head
  • Labeled attachment score (LAS) = head and label

Parser        UAS   LAS   sent. / s
MaltParser    89.8  87.2  469
MSTParser     91.4  88.1  10
TurboParser   92.3  89.6  8
C & M 2014    92.0  89.7  654
Distributed Representations

• We represent each word as a d-dimensional dense vector (i.e., word embedding)
  • Similar words are expected to have close vectors.
• Meanwhile, part-of-speech tags (POS) and dependency labels are also represented as d-dimensional vectors.
  • The smaller discrete sets also exhibit many semantic similarities:
    • NNS (plural noun) should be close to NN (singular noun).
    • num (numerical modifier) should be close to amod (adjective modifier).

[Slide: embedding-space illustration with nearby words such as was/were, is, good, come/go]
Extracting Tokens and then vector representations from configuration

• We extract a set of tokens based on the stack / buffer positions:

          word     POS   dep.
  s1      good     JJ    ∅
  s2      has      VBZ   ∅
  b1      control  NN    ∅
  lc(s1)  ∅        ∅     ∅
  rc(s1)  ∅        ∅     ∅
  lc(s2)  He       PRP   nsubj
  rc(s2)  ∅        ∅     ∅
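A toy sketch of this extraction step (the function name and the reduced 7-token feature set are illustrative; Chen and Manning's parser extracts a larger set including more stack/buffer items and grandchildren, plus POS tags and arc labels):

```python
NULL = "<NULL>"   # padding symbol for missing positions

def extract_tokens(stack, buffer, left_child, right_child):
    """Return the words at s1, s2, b1, lc(s1), rc(s1), lc(s2), rc(s2)."""
    s1 = stack[-1] if len(stack) >= 1 else None
    s2 = stack[-2] if len(stack) >= 2 else None
    b1 = buffer[0] if buffer else None
    feats = [s1, s2, b1,
             left_child.get(s1), right_child.get(s1),
             left_child.get(s2), right_child.get(s2)]
    return [w if w is not None else NULL for w in feats]

# Configuration from the slide (sentence "He has good control ."): "He" has
# already been attached as the leftmost child of "has"; "good" is on top of
# the stack and "control" is next in the buffer.
print(extract_tokens(["ROOT", "has", "good"], ["control", "."],
                     left_child={"has": "He"}, right_child={}))
# ['good', 'has', 'control', '<NULL>', '<NULL>', 'He', '<NULL>']
```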
Model Architecture

Output layer: y = softmax(Uh + b2)
  Softmax probabilities; the cross-entropy error will be back-propagated to the embeddings.
Hidden layer: h = ReLU(Wx + b1)
Input layer: x = lookup + concat
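A minimal PyTorch sketch of this architecture (the class name, the 48-feature input, and the layer sizes are illustrative assumptions rather than the paper's exact hyperparameters; the published model also used a cube activation rather than ReLU):

```python
import torch
import torch.nn as nn

class NeuralDepParser(nn.Module):
    def __init__(self, vocab_size, n_features=48, embed_dim=50,
                 hidden_dim=200, n_transitions=79):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)    # word/POS/label embeddings
        self.hidden = nn.Linear(n_features * embed_dim, hidden_dim)
        self.output = nn.Linear(hidden_dim, n_transitions)

    def forward(self, feature_ids):                # (batch, n_features) token ids
        x = self.embed(feature_ids).flatten(1)     # input layer: lookup + concat
        h = torch.relu(self.hidden(x))             # hidden layer: h = ReLU(Wx + b1)
        return self.output(h)                      # logits; softmax is folded into the loss

model = NeuralDepParser(vocab_size=10_000)
feature_ids = torch.randint(0, 10_000, (32, 48))   # a dummy batch of configurations
logits = model(feature_ids)
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 79, (32,)))
loss.backward()   # cross-entropy error is back-propagated down to the embeddings
```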
Dependency parsing for sentence structure

• Neural networks can accurately determine the structure of sentences, supporting interpretation
• Chen and Manning (2014) was the first simple, successful neural dependency parser
• The dense representations let it outperform other greedy parsers in both accuracy and speed
Further developments in transition-based neural dependency parsing

This work was further developed and improved by others, including in particular at Google:
• Bigger, deeper networks with better tuned hyperparameters
• Beam search
• Global, conditional random field (CRF)-style inference over the decision sequence

Leading to SyntaxNet and the Parsey McParseFace model
https://research.googleblog.com/2016/05/announcing-syntaxnet-worlds-most.html

Method                UAS    LAS (PTB WSJ SD 3.3)
Chen & Manning 2014   92.0   89.7
Weiss et al. 2015     93.99  92.05
Andor et al. 2016     94.61  92.79
Graph-based dependency parsers

• Compute a score for every possible dependency for each word
  • Doing this well requires good "contextual" representations of each word token, which we will develop in coming lectures

[Slide: candidate head scores for "big" in "ROOT The big cat sat" (e.g., 0.5, 0.3, 0.8, 2.0), i.e., picking the head for "big"]


Graph-based dependency parsers

• Compute a score for every possible dependency for each word
• Then add an edge from each word to its highest-scoring candidate head
• And repeat the same process for each other word

[Slide: the same example, picking the head for "big" in "ROOT The big cat sat"]
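A toy sketch of this scoring-and-selection idea (the scores below are made up for illustration):

```python
import numpy as np

# Row d-1 holds the candidate-head scores for word d; column h is head h (0 = ROOT).
words = ["ROOT", "The", "big", "cat", "sat"]
scores = np.array([
    [0.1, 0.0, 0.2, 3.1, 0.4],   # The  -> best head: cat
    [0.5, 0.3, 0.0, 2.0, 0.8],   # big  -> best head: cat
    [0.2, 0.1, 0.3, 0.0, 2.7],   # cat  -> best head: sat
    [3.5, 0.2, 0.1, 0.9, 0.0],   # sat  -> best head: ROOT
])
for d, h in enumerate(scores.argmax(axis=1), start=1):
    print(f"{words[d]} <- head {words[h]}")

# Note: a plain per-word argmax is not guaranteed to form a tree; real graph-based
# parsers decode a maximum spanning tree (e.g., with the Chu-Liu/Edmonds algorithm).
```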


A Neural graph-based dependency parser
[Dozat and Manning 2017; Dozat, Qi, and Manning 2017]

• Revived graph-based dependency parsing in a neural world
  • Design a biaffine scoring model for neural dependency parsing
  • Also using a neural sequence model, as we discuss next week
• Really great results!
  • But slower than simple neural transition-based parsers
  • There are n² possible dependencies in a sentence of length n

Method                 UAS    LAS (PTB WSJ SD 3.3)
Chen & Manning 2014    92.0   89.7
Weiss et al. 2015      93.99  92.05
Andor et al. 2016      94.61  92.79
Dozat & Manning 2017   95.74  94.08
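For intuition, here is a minimal PyTorch sketch of a biaffine arc scorer over contextual word vectors (the class name and dimensions are illustrative; the actual parser also applies separate head/dependent MLPs and a second biaffine classifier for labels):

```python
import torch
import torch.nn as nn

class BiaffineArcScorer(nn.Module):
    """score(head i, dependent j) = h_i^T U h_j + b^T h_i  (a simplified form)."""
    def __init__(self, dim):
        super().__init__()
        self.U = nn.Parameter(torch.empty(dim, dim))
        self.b = nn.Parameter(torch.zeros(dim))
        nn.init.xavier_uniform_(self.U)

    def forward(self, head_repr, dep_repr):
        # head_repr, dep_repr: (batch, n, dim) contextual encodings (e.g., from a BiLSTM)
        bilinear = torch.einsum("bid,de,bje->bji", head_repr, self.U, dep_repr)
        bias = head_repr @ self.b                  # (batch, n) head-only bias term
        return bilinear + bias.unsqueeze(1)        # (batch, n_dependents, n_heads)

scorer = BiaffineArcScorer(dim=128)
h = torch.randn(2, 5, 128)                         # pretend encodings for 5 tokens
arc_scores = scorer(h, h)                          # all n^2 candidate arcs at once
predicted_heads = arc_scores.argmax(dim=-1)        # greedy head choice per dependent
```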
