
Natural Language Processing

Neural architectures for dependency parsing


Marco Kuhlmann
Department of Computer and Information Science

This work is licensed under a Creative Commons Attribution 4.0 International License.
Learning problems in dependency parsing

• Learning a greedy transition-based dependency parser amounts to learning the transition classifier.
Chen and Manning (2014), Kiperwasser and Goldberg (2016)

• Learning an arc-factored graph-based dependency parser amounts to learning the arc scores.
Kiperwasser and Goldberg (2016), Glavaš and Vulić (2021)
Chen and Manning (2014)

• Pre-neural transition classifiers relied on linear models with hand-crafted combination features.

• C & M propose to replace the linear model with a two-layer feedforward network (FNN).

• The standard choice for the transfer function is the rectified linear unit (ReLU).
C & M use the cube function, 𝑓(𝑥) = 𝑥³.

[Diagram: the feedforward neural network as a stack of a linear layer, the non-linearity (cube instead of ReLU), and a second linear layer.]
[Figure: the C & M architecture on the sentence "I wanted to try someplace new". The items at selected positions of the current configuration (stack 2, stack 1, buffer 1; here "to", "try", "someplace") are embedded, the embeddings are concatenated and passed through the FNN, and a softmax yields scores for the transitions. A second panel shows the configuration after the next transition, with "wanted", "try", "someplace" at the selected positions.]
Chen and Manning (2014) – Features

• C & M embed the top 3 words on the stack and buffer, as well as certain descendants of the top words on the stack (see the sketch below).

• In addition to word embeddings, they also use embeddings for part-of-speech tags and dependency labels.
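To make the architecture concrete, here is a minimal PyTorch sketch of a classifier of this kind. The feature counts, embedding sizes and hidden size are illustrative assumptions, not necessarily the exact values from the paper.

```python
import torch
import torch.nn as nn

class TransitionClassifier(nn.Module):
    """Two-layer feedforward transition classifier in the spirit of
    Chen and Manning (2014). Feature counts and dimensions are
    illustrative assumptions."""

    def __init__(self, n_words, n_tags, n_labels, n_transitions,
                 emb_dim=50, hidden_dim=200,
                 n_word_feats=18, n_tag_feats=18, n_label_feats=12):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, emb_dim)
        self.tag_emb = nn.Embedding(n_tags, emb_dim)
        self.label_emb = nn.Embedding(n_labels, emb_dim)
        n_feats = n_word_feats + n_tag_feats + n_label_feats
        self.hidden = nn.Linear(n_feats * emb_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, n_transitions)

    def forward(self, word_feats, tag_feats, label_feats):
        # Each *_feats tensor holds the indices of the extracted features
        # for a batch of configurations: (batch, n_*_feats).
        x = torch.cat([self.word_emb(word_feats).flatten(1),
                       self.tag_emb(tag_feats).flatten(1),
                       self.label_emb(label_feats).flatten(1)], dim=-1)
        h = self.hidden(x) ** 3      # cube activation instead of ReLU
        return self.out(h)           # transition logits; softmax is in the loss
```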
Chen and Manning (2014) – Training

• To train their parser, C & M minimise cross-entropy loss relative to the gold-standard action, plus an L2 regularisation term (see the sketch below).

• To generate training examples for the transition classifier, they use the static oracle for the arc-standard algorithm.
Training examples can therefore be generated off-line.
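Continuing the sketch above, a single training step might look as follows. The optimiser choice, vocabulary sizes and hyperparameters are assumptions; the weight_decay argument stands in for the L2 regularisation term.

```python
import torch
import torch.nn.functional as F

# Assumed vocabulary sizes; with L dependency labels, arc-standard has
# 1 shift + L left-arc + L right-arc transitions.
model = TransitionClassifier(n_words=20_000, n_tags=50, n_labels=40,
                             n_transitions=1 + 2 * 40)
optimizer = torch.optim.Adagrad(model.parameters(), lr=0.01,
                                weight_decay=1e-8)

def training_step(word_feats, tag_feats, label_feats, gold_actions):
    """One step on a batch of (configuration, gold action) pairs
    produced off-line by the static arc-standard oracle."""
    logits = model(word_feats, tag_feats, label_feats)
    # Cross-entropy against the gold-standard action; weight_decay above
    # plays the role of the L2 regularisation term.
    loss = F.cross_entropy(logits, gold_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```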
Parsing accuracy

                                UAS     LAS
Baseline, transition-based      89.4    87.3
Baseline, graph-based           90.7    87.6
Chen and Manning (2014)         91.8    89.6
Weiss et al. (2015)             93.2    91.2

Parsing accuracy on the test set of the Penn Treebank + conversion to Stanford dependencies
Kiperwasser and Goldberg (2016)

• The neural parser of C & M learns useful feature combinations, but the need to carefully design the core features remains.

• K & G propose to use a minimal set of core features based on contextualised embeddings obtained from a Bi-LSTM.
The Bi-LSTM is trained with the rest of the parser.

• They show that this approach gives state-of-the-art accuracy both for transition-based and for graph-based parsing.
[Figure: the K & G transition-based architecture on "I wanted to try someplace new". Every word is embedded and passed through a Bi-LSTM, yielding contextualised vectors 𝒗1 ... 𝒗6. The vectors at the selected positions (stack 2, stack 1, buffer 1) are concatenated and scored by an FNN, which outputs scores for the transitions. A second panel shows the configuration after the next transition.]
Features and training (transition-based parser)

• For their transition-based parser, K & G embed the top 3 words on the stack, as well as the first word in the buffer (see the sketch below).
Each position is represented by both its word and its part-of-speech tag.

• In contrast to C & M, they use a dynamic oracle, so they cannot generate training examples in an off-line fashion.
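A minimal PyTorch sketch of this setup: the whole sentence is encoded once with a Bi-LSTM, and for each configuration the vectors at the selected positions are concatenated and scored by an MLP. All dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BiLSTMTransitionScorer(nn.Module):
    """Transition scorer in the spirit of Kiperwasser and Goldberg (2016).
    Dimensions are illustrative assumptions."""

    def __init__(self, n_words, n_tags, n_transitions,
                 word_dim=100, tag_dim=25, lstm_dim=125, mlp_dim=100,
                 n_positions=4):   # top 3 words on the stack + 1st in the buffer
        super().__init__()
        self.word_emb = nn.Embedding(n_words, word_dim)
        self.tag_emb = nn.Embedding(n_tags, tag_dim)
        self.bilstm = nn.LSTM(word_dim + tag_dim, lstm_dim, num_layers=2,
                              bidirectional=True, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(n_positions * 2 * lstm_dim, mlp_dim),
            nn.Tanh(),
            nn.Linear(mlp_dim, n_transitions))

    def encode(self, words, tags):
        # words, tags: (batch, sent_len) index tensors for full sentences.
        x = torch.cat([self.word_emb(words), self.tag_emb(tags)], dim=-1)
        vectors, _ = self.bilstm(x)          # (batch, sent_len, 2 * lstm_dim)
        return vectors

    def forward(self, vectors, positions):
        # positions: (batch, n_positions) indices of the stack/buffer items
        # in the current configuration.
        batch_idx = torch.arange(vectors.size(0)).unsqueeze(1)
        selected = vectors[batch_idx, positions]   # (batch, n_positions, 2 * lstm_dim)
        return self.mlp(selected.flatten(1))       # transition scores
```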
[Figure: the K & G graph-based arc scorer on "I wanted to try someplace new". The Bi-LSTM vectors of a candidate head and dependent are concatenated and passed through an FNN, which outputs a score for the arc. The two panels show the two directions of the candidate arc (head on the right, head on the left).]
Features and training (graph-based parser)

• For their graph-based parser, K & G embed the head and dependent of each arc.
Again, both the word and its part-of-speech tag are embedded.

• The training objective is to maximise the margin between the score of the gold tree 𝑦* and the highest-scoring incorrect tree 𝑦, using the hinge loss (see the sketch below):

loss = max(0, 1 − score(𝑥, 𝑦*) + score(𝑥, 𝑦))
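The following sketch scores a single arc from the Bi-LSTM vectors of its head and dependent and implements the margin objective as a hinge loss. It is a sketch under the dimension assumptions used above, not the authors' exact code.

```python
import torch
import torch.nn as nn

class ArcScorer(nn.Module):
    """Arc-factored scorer in the spirit of K & G: concatenate the
    Bi-LSTM vectors of head and dependent and score them with an MLP.
    Dimensions are illustrative assumptions."""

    def __init__(self, lstm_dim=125, mlp_dim=100):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(4 * lstm_dim, mlp_dim),  # two Bi-LSTM vectors of size 2 * lstm_dim
            nn.Tanh(),
            nn.Linear(mlp_dim, 1))

    def forward(self, head_vec, dep_vec):
        # head_vec, dep_vec: (..., 2 * lstm_dim) contextualised vectors.
        return self.mlp(torch.cat([head_vec, dep_vec], dim=-1)).squeeze(-1)

def margin_loss(score_gold, score_best_incorrect, margin=1.0):
    # Hinge loss: push the gold tree's score above the score of the
    # highest-scoring incorrect tree by at least the margin.
    return torch.clamp(margin - score_gold + score_best_incorrect, min=0.0)
```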
Parsing accuracy

                                   UAS     LAS
Chen and Manning (2014)            91.8    89.6
Weiss et al. (2015)                93.2    91.2
K & G (2016), graph-based          93.0    90.9
K & G (2016), transition-based     93.6    91.5

Parsing accuracy on the test set of the Penn Treebank + conversion to Stanford dependencies
Glavaš and Vulić (2021)

• G & V adopt the basic architecture of K & G but use a BERT encoder instead of a Bi-LSTM.
This requires word-level average pooling of the subword token representations.

• The arc scores are computed using a bi-affine layer (sketched below):

score(𝑥, 𝑖 → 𝑗) = 𝒘ᵢ 𝑾 𝒘ⱼ⊤ + 𝒃
[Figure: the G & V architecture on "I liked the place". BERT (stacked multi-head attention and feedforward blocks) encodes the wordpiece tokens [CLS], I, lik, ##ed, the, place; mean pooling merges the vectors of "lik" and "##ed" into a word-level vector for "liked"; the bi-affine layer then scores the arc liked → I, with "liked" as head and "I" as dependent.]
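A minimal sketch of the two ingredients, assuming word-level vectors of BERT base's hidden size: mean pooling of wordpiece vectors into word vectors, and a bi-affine layer that scores every head and dependent pair at once.

```python
import torch
import torch.nn as nn

def mean_pool(token_vecs, word_ids, n_words):
    """Average the wordpiece vectors that belong to the same word
    (e.g. 'lik' + '##ed' -> 'liked'). word_ids maps each token to its
    word index, with -1 for special tokens such as [CLS]."""
    sums = token_vecs.new_zeros(n_words, token_vecs.size(-1))
    counts = token_vecs.new_zeros(n_words, 1)
    for vec, idx in zip(token_vecs, word_ids):
        if idx >= 0:
            sums[idx] += vec
            counts[idx] += 1
    return sums / counts.clamp(min=1)

class BiaffineArcScorer(nn.Module):
    """Bi-affine arc scorer over word-level encoder states, in the spirit
    of Glavaš and Vulić (2021): score(x, i -> j) = w_i W w_j^T + b."""

    def __init__(self, hidden_dim=768):   # 768 = hidden size of BERT base
        super().__init__()
        self.W = nn.Parameter(torch.empty(hidden_dim, hidden_dim))
        self.b = nn.Parameter(torch.zeros(1))
        nn.init.xavier_uniform_(self.W)

    def forward(self, words):
        # words: (batch, n_words, hidden_dim) pooled word vectors.
        # Returns (batch, n_words, n_words) with entry [i, j] = w_i W w_j^T + b.
        return torch.einsum("bih,hk,bjk->bij", words, self.W, words) + self.b
```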
