Fifth Year, Natural Language Processing Course

Syntactic Analysis 2

Dependency Trees, Chunking, NER

Dr. Riad Sonbol

RIAD SONBOL - NLP COURSE


Head Words
▪ Syntactic phrases usually have a word in them that is most “central” to the phrase.
▪ Simple rules can identify the head of any phrase:
◦ Head of a VP is the main verb
◦ Head of an NP is the main noun
◦ Head of a PP is the preposition
◦ Head of a sentence is the head of its VP
▪ Specialized productions can be generated by annotating each non-terminal’s symbol with its head word and that word’s POS tag.
▪ Example: lexicalized parse of “John liked the dog in the pen”
S(liked-VBD)
  NP(John-NNP)
    NNP John
  VP(liked-VBD)
    VBD liked
    NP(dog-NN)
      DT the
      Nominal(dog-NN)
        Nominal(dog-NN)
          NN dog
        PP(in-IN)
          IN in
          NP(pen-NN)
            DT the
            Nominal(pen-NN)
              NN pen
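The head rules above can be sketched as a small lookup table plus a recursive search. This is only an illustrative sketch: the tuple encoding of the tree, the `HEAD_CHILD` priority lists, and the function names are assumptions made for this example, not a complete head table.

```python
# Phrase nodes are (label, child, child, ...); preterminals are (tag, word).
# HEAD_CHILD is a hypothetical, deliberately tiny priority list per phrase type.
HEAD_CHILD = {
    "S": ("VP",),                             # head of S is the head of its VP
    "VP": ("VP", "VBD", "VBZ", "VBN", "VB"),  # main verb
    "NP": ("Nominal", "NN", "NNS", "NNP"),    # main noun
    "Nominal": ("Nominal", "NN", "NNS"),
    "PP": ("IN",),                            # the preposition
}


def is_preterminal(node):
    return len(node) == 2 and isinstance(node[1], str)


def find_head(node):
    """Return (head word, POS tag) by descending into head children."""
    if is_preterminal(node):
        tag, word = node
        return word, tag
    label, children = node[0], node[1:]
    for category in HEAD_CHILD[label]:
        for child in children:
            if child[0] == category:
                return find_head(child)
    raise ValueError(f"no head rule matched for {label}")


def lexical_label(node):
    """Annotate a non-terminal with its head word and POS, e.g. VP(liked-VBD)."""
    word, tag = find_head(node)
    return f"{node[0]}({word}-{tag})"


PEN = ("NP", ("DT", "the"), ("Nominal", ("NN", "pen")))
DOG = ("NP", ("DT", "the"),
       ("Nominal", ("Nominal", ("NN", "dog")), ("PP", ("IN", "in"), PEN)))
TREE = ("S", ("NP", ("NNP", "John")), ("VP", ("VBD", "liked"), DOG))

print(lexical_label(TREE))  # label of the root
print(lexical_label(DOG))   # label of the object NP
```

Real head tables (one entry per treebank category, with search direction) are considerably larger; the point here is only that the head is found by repeatedly following the head child.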
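Head Words (Lexicalized Productions)
▪ Lexicalization can improve overall parsing accuracy: the verb determines where the PP attaches in the two parses on this slide.
▪ “John put the dog in the pen”: the PP attaches to the VP
S(put-VBD)
  NP(John-NNP)
    NNP John
  VP(put-VBD)
    VP(put-VBD)
      VBD put
      NP(dog-NN)
        DT the
        Nominal(dog-NN)
          NN dog
    PP(in-IN)
      IN in
      NP(pen-NN)
        DT the
        Nominal(pen-NN)
          NN pen
▪ “John liked the dog in the pen”: the PP attaches to the Nominal
S(liked-VBD)
  NP(John-NNP)
    NNP John
  VP(liked-VBD)
    VBD liked
    NP(dog-NN)
      DT the
      Nominal(dog-NN)
        Nominal(dog-NN)
          NN dog
        PP(in-IN)
          IN in
          NP(pen-NN)
            DT the
            Nominal(pen-NN)
              NN pen
▪ Lexicalized productions that distinguish the two attachments:
◦ VP(put-VBD) → VP(put-VBD) PP(in-IN)
◦ Nominal(dog-NN) → Nominal(dog-NN) PP(in-IN)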



Dependency Grammars
▪ An alternative to phrase-structure grammar is to define a parse as a directed graph between the words => dependencies between the words.
▪ Can convert a phrase structure parse to a dependency tree by making the head of each non-head child of a node depend on the head of the head child.
▪ Example: phrase structure parse
(S (NP John) (VP liked (NP the (Nominal (Nominal dog) (PP in (NP the (Nominal pen)))))))
▪ Resulting dependency tree (each word is indented under its head):
liked
  John
  dog
    the
    in
      pen
        the
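The conversion rule above can be sketched in a few lines, assuming the head child of every node is already known. The node encoding `(label, head_index, children)`, the use of token positions as leaves, and the name `to_dependencies` are choices made for this illustration; preterminals and the unary Nominal nodes are folded away to keep the example short.

```python
SENTENCE = ["John", "liked", "the", "dog", "in", "the", "pen"]

# Node = (label, index of the head child, children); a leaf is a token position.
PEN = ("NP", 1, [5, 6])            # the pen      -> head: pen
PP = ("PP", 0, [4, PEN])           # in the pen   -> head: in
NOMINAL = ("Nominal", 0, [3, PP])  # dog in ...   -> head: dog
DOG = ("NP", 1, [2, NOMINAL])      # the dog ...  -> head: dog
TREE = ("S", 1, [("NP", 0, [0]), ("VP", 0, [1, DOG])])


def to_dependencies(node, deps):
    """Return the head token of `node`; append (head, modifier) pairs to `deps`."""
    if isinstance(node, int):
        return node
    _label, head_index, children = node
    heads = [to_dependencies(child, deps) for child in children]
    for i, child_head in enumerate(heads):
        if i != head_index:
            # the head of each non-head child depends on the head of the head child
            deps.append((heads[head_index], child_head))
    return heads[head_index]


deps = []
root = to_dependencies(TREE, deps)
for head, modifier in sorted(deps):
    print(f"{SENTENCE[head]} -> {SENTENCE[modifier]}")
```

Token positions are used instead of word strings so that the two occurrences of "the" stay distinct.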
Example
[Figure: the dependency parse in an alternative drawing: equivalent, but shows word order. Arrows point head -> modifier.]
Dependency View of Syntax
▪ No explicit phrase structure
▪ Key feature: how do words relate to each
other?
▪ Advantages:
◦ Better suited to free-word-order languages
◦ Closer alignment between analyses of different
languages

▪ Disadvantages:
◦ Some loss of expressivity
◦ Another formalism: what are the annotation
standards, how to build corpora, etc.?



Edge Labels
▪ It can be useful to distinguish different kinds of head-modifier relations, by labeling edges
▪ Important relations for English:
◦ Subject, direct object, determiner, adjective modifier, ...
▪ Different treebanks use different sets
▪ Dependency treebanks:
◦ Prague Dependency Treebank: 1.5 million words of direct annotation (in Czech)
◦ Universal Dependencies (universaldependencies.org): many languages including Arabic.
▪ Typed dependency parse (edge labels as they appear on the slide):
liked
  nsubj: John
  dobj: dog
    det: the
    in
      pen
        det: the
Applications
▪ For information extraction tasks involving real-world relationships between
entities, chains of dependencies provide good features
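One way to obtain such a chain is to walk from each of the two words up to the root and join the two walks at their lowest common ancestor. The sketch below assumes the parse is given as a `heads` list (`heads[i]` is the position of token i's head, -1 for the root); the function names are hypothetical.

```python
SENTENCE = ["John", "liked", "the", "dog", "in", "the", "pen"]
HEADS = [1, -1, 3, 1, 3, 6, 4]  # liked is the root


def path_to_root(heads, i):
    """Token positions from i up to the root, inclusive."""
    path = [i]
    while heads[path[-1]] >= 0:
        path.append(heads[path[-1]])
    return path


def dependency_path(heads, a, b):
    """Chain of tokens linking a and b through their lowest common ancestor."""
    up_a = path_to_root(heads, a)
    up_b = path_to_root(heads, b)
    common = next(node for node in up_a if node in up_b)
    left = up_a[:up_a.index(common) + 1]   # a ... common ancestor
    right = up_b[:up_b.index(common)]      # b ... just below the ancestor
    return left + right[::-1]


path = dependency_path(HEADS, 0, 6)
print(" - ".join(SENTENCE[i] for i in path))
```

The words (and, in a typed parse, the edge labels) along this chain can then be used as features for a relation classifier.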



Projectivity
▪ A sentence's dependency parse is said to be projective if every subtree (node and all its
descendants) occupies a contiguous span of the sentence
▪ This also means the dependency parse can be drawn on top of the sentence without any crossing
edges
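The subtree definition can be checked directly. In this sketch a parse is a `heads` list (`heads[i]` is the position of token i's head, -1 for the root) and is assumed to be a well-formed tree; the second test sentence is a commonly cited nonprojective English example and is not taken from these slides.

```python
def is_projective(heads):
    """True if every subtree covers a contiguous span of token positions."""
    children = [[] for _ in heads]
    for modifier, head in enumerate(heads):
        if head >= 0:
            children[head].append(modifier)

    def subtree(node):
        nodes = [node]
        for child in children[node]:
            nodes.extend(subtree(child))
        return nodes

    for node in range(len(heads)):
        span = subtree(node)
        # contiguous span <=> number of nodes equals the width of the span
        if max(span) - min(span) + 1 != len(span):
            return False
    return True


# John liked the dog in the pen
projective = [1, -1, 3, 1, 3, 6, 4]
# A hearing is scheduled on the issue today  ("on" attaches to "hearing")
nonprojective = [1, 3, 3, -1, 1, 6, 4, 3]

print(is_projective(projective), is_projective(nonprojective))
```

In the second sentence the subtree of "hearing" contains "on the issue" but not "is scheduled", so it is not contiguous.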



Nonprojectivity
▪ Other sentences are nonprojective

▪ Nonprojectivity is rare in English, but quite common in many languages


▪ Note: in the above example, reordering the clauses would result in projectivity without
changing the meaning



Chunking



What is Chunking?
▪ Form of partial (shallow) parsing
◦ Extracts major syntactic units, but not full parse trees
▪ Task: identify and classify
◦ Flat, non-overlapping segments of a sentence
◦ Basic non-recursive phrases
◦ Correspond to major POS
◦ May ignore some categories, e.g. base NP chunking
◦ Create simple bracketing
◦ [NP The morning flight] [PP from] [NP Denver] [VP has arrived]
◦ [NP The morning flight] from [NP Denver] has arrived
Example
▪ Full parse:
(S (NP Breaking Dawn) (VP has (VP broken (PP into (NP the box office top ten)))))
▪ POS tags:
Breaking/NNP Dawn/NNP has/VBZ broken/VBN into/IN the/DT box/NN office/NN top/NN ten/NN
▪ Chunks:
[NP Breaking Dawn] [VP has broken] [PP into] [NP the box office top ten]
Why Chunking?
▪ Used when full parse unnecessary
◦ Or infeasible or impossible (when?)
▪ Extraction of subcategorization frames
◦ Identify verb arguments
◦ e.g. VP NP
◦ VP NP NP
◦ VP NP to NP
▪ Information extraction: who did what to whom
▪ Summarization: keep base information, remove modifiers
▪ Information retrieval: Restrict indexing to base NPs
Approaches
▪ Finite-state Approaches
◦ Grammatical rules in FSTs
◦ Cascade to produce more complex structure

▪ Machine Learning
◦ Similar to POS tagging
Finite-State Rule-Based Chunking
▪ Hand-crafted rules model phrases
◦ Typically application-specific
▪ Left-to-right longest match (Abney 1996)
◦ Start at beginning of sentence
◦ Find longest matching rule
◦ Greedy approach, not guaranteed optimal
Finite-State Rule-Based Chunking
▪ Chunk rules:
◦ Cannot contain recursion
◦ NP -> Det Nominal: Okay
◦ Nominal -> Nominal PP: Not okay
▪ Examples:
◦ NP → (Det) Noun* Noun
◦ NP → Proper-Noun
◦ VP → Verb
◦ VP → Aux Verb
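The four example rules, combined with left-to-right longest match, can be sketched with regular expressions over the tag sequence. This is a rough illustration rather than a real finite-state toolkit: the coarse tag names (`Det`, `Noun`, `Proper-Noun`, `Aux`, `Verb`, `Prep`) and the function name `chunk` are assumptions made for the example.

```python
import re

# Each rule is a regular expression over space-terminated tags.
RULES = [
    ("NP", re.compile(r"(Det )?(Noun )+")),  # NP -> (Det) Noun* Noun
    ("NP", re.compile(r"Proper-Noun ")),     # NP -> Proper-Noun
    ("VP", re.compile(r"Aux Verb ")),        # VP -> Aux Verb
    ("VP", re.compile(r"Verb ")),            # VP -> Verb
]


def chunk(tagged):
    """Greedy left-to-right longest match; unmatched tokens get label None."""
    chunks, i = [], 0
    while i < len(tagged):
        rest = "".join(tag + " " for _, tag in tagged[i:])
        best_label, best_len = None, 0
        for label, pattern in RULES:
            match = pattern.match(rest)
            if match:
                length = match.group(0).count(" ")  # number of tags consumed
                if length > best_len:
                    best_label, best_len = label, length
        if best_label is None:
            chunks.append((None, [tagged[i][0]]))
            i += 1
        else:
            chunks.append((best_label, [w for w, _ in tagged[i:i + best_len]]))
            i += best_len
    return chunks


TAGGED = [("The", "Det"), ("morning", "Noun"), ("flight", "Noun"),
          ("from", "Prep"), ("Denver", "Proper-Noun"),
          ("has", "Aux"), ("arrived", "Verb")]

for label, words in chunk(TAGGED):
    print(label, words)
```

Because the match is greedy, a chunk boundary chosen early is never revisited, which is why the approach is not guaranteed to be optimal.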
Cascading FSTs
▪ Richer partial parsing
◦ Pass output of FST to next FST
▪ Approach:
◦ First stage: Base phrase chunking
◦ Next stage: Larger constituents (e.g. PPs, VPs)
◦ Highest stage: Sentences
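A cascade can be imitated by applying one rule per stage to the output of the previous stage. The sketch below uses plain list matching in place of real transducers; the stage rules in `CASCADE` and the helper names are assumptions chosen for this example sentence, not rules from the slides.

```python
def apply_rule(units, lhs, rhs):
    """One left-to-right pass: replace each run of labels equal to rhs by one lhs unit."""
    out, i = [], 0
    while i < len(units):
        window = units[i:i + len(rhs)]
        if tuple(label for label, _ in window) == rhs:
            out.append((lhs, window))  # the new unit keeps its parts as children
            i += len(rhs)
        else:
            out.append(units[i])
            i += 1
    return out


# Hypothetical stages: each rule is applied once, to the previous stage's output.
CASCADE = [
    ("PP", ("IN", "NP")),  # larger constituents: prepositional phrases
    ("NP", ("NP", "PP")),  # noun phrase with a PP modifier
    ("S", ("NP", "VP")),   # highest stage: sentence
]

# Output of the first stage (base phrase chunking), plus the leftover preposition.
BASE = [("NP", ["The", "morning", "flight"]), ("IN", ["from"]),
        ("NP", ["Denver"]), ("VP", ["has", "arrived"])]


def run_cascade(units):
    stages = [units]
    for lhs, rhs in CASCADE:
        units = apply_rule(units, lhs, rhs)
        stages.append(units)
    return stages


for stage in run_cascade(BASE):
    print([label for label, _ in stage])
```

Each stage is a single non-recursive pass, so the structure grows only as deep as the number of stages.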
Example
Chunking by Classification
▪ Model chunking as task similar to POS tagging
▪ Instance: tokens
▪ Labels:
◦ Simultaneously encode segmentation & identification
◦ IOB (or BIO) tagging
◦ Segment: B(eginning), I(nternal), O(utside)
◦ Identity: Phrase category: NP, VP, PP, etc.
◦ Example:
The   morning  flight  from  Denver  has   arrived
NP-B  NP-I     NP-I    PP-B  NP-B    VP-B  VP-I
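The encoding can be sketched as a pair of conversion functions between chunks and per-token tags. The `(label, words)` chunk representation and the function names are assumptions for this illustration; the tag format follows the `NP-B` / `NP-I` / `O` style used above.

```python
def to_iob(chunks):
    """Chunks [(label or None, [words])] -> [(word, tag)] with tags like NP-B, NP-I, O."""
    tagged = []
    for label, words in chunks:
        for k, word in enumerate(words):
            if label is None:
                tagged.append((word, "O"))
            else:
                tagged.append((word, f"{label}-{'B' if k == 0 else 'I'}"))
    return tagged


def from_iob(tagged):
    """Inverse direction: group consecutive B/I tags of one category into chunks."""
    chunks = []
    for word, tag in tagged:
        if tag == "O":
            chunks.append((None, [word]))
            continue
        label, position = tag.rsplit("-", 1)
        if position == "I" and chunks and chunks[-1][0] == label:
            chunks[-1][1].append(word)      # continue the open chunk
        else:
            chunks.append((label, [word]))  # B tag (or stray I) starts a chunk
    return chunks


CHUNKS = [("NP", ["The", "morning", "flight"]), ("PP", ["from"]),
          ("NP", ["Denver"]), ("VP", ["has", "arrived"])]

tags = to_iob(CHUNKS)
print(" ".join(tag for _, tag in tags))
```

The B tag is what lets two adjacent chunks of the same category be told apart, which a plain inside/outside scheme could not do.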
Features for Chunking
▪ What are good features?
◦ Preceding tags
◦ for 2 preceding words
◦ Words
◦ for 2 preceding, current, 2 following
◦ Parts of speech
◦ for 2 preceding, current, 2 following
▪ Vector includes those features + true label
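As a rough sketch, the feature vector for one token could be built as below. The feature names, the `<PAD>` marker for positions outside the sentence, and the function name `features` are assumptions; the POS tags for the example sentence are Penn Treebank style.

```python
def features(words, pos, chunk_tags, i):
    """Features for token i: words and POS in a +/-2 window, plus 2 preceding chunk tags."""
    feats = {}
    for offset in range(-2, 3):
        j = i + offset
        inside = 0 <= j < len(words)
        feats[f"word[{offset}]"] = words[j] if inside else "<PAD>"
        feats[f"pos[{offset}]"] = pos[j] if inside else "<PAD>"
    for offset in (-2, -1):
        j = i + offset
        # only tags of preceding tokens are available when tagging left to right
        feats[f"chunk[{offset}]"] = chunk_tags[j] if j >= 0 else "<PAD>"
    return feats


WORDS = ["The", "morning", "flight", "from", "Denver", "has", "arrived"]
POS = ["DT", "NN", "NN", "IN", "NNP", "VBZ", "VBN"]
CHUNK_TAGS = ["NP-B", "NP-I", "NP-I", "PP-B", "NP-B", "VP-B", "VP-I"]

# One training instance per token: (feature dict, true label)
instances = [(features(WORDS, POS, CHUNK_TAGS, i), CHUNK_TAGS[i])
             for i in range(len(WORDS))]
print(instances[3])
```

At training time the preceding chunk tags come from the gold annotation; at tagging time they would be the classifier's own earlier predictions.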
Chunking as Classification
▪ Example
State-of-the-Art
▪ Base NP chunking: 0.96
▪ Complex phrases:
◦ Learning: 0.92-0.94
◦ Rule-based: 0.85-0.92
▪ Limiting factors:
◦ POS tagging accuracy
◦ Inconsistent labeling (parse tree extraction)
◦ Conjunctions (the scope of a modifier can be ambiguous):
◦ Late departures and arrivals are common in winter
◦ Late departures and cancellations are common in winter
Named Entity Recognition



Sequence Labeling
▪ Goal: Find most probable labeling of a sequence
▪ Many sequence labeling tasks
◦ POS tagging
◦ Word segmentation
◦ Named entity tagging
◦ Story/spoken sentence segmentation
◦ Pitch accent detection
◦ Dialog act tagging
NER as Classification Task
▪ Instance: token
▪ Labels:
◦ Position: B(eginning), I(nside), O(utside)
◦ NER types: PER, ORG, LOC, NUM
◦ Label: Type-Position, e.g. PER-B, PER-I, O, …
◦ How many tags?
◦ (|NER types| × 2) + 1: one B tag and one I tag per type, plus the single O tag
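The tag count can be checked by enumerating the label set for the four types listed above. The helper name `tag_set` is made up for this sketch.

```python
NER_TYPES = ["PER", "ORG", "LOC", "NUM"]


def tag_set(types):
    """One B and one I tag per entity type, plus the single O tag."""
    return [f"{t}-{position}" for t in types for position in ("B", "I")] + ["O"]


TAGS = tag_set(NER_TYPES)
print(len(TAGS), TAGS)
```

With four entity types this gives (4 × 2) + 1 = 9 tags.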
NER as Classification: Features
▪ What information can we use for NER?

◦ Predictive tokens: e.g. MD, Rev, Inc, ...


▪ How general are these features?
◦ Language? Gender? Domain?
NER as Classification: Shape Features
▪ Shape types:
◦ lower (all lower case): e.g. e. e. cummings
◦ capitalized (first letter uppercase): e.g. Washington
◦ all caps (all letters capitalized): e.g. WHO
◦ mixed case (mixed upper and lower case): e.g. eBay
◦ capitalized with period: e.g. H.
◦ ends with digit: e.g. A9
◦ contains hyphen: e.g. H-P
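A shape feature of this kind can be computed with a handful of string tests. The sketch below is one possible mapping: the shape names, the order in which the tests are applied, and the function name `word_shape` are assumptions, and a token that fits several categories gets only the first that matches.

```python
def word_shape(token):
    """Map a token to one coarse shape category (first matching test wins)."""
    if not token:
        return "other"
    if "-" in token:
        return "contains-hyphen"        # H-P
    if token[-1].isdigit():
        return "ends-with-digit"        # A9
    if token[0].isupper() and token.endswith("."):
        return "capitalized-period"     # H.
    if token.isupper():
        return "all-caps"               # WHO
    if token.islower():
        return "lower"                  # cummings
    if token[0].isupper() and token[1:].islower():
        return "capitalized"            # Washington
    return "mixed"                      # eBay


for example in ["cummings", "Washington", "WHO", "eBay", "H.", "A9", "H-P"]:
    print(example, word_shape(example))
```

The shape is typically added to the feature vector alongside the word itself, so that unseen names can still be recognized by their form.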
Example Instance Representation
▪ Example
Sequence Labeling
▪ Example
